Approximating Named Arguments in Java

by: Ethan McCue

Named arguments are a language feature where you can provide arguments to a function call by name instead of positionally. This is usually paired with some mechanism for default values that a caller does not need to specify.

Take the definition of k_means from scikit-learn for example.

def k_means(
    X,
    n_clusters,
    *,
    sample_weight=None,
    init="k-means++",
    n_init="auto",
    max_iter=300,
    verbose=False,
    tol=1e-4,
    random_state=None,
    copy_x=True,
    algorithm="lloyd",
    return_n_iter=False,
):
    # ...

It takes two arguments positionally and then has a very large number of optional arguments which have default values.

If a caller wants to change only the max_iter parameter, they do not need to specify the others.

k_means(X, n_clusters, max_iter=10000)

Java does not (yet) have this or an equivalent as a language feature. Not only can you not specify parameters by name, you also cannot easily get default values for those parameters.

// This is unideal for multiple reasons
k_means(
    X,
    n_clusters,
    null,
    "k-means++",
    "auto",
    300,
    false,
    1e-4,
    null,
    true,
    "lloyd",
    false
);

There are many strategies for coping with this. The most notable is using the builder pattern to make an "input object".

var input = KMeansInput.builder(X, n_clusters).maxIter(10000).build(); 
var o = k_means(input);

This requires you to make both a class that holds all the input values and an intermediate mutable class that holds all the same values.

public final class KMeansInput {
    private final Object X;
    private final int nClusters;
    private final Double sampleWeight;
    // ... Need to have all the data here
    
    private KMeansInput(Builder builder) {
        // copy from and potentially validate what is in the builder;
    }
    
    public static Builder builder(Object x, int nClusters) {
        return new Builder(x, nClusters);
    }
    
    public static final class Builder {
        private final Object X;
        private final int nClusters;
        private Double sampleWeight = null;
        // ... AND all the data here
        
        private Builder(Object x, int nClusters) {
            this.X = x;
            this.nClusters = nClusters;
        }
        
        // Then mutating methods
        public Builder sampleWeight(double sampleWeight) {
            this.sampleWeight = sampleWeight;
            return this;
        }
        
        // ...
        
        // and a build()
        public KMeansInput build() {
            return new KMeansInput(this);
        }
    }
}

And all of that is a non-trivial amount of boilerplate.
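For a version that compiles end to end, here is a minimal sketch cut down to one required value and one optional value. TinyInput and the rule that maxIter must not be negative are my own illustrative assumptions, not anything from scikit-learn.

```java
// Hypothetical, trimmed-down input object: one required value, one optional.
final class TinyInput {
    private final int nClusters;
    private final int maxIter;

    private TinyInput(Builder builder) {
        // Validate once, at the point where the immutable object is created.
        if (builder.maxIter < 0) {
            throw new IllegalArgumentException("maxIter must not be negative");
        }
        this.nClusters = builder.nClusters;
        this.maxIter = builder.maxIter;
    }

    static Builder builder(int nClusters) {
        return new Builder(nClusters);
    }

    int nClusters() { return nClusters; }
    int maxIter() { return maxIter; }

    static final class Builder {
        private final int nClusters;
        private int maxIter = 300; // the default lives in the builder

        private Builder(int nClusters) {
            this.nClusters = nClusters;
        }

        Builder maxIter(int maxIter) {
            this.maxIter = maxIter;
            return this; // returning this allows chaining
        }

        TinyInput build() {
            return new TinyInput(this);
        }
    }
}
```

Calling TinyInput.builder(8).maxIter(10000).build() gives an immutable object, and leaving out maxIter(...) keeps the default of 300. Even at this size the field list is written twice.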

Of course you could just have a public class with all the fields exposed.

public class KMeansInput {
    public final Object X;
    public final int nClusters;
    public Double sampleWeight = null;
    // ...
    public int maxIter = 300;
    // ...

    public KMeansInput(Object x, int nClusters) {
        this.X = x;
        this.nClusters = nClusters;
    }
}

Since setting a field value puts the name of that field on the page, this also works as an approximation of named arguments.

var input = new KMeansInput(X, n_clusters);
input.maxIter = 10000; 
var o = k_means(input);

But there are two major downsides here. One is that you have a mutable object now and those are hard to reason about. You don't really know if k_means will mutate its "input object" or not.

var input = new KMeansInput(X, n_clusters);
input.maxIter = 10000; 
// All is well for the first call
var o1 = k_means(input);
// But this second call is dodgy
var o2 = k_means(input);
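To make that hazard concrete, here is a minimal sketch. SloppyKMeans is hypothetical and stands in for any callee that happens to use its input as scratch space; I am not claiming any real library behaves this way.

```java
// Hypothetical, trimmed-down input object with a single public knob.
final class MutableInput {
    public int maxIter = 300;
}

final class SloppyKMeans {
    // Stand-in for an algorithm. It returns the iteration budget it saw,
    // but it also counts the caller's field down to zero as a side effect.
    static int run(MutableInput input) {
        int budget = input.maxIter;
        while (input.maxIter > 0) {
            input.maxIter--;
        }
        return budget;
    }
}
```

With maxIter set to 10000, the first call to SloppyKMeans.run(input) returns 10000 and the second returns 0, because the first call consumed the value. Nothing in the method signature warns the caller about that.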

The other is the classic "there is no way to evolve a field access in a binary-compatible way" issue.

If you wanted to make sure maxIter is never negative but still want a mutable aggregate, here come the getters and setters.

var input = new KMeansInput(X, n_clusters);
input.setMaxIter(10000); 
var o = k_means(input);

But this reintroduces boilerplate.

public class KMeansInput {
    private final Object X;
    
    public Object getX() { return X; }
    
    private final int nClusters;
    
    public int getNClusters() { return nClusters; }
    
    private Double sampleWeight = null;
    
    public Double getSampleWeight() { return sampleWeight; }
    
    public void setSampleWeight(Double sampleWeight) {
        this.sampleWeight = sampleWeight;
    }
    
    // ...
    private int maxIter = 300;
    
    public int getMaxIter() {
        return maxIter;
    }
    
    public void setMaxIter(int maxIter) {
        this.maxIter = maxIter;
    }
    
    // ...

    public KMeansInput(Object x, int nClusters) {
        this.X = x;
        this.nClusters = nClusters;
    }
}

So now not only do you have a lot of boilerplate, just like the builder approach, but you are stuck with a mutable object in the end.
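What the setters do buy you is a place to hang validation. A minimal sketch of the "never negative" rule, where ValidatedInput and the exception message are my own illustrative choices:

```java
// Hypothetical, trimmed-down input object with one validated knob.
final class ValidatedInput {
    private int maxIter = 300;

    public int getMaxIter() {
        return maxIter;
    }

    public void setMaxIter(int maxIter) {
        // A public field could not reject this value; a setter can.
        if (maxIter < 0) {
            throw new IllegalArgumentException("maxIter must not be negative: " + maxIter);
        }
        this.maxIter = maxIter;
    }
}
```

A rejected call leaves the previous value in place, so the object never holds a negative maxIter.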

The next best candidate is records. They work well for getting an immutable aggregate with default values.

public record KMeansInput(
        Object X,
        int nClusters,
        Double sampleWeight,
        String init,
        String nInit,
        int maxIter,
        boolean verbose,
        double tol,
        Object randomState,
        boolean copyX,
        String algorithm,
        boolean returnNIter
) {
    public KMeansInput(Object X, int nClusters) {
        this(
                X, 
                nClusters, 
                null, 
                "k-means++", 
                "auto", 
                300, 
                false, 
                1e-4, 
                null, 
                true, 
                "lloyd", 
                false
        );
    }
}

The downside is that the constructor(s) you make to delegate to the canonical constructor will be fiddly. There aren't any names in that this(...) invocation, so every default has to be matched to its parameter by position.

There is also the fact that records expose a pattern matching API that is not currently generally available to all classes. This means that, unlike the Builder -> input object flow, adding new components to an input object record is always a sort of breaking change.
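To illustrate why that is a breaking change, here is a sketch using a record pattern, which assumes Java 21 or later. TwoKnobs and PatternDemo are hypothetical names, cut down to two components.

```java
// Hypothetical, trimmed-down record with only two components.
record TwoKnobs(int nClusters, int maxIter) {}

final class PatternDemo {
    static int maxIterOf(Object o) {
        // A record pattern must list every component, in order.
        // Adding a third component to TwoKnobs would make this line
        // fail to compile, even though this code never uses it.
        if (o instanceof TwoKnobs(int nClusters, int maxIter)) {
            return maxIter;
        }
        return -1; // not a TwoKnobs at all
    }
}
```

Any caller that deconstructs the record this way is coupled to its exact component list, which a builder-produced class never exposes.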

You will eventually be able to use withers to take a fully constructed record, update one part, and recreate the record.

var input = new KMeansInput(X, n_clusters).with {
    maxIter = 10000;  
};

But withers are also still not in the language. You can emulate them by making your own mutable aggregate and using it as a temporary for "reconstruction."

import java.util.function.Consumer;

public record KMeansInput(
        Object X,
        int nClusters,
        Double sampleWeight,
        String init,
        String nInit,
        int maxIter,
        boolean verbose,
        double tol,
        Object randomState,
        boolean copyX,
        String algorithm,
        boolean returnNIter
) {
    public KMeansInput(Object X, int nClusters) {
        this(
                X,
                nClusters,
                null,
                "k-means++",
                "auto",
                300,
                false,
                1e-4,
                null,
                true,
                "lloyd",
                false
        );
    }

    public KMeansInput with(Consumer<MutableKMeansInput> consumer) {
        var mut = new MutableKMeansInput(this);
        consumer.accept(mut);
        return mut.freeze();
    }
}
public final class MutableKMeansInput {
    public Object X;
    public int nClusters;
    public Double sampleWeight;
    public String init;
    public String nInit;
    public int maxIter;
    public boolean verbose;
    public double tol;
    public Object randomState;
    public boolean copyX;
    public String algorithm;
    public boolean returnNIter;
    
    MutableKMeansInput(KMeansInput input) {
        // Record fields are private, so read them through the accessors
        this.X = input.X();
        this.nClusters = input.nClusters();
        // and so on
    }
    
    KMeansInput freeze() {
        return new KMeansInput(X, nClusters, sampleWeight, ...);
    }
    
}

Using this scheme would look something like this.

var input = new KMeansInput(X, nClusters).with(it -> {
    it.maxIter = 10000;
});

You might notice that it's somewhat like "builder, but backwards." A builder starts out with a mutable aggregate and later builds a (hopefully) immutable one. A record with this style of wither emulation starts off with an immutable aggregate and uses a temporary mutable aggregate which gets turned into the immutable version later.
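Since the listing above elides most of the copying, here is the same scheme as a minimal sketch that compiles on its own. Settings and MutableSettings are hypothetical stand-ins with only three components.

```java
import java.util.function.Consumer;

// Hypothetical, trimmed-down record with only three components.
record Settings(int nClusters, int maxIter, double tol) {
    Settings(int nClusters) {
        this(nClusters, 300, 1e-4);
    }

    // Copy into a mutable temporary, let the caller tweak it, then freeze.
    Settings with(Consumer<MutableSettings> consumer) {
        var mut = new MutableSettings(this);
        consumer.accept(mut);
        return mut.freeze();
    }
}

final class MutableSettings {
    public int nClusters;
    public int maxIter;
    public double tol;

    MutableSettings(Settings settings) {
        // Record fields are private, so read through the accessors.
        this.nClusters = settings.nClusters();
        this.maxIter = settings.maxIter();
        this.tol = settings.tol();
    }

    Settings freeze() {
        return new Settings(nClusters, maxIter, tol);
    }
}
```

Calling new Settings(8).with(it -> it.maxIter = 10000) returns a new record with only maxIter changed, and the original record is left untouched.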

Unfortunately, this didn't really solve our boilerplate problem. Records will come the closest once withers are in the language, though. As with all boilerplate, a degree of code generation can help a little, so you can reach for recordbuilder or similar.

The last way I can think of is a Java classic. Instead of having a method that takes in arguments, turn the whole process into an object.

class KMeans {
    private final Object X;
    private final int nClusters;
    private Double sampleWeight;
    // ...
    
    public KMeans(Object X, int nClusters) {
        // ...
        // initialize all the defaults
    }
    
    // then setters for all the knobs
    public void setSampleWeight(Double sampleWeight) {
        this.sampleWeight = sampleWeight;
    }
    
    // ...
    
    // and finally the ability to run the algorithm
    public KMeansOutput run() {
        // ...
    }
}

The overall object lifecycle of "make it, tweak it, run it" comes up a lot, especially when reading code from Java's early days. I don't have a coherent explanation as to why, but I think at some point this style of code fell out of favor.

var kMeans = new KMeans(X, nClusters);
kMeans.setMaxIter(10000);
var o = kMeans.run();

So all of these options suck in their own ways.

Records with withers have the most potential in my eyes, but it's important to remember that this whole process only solves for the situation where most of the parameters are optional and have defaults.

While I'd say it's rare-ish, it's not impossible to end up wanting to make something where all 20 or so parameters should be defined and have no defaults. Named arguments as a general feature could handle that, but it's going to be quite a while before that is at the top of anyone's priority list.


Edit: As has been pointed out, you can also pass a callback that operates on a mutable object without necessarily exposing the object to be directly constructed.

kMeans(X, nClusters, opts -> opts.maxIter = 1000);

The same general tradeoffs between direct fields and setter methods exist there.

I don't know why I forget about that one all the time.
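For reference, a minimal sketch of what the callee side of that callback style could look like. CallbackKMeans, Options, and Resolved are all hypothetical, and the method only reports the settings it would use instead of clustering anything.

```java
import java.util.function.Consumer;

// Hypothetical result type: just the settings the algorithm would run with.
record Resolved(int nClusters, int maxIter, double tol) {}

final class CallbackKMeans {
    // The constructor is private, so callers can only touch an Options
    // instance inside the callback; they cannot create one themselves.
    static final class Options {
        public int maxIter = 300;
        public double tol = 1e-4;

        private Options() {}
    }

    static Resolved kMeans(Object x, int nClusters, Consumer<Options> configure) {
        var opts = new Options();  // start from the defaults
        configure.accept(opts);    // let the caller override by name
        return new Resolved(nClusters, opts.maxIter, opts.tol);
    }
}
```

Because the mutable object never escapes kMeans, the "will the second call see a changed input" worry from earlier does not come up. Each call starts from fresh defaults.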

Also, with the setters you can still do the builder thing where you make them chainable.

new KMeansInput(X, nClusters)
    .setMaxIter(1000)
    .setTol(1e-8);
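Making setters chainable only takes returning this from each one. A minimal sketch, with ChainableInput as a hypothetical class cut down to two knobs:

```java
// Hypothetical, trimmed-down input object with two chainable setters.
final class ChainableInput {
    private int maxIter = 300;
    private double tol = 1e-4;

    public ChainableInput setMaxIter(int maxIter) {
        this.maxIter = maxIter;
        return this; // returning this is what makes the calls chainable
    }

    public ChainableInput setTol(double tol) {
        this.tol = tol;
        return this;
    }

    public int getMaxIter() { return maxIter; }

    public double getTol() { return tol; }
}
```

The result is still a mutable object, so the earlier caveats about mutation apply unchanged.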
