
public static void main(String[] args) is dead

by: Ethan McCue

As of September 16th, year of our lord 2025, this is no longer the first Java program you need to write.

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.print("What is your name? ");
        String name = scanner.nextLine();
        System.out.println("Hello, " + name);
    }
}

This is.

void main() {
    var name = IO.readln("What is your name? ");
    IO.println("Hello, " + name);
}

Good Fucking Riddance.

I'll be nuanced later; we've all earned some catharsis now.

Holy fucking shit did this suck[1]. There is a comments section below. Give your eulogy for that piece of shit sorcerous incantation there or wherever else.

new Scanner(System.in) and System.out.println too. Don't let your sense of dignity hold you back.

Just record yourself giving a guttural scream and post it. Sing a song, do a dance, cast aspersions.

Honestly just let it all out.

[1]: When I was a Freshman in High School I asked a Junior what it meant. He had no clue.

That Junior later went on to drop out of college and become a Minecraft Youtuber. I vividly remember him making videos where he and his girlfriend pretend to be toddlers in a Minecraft day-care. What a life.



Approximating Named Arguments in Java

by: Ethan McCue

Named arguments are a language feature where you can provide arguments to a function call by name instead of positionally. This is usually paired with some mechanism for default values that a caller does not need to specify.

Take the definition of k_means from scikit-learn for example.

def k_means(
    X,
    n_clusters,
    *,
    sample_weight=None,
    init="k-means++",
    n_init="auto",
    max_iter=300,
    verbose=False,
    tol=1e-4,
    random_state=None,
    copy_x=True,
    algorithm="lloyd",
    return_n_iter=False,
):
    # ...

It takes two arguments positionally and then has a very large number of optional arguments which have default values.

If a caller only wants to change the max_iter parameter, they do not need to specify the others:

k_means(X, n_clusters, max_iter=10000)

Java does not (yet) have this or an equivalent as a language feature. Not only can you not specify parameters by name, you cannot easily get default values for said parameters.

// This is unideal for multiple reasons
k_means(
    X,
    n_clusters,
    null,
    "k-means++",
    "auto",
    300,
    false,
    1e-4,
    null,
    true,
    "lloyd",
    false
);

There are many strategies for coping with this. The most notable is using the builder pattern to make an "input object".

var input = KMeansInput.builder(X, n_clusters).maxIter(10000).build(); 
var o = k_means(input);

This requires you to make both a class that holds all the input values and an intermediate mutable class which holds all the same values.

public final class KMeansInput {
    private final Object X;
    private final int nClusters;
    private final Double sampleWeight;
    // ... Need to have all the data here
    
    private KMeansInput(Builder builder) {
        // copy from and potentially validate what is in the builder;
    }
    
    public static Builder builder(Object x, int nClusters) {
        return new Builder(x, nClusters);
    }
    
    public static final class Builder {
        private final Object X;
        private final int nClusters;
        private Double sampleWeight = null;
        // ... AND all the data here
        
        private Builder(Object x, int nClusters) {
            this.X = x;
            this.nClusters = nClusters;
        }
        
        // Then mutating methods
        public Builder sampleWeight(double sampleWeight) {
            this.sampleWeight = sampleWeight;
            return this;
        }
        
        // ...
        
        // and a build()
        public KMeansInput build() {
            return new KMeansInput(this);
        }
    }
}

And all of that is a non-trivial amount of boilerplate.

Of course you could just have a public class with all the fields exposed.

public class KMeansInput {
    public final Object X;
    public final int nClusters;
    public Double sampleWeight = null;
    // ...
    public int maxIter = 300;
    // ...

    public KMeansInput(Object x, int nClusters) {
        this.X = x;
        this.nClusters = nClusters;
    }
}

Since setting a field value puts the name of that field on the page, this also works as an approximation of named arguments.

var input = new KMeansInput(X, n_clusters);
input.maxIter = 10000; 
var o = k_means(input);

But there are two major downsides here. One is that you have a mutable object now and those are hard to reason about. You don't really know if k_means will mutate its "input object" or not.

var input = new KMeansInput(X, n_clusters);
input.maxIter = 10000; 
// All is well for the first call
var o1 = k_means(input);
// But this second call is dodgy
var o2 = k_means(input);

The other is the classic "there is no way to evolve a field access in a binary compatible way" issue.
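To make that concrete: a caller's compiled bytecode references the field by name, so hiding the field behind a method later breaks already-compiled callers. A minimal sketch of the failure mode, reusing names from the running example:

// Version 1 of the library exposes the field directly.
public class KMeansInput {
    public int maxIter = 300;
}

// A caller compiled against version 1 links straight to the
// field (a putfield instruction in its bytecode):
//
//     input.maxIter = 10000;
//
// Version 2 hides the field behind a validating setter.
public class KMeansInput {
    private int maxIter = 300;

    public void setMaxIter(int maxIter) {
        if (maxIter < 0) {
            throw new IllegalArgumentException("maxIter must be >= 0");
        }
        this.maxIter = maxIter;
    }
}

// The caller's source needs only a one-line change to recompile,
// but any already-compiled caller now fails at runtime with
// NoSuchFieldError.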

If you wanted to make sure maxIter is never negative but still want a mutable aggregate, here come the getters and setters.

var input = new KMeansInput(X, n_clusters);
input.setMaxIter(10000); 
var o = k_means(input);

But this reintroduces boilerplate.

public class KMeansInput {
    private final Object X;
    
    public Object getX() { return X; }
    
    private final int nClusters;
    
    public int getNClusters() { return nClusters; }
    
    private Double sampleWeight = null;
    
    public Double getSampleWeight() { return sampleWeight; }
    
    public void setSampleWeight(Double sampleWeight) {
        this.sampleWeight = sampleWeight;
    }
    
    // ...
    private int maxIter = 300;
    
    public int getMaxIter() {
        return maxIter;
    }
    
    public void setMaxIter(int maxIter) {
        this.maxIter = maxIter;
    }
    
    // ...

    public KMeansInput(Object x, int nClusters) {
        this.X = x;
        this.nClusters = nClusters;
    }
}

So now you have about as much boilerplate as the builder approach, and you are stuck with a mutable object in the end.

The next best candidate is records. They work well for getting an immutable aggregate with default values.

public record KMeansInput(
        Object X,
        int nClusters,
        Double sampleWeight,
        String init,
        String nInit,
        int maxIter,
        boolean verbose,
        double tol,
        Object randomState,
        boolean copyX,
        String algorithm,
        boolean returnNIter
) {
    public KMeansInput(Object X, int nClusters) {
        this(
                X, 
                nClusters, 
                null, 
                "k-means++", 
                "auto", 
                300, 
                false, 
                1e-4, 
                null, 
                true, 
                "lloyd", 
                false
        );
    }
}

Downside is that the constructor(s) you make to delegate to the canonical constructor will be fiddly. There aren't any names in that this(...) invocation.

There is also the fact that records expose a pattern matching API that is not currently generally available to all classes. This means that, unlike the Builder -> input object flow, adding new components to an input object record is always a sort of breaking change.
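To see why, recall that a record pattern must name every component of the record. A sketch against the record above (X as in the running example):

Object config = new KMeansInput(X, 8);

// This deconstruction compiles against the 12-component record...
if (config instanceof KMeansInput(
        var x, var k, var weight, var init, var nInit,
        var maxIter, var verbose, var tol, var seed,
        var copyX, var algorithm, var returnNIter)) {
    IO.println("maxIter = " + maxIter);
}
// ...and stops compiling, in every caller, the moment a
// 13th component is added.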

You will eventually be able to use withers to take a fully constructed record, update one part, and recreate the record.

var input = new KMeansInput(X, n_clusters).with {
    maxIter = 10000;  
};

But withers are also still not in the language. You can emulate them by making your own mutable aggregate and using it as a temporary for "reconstruction."

import java.util.function.Consumer;

public record KMeansInput(
        Object X,
        int nClusters,
        Double sampleWeight,
        String init,
        String nInit,
        int maxIter,
        boolean verbose,
        double tol,
        Object randomState,
        boolean copyX,
        String algorithm,
        boolean returnNIter
) {
    public KMeansInput(Object X, int nClusters) {
        this(
                X,
                nClusters,
                null,
                "k-means++",
                "auto",
                300,
                false,
                1e-4,
                null,
                true,
                "lloyd",
                false
        );
    }

    public KMeansInput with(Consumer<MutableKMeansInput> consumer) {
        var mut = new MutableKMeansInput(this);
        consumer.accept(mut);
        return mut.freeze();
    }
}

public final class MutableKMeansInput {
    public Object X;
    public int nClusters;
    public Double sampleWeight;
    public String init;
    public String nInit;
    public int maxIter;
    public boolean verbose;
    public double tol;
    public Object randomState;
    public boolean copyX;
    public String algorithm;
    public boolean returnNIter;
    
    MutableKMeansInput(KMeansInput input) {
        this.X = input.X();
        this.nClusters = input.nClusters();
        // and so on
    }
    
    KMeansInput freeze() {
        return new KMeansInput(X, nClusters, sampleWeight, ...);
    }
    
}

Using this scheme would look something like this.

var input = new KMeansInput(X, nClusters).with(it -> {
    it.maxIter = 10000;
});

You might notice that it's somewhat like "builder, but backwards." A builder starts out with a mutable aggregate and later builds a (hopefully) immutable one. A record with this style of wither emulation starts off with an immutable aggregate and uses a temporary mutable aggregate which gets turned into the immutable version later.

Unfortunately this didn't really solve our boilerplate problem. Records, once withers are in the language, come the closest though. As with all boilerplate, a degree of code generation can help a little, so you can reach for RecordBuilder or similar.

The last way I can think of is a Java classic. Instead of having a method that takes in arguments, turn the whole process into an object.

class KMeans {
    private final Object X;
    private final int nClusters;
    private Double sampleWeight;
    // ...
    
    public KMeans(Object X, int nClusters) {
        // ...
        // initialize all the defaults
    }
    
    // then setters for all the knobs
    public void setSampleWeight(Double sampleWeight) {
        this.sampleWeight = sampleWeight;
    }
    
    // ...
    
    // and finally the ability to run the algorithm
    public KMeansOutput run() {
        // ...
    }
}

The overall object lifecycle of "make it, tweak it, run it" comes up a lot, especially when reading code from Java's early days. I don't have a coherent explanation as to why, but I think at some point this style of code fell out of favor.

var kMeans = new KMeans(X, nClusters);
kMeans.setMaxIter(10000);
var o = kMeans.run();

So, all of these options suck in their own ways.

Records with withers have the most potential in my eyes, but it's important to remember that this whole approach only solves for the situation where most of the parameters are optional and have defaults.

While I'd say it's rare-ish, it's not impossible to end up wanting to make something where all 20 or so parameters should be defined and have no defaults. Named arguments as a general feature could handle that, but it's going to be quite a while before that is top of anyone's priority list.


Edit: As has been pointed out, you can also pass a callback that operates on a mutable object without necessarily exposing the object to be directly constructed.

kMeans(X, nClusters, opts -> opts.maxIter = 1000);

The same general sorts of tradeoffs between direct fields vs. setter methods exist there.

I don't know why I forget about that one all the time.

Also, with the setters you can still do the builder thing where you make them chainable.

new KMeansInput(X, nClusters)
    .setMaxIter(1000)
    .setTol(1e-8);


Inheritance vs. Composition

by: Ethan McCue

This list is the result of a question posed by Mika Moilanen. It's not comprehensive, but I think it's illustrative of a way to break down these choices that's a tad more healthy than "always choose X" or "never do Y." Specifically because you can take these properties and evaluate them against a given set of conditions.

"List all the differences between using an abstract class and using a delegate object when sharing common functionality."

Both let you reuse a method definition

class Adder {
    int add(int x, int y) {
        return x + y;
    }
}

class Composition {
    private Adder a = new Adder();
    
    int add(int x, int y) {
        return a.add(x, y);
    }
}

abstract class Adder {
    int add(int x, int y) {
        return x + y;
    }
}

class Inheritance extends Adder {
    // Implicitly inherits add
}

Composition requires you to re-list every method you want to borrow

class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return x - y;
    }

    int mul(int x, int y) {
        return x * y;
    }

    int div(int x, int y) {
        return x / y;
    }
}

class Composition {
    private MathDoer m = new MathDoer();

    int add(int x, int y) {
        return m.add(x, y);
    }

    int sub(int x, int y) {
        return m.sub(x, y);
    }

    int mul(int x, int y) {
        return m.mul(x, y);
    }

    int div(int x, int y) {
        return m.div(x, y);
    }
}

abstract class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return x - y;
    }

    int mul(int x, int y) {
        return x * y;
    }

    int div(int x, int y) {
        return x / y;
    }
}

class Inheritance extends MathDoer {
    // Implicitly inherits add, sub, mul, and div
}

Inheritance can lead to observing otherwise hidden method relationships

abstract class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return add(x, -1 * y);
    }
}

class Inheritance extends MathDoer {
    // Will print for both add and sub
    // but brittle if sub is ever redefined
    // to be "return x - y;"
    int add(int x, int y) {
        IO.println("Add called. x=" + x + ", y=" + y);
        return super.add(x, y);
    }
}

class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return add(x, -1 * y);
    }
}

class Composition {
    private MathDoer m = new MathDoer();

    // Resilient to whatever internal relationships
    // add and sub have within MathDoer
    int add(int x, int y) {
        IO.println("Add called. x=" + x + ", y=" + y);
        return m.add(x, y);
    }

    int sub(int x, int y) {
        IO.println("Sub called. x=" + x + ", y=" + y);
        return m.sub(x, y);
    }
}

Inheritance implicitly gives polymorphic dispatch for all exposed methods

abstract class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return x - y;
    }
}

class Inheritance extends MathDoer {
    int add(int x, int y) {
        IO.println("Add called. x=" + x + ", y=" + y);
        return super.add(x, y);
    }
}

void main() {
    MathDoer m = new Inheritance();
    IO.println(m.add(1, 2));
}

// To get dynamic dispatch you need an intermediate interface
interface IMathDoer {
    int add(int x, int y);
    int sub(int x, int y);
}

// This in turn means methods must be public.
//
// If you want "package-private" dynamic dispatch you
// are out of luck.
class MathDoer implements IMathDoer {
    public int add(int x, int y) {
        return x + y;
    }

    public int sub(int x, int y) {
        return add(x, -1 * y);
    }
}

class Composition implements IMathDoer {
    private MathDoer m = new MathDoer();
    
    public int add(int x, int y) {
        return m.add(x, y);
    }

    public int sub(int x, int y) {
        return m.sub(x, y);
    }
}

void main() {
    IMathDoer m = new Composition();
    IO.println(m.add(1, 2));
}

You can compose behavior from multiple objects. Inheritance is linear

abstract class Adder {
    int add(int x, int y) {
        return x + y;
    }
}

abstract class Subber {
    int sub(int x, int y) {
        return x - y;
    }
}

class Inheritance extends Adder { // Cannot also extend Subber
    
}

class Adder {
    int add(int x, int y) {
        return x + y;
    }
}

class Subber {
    int sub(int x, int y) {
        return x - y;
    }
}

class Composition {
    private Adder a = new Adder();
    private Subber s = new Subber();

    // Forwards add and sub to two different
    // objects, something inheritance cannot do
    int add(int x, int y) {
        return a.add(x, y);
    }

    int sub(int x, int y) {
        return s.sub(x, y);
    }
}

Inheritance lets you define protected members which are visible only to classes in the same package and to subclasses

package a;

abstract class MathDoer {
    protected abstract int addInternal(int x, int y);
    
    public final int add(int x, int y) {
        return addInternal(x, y);
    }

    public final int sub(int x, int y) {
        return addInternal(x, -1 * y);
    }
}

package b;

// If extending the class, can see addInternal 
// despite being in a different package
class Inheritance extends MathDoer {
    protected int addInternal(int x, int y) {
        return x + y;
    }
}

package a;

// No way to define a method or field that will
// be visible across package boundaries only to things
// that are "composing".
final class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return add(x, -1 * y);
    }
}

Inheritance requires you to directly expose constructors

abstract class MathDoer {
    // Subclasses need to call a super-class constructor
    MathDoer() {}
    
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return add(x, -1 * y);
    }
}

class Inheritance extends MathDoer {
    Inheritance() {
        super();
    }
}

class MathDoer {
    private MathDoer() {}
    
    public static MathDoer getIt() {
        return new MathDoer();
    }
    
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return add(x, -1 * y);
    }
}

class Composition {
    // Meanwhile composition can allow you to keep constructors hidden
    // and only expose static factories or builders.
    private MathDoer m = MathDoer.getIt();
    
    int add(int x, int y) {
        return m.add(x, y);
    }

    int sub(int x, int y) {
        return m.sub(x, y);
    }
}

Inheritance leaves you exposed to new methods with incompatible signatures

abstract class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return add(x, -1 * y);
    }
    
    // int mul(int x, int y) {
    //     return x * y;
    // }
}

class Inheritance extends MathDoer {
    double mul(int x, int y) {
        // This is fine initially, but poses a problem 
        // if an incompatible method is added to the superclass later
        return ((double) x * y);
    }
}

class MathDoer {
    int add(int x, int y) {
        return x + y;
    }

    int sub(int x, int y) {
        return add(x, -1 * y);
    }

    int mul(int x, int y) {
        return x * y;
    }
}

class Composition {
    private MathDoer m = new MathDoer();

    // Resilient to whatever internal relationships
    // add and sub have within MathDoer
    int add(int x, int y) {
        return m.add(x, y);
    }

    int sub(int x, int y) {
        return m.sub(x, y);
    }

    // Any methods not taken via composition
    // do not pose a problem.
    //
    // A caveat is that this is an issue
    // with interfaces as well, so it's more a property
    // of polymorphic dispatch
    double mul(int x, int y) {
        return ((double) x * y);
    }
}

Once I am able to cohere my thoughts without foaming at the mouth and twitching, expect a rant on "mechanics-ism."



The Economics of AI (Ed Zitron is Wrong)

by: Ethan McCue

Ed Zitron, or as my friends call him while doing the SpongeBob thing "EdWaRd," seems to think that AI is in some manner a bubble.

This is patently ridiculous. To understand why you need to understand a little basic kindergarten level economics.

The Lemonade Stand

Imagine a lemonade stand run by three plucky 12-year-olds. Call them Ed, Edd, and Eddy.

Each day of operation for this lemonade stand will both cost money to run and bring in money from sales. These costs are expenses and the money brought in is revenue.

Subtract expenses from revenue, and you get the total profit (or loss) for that day of operations.

Say they sell $10 worth of lemonade and the cost of sugar, lemon, and water totals up to $6. For that day they will have made $4 in profit.

Startup Costs

Businesses have all sorts of expenses.

We can simplify these broadly into one-time expenses (like the lumber to make the booth) and recurring expenses (like the materials to make the lemonade).

One-time expenses needed for the business to begin operations are called "startup costs."

Since the business is not yet running - at least not at full capacity - the funds to cover these startup costs need to come from somewhere else. For a lemonade stand this is likely a parent of one of the children. This, depending on the preferred child-rearing strategy, can make that parent an "investor."

So if it costs $20 to get all the supplies for the first day of operations, how long until the lemonade stand can repay their startup costs?

Unit Economics

It is generally understood that the more customers a business has the more money it makes.

To know exactly how much money to expect you need to understand the "unit economics" of the business. This is just the amount made per unit of product sold.

So if you sell 1 cup of lemonade for $1.00, and it took $0.50 to make, you will profit $0.50 per cup of lemonade sold.

This means that once you sell 40 cups of lemonade you will have made back the initial $20 investment.

Economies of Scale

Unfortunately sugar and lemons are not sold in clean "amount needed for a cup of lemonade" increments.

If you buy a small bag of sugar it might cost $5 but contain enough sugar for 20 cups of lemonade. If, however, you buy a large bag of sugar it might cost $100 but contain enough sugar for 1000 cups of lemonade.

This is what people mean by "economies of scale." If you can expect to sell more than 1000 cups of lemonade you can make your per-unit cost go down by buying the larger bag of sugar. If you aren't yet selling that much lemonade you might need to make do with the smaller bag.

Subsidies

It is always possible for a business to run at a loss. There are two broad reasons for this.

First is that it might not yet be profitable. It takes time for customers to discover and use your lemonade stand. For a period of time you might be stuck buying small bags of sugar. If you actually priced your product so that you make a profit-per-unit you might never get enough customers to "activate economies of scale."

Second is that it might never be profitable.

In both situations a business needs to be subsidized. This means that cash needs to be continuously injected to keep operations afloat.

Investors generally do this because they believe that, with enough time in the proverbial oven, the business can become profitable. One way that can happen is that the aforementioned "economies of scale" can drive down per-unit costs.

The other is "market capture." If you pour enough money into a lemonade stand that it can open branches on literally every street corner you can prevent new lemonade stands from opening. Being the only option in town for lemonade means you can jack up the price to become profitable. This has some limits - you can't charge $2000 for a lemonade and expect anyone to be able or willing to pay it - but monopolies and oligopolies are almost the ideal state for a business.

If a business will never become profitable investors still might subsidize it. Lemonade stands are often used to teach children life lessons, not to become profitable. For this reason a parental investor might contribute money that they never expect to make back. The thing they are purchasing is not future returns but instead the existence of the lemonade stand itself.

With a particularly affluent parent a modest lemonade stand can be kept in operation almost indefinitely, even if they sell the lemonade for free.

Anthropic

Anthropic, a major AI company, loses money on every active user. This is because they charge people a monthly rate, and it costs multiple times that monthly rate to provide them the product they paid for.

The numbers aren't fully known, but it seems safe to say that for everyone who pays for and uses their platform they probably lose at least ~3x what that person paid them. The more someone uses their AI the faster they lose money.

There aren't any economies of scale left for them to activate. They are already plenty big and the GPUs they bought are expensive. But the more expensive part is actually running them. The costs of doing that aren't going to go down any time soon and actually increase the more usage they see.

They also can't make up the difference by capturing the market. Their users are costing them multiple thousands of dollars and only paying them at most a few hundred. Not enough people will drop a mortgage payment on AI to make up the difference.

Anthropic is just one AI company, but the broad math seems to be the same for all of them.

Why this still works

But here's the part that critics don't understand.

People aren't investing in AI companies to make a profit. It's clear these companies will never be profitable and, because the market famously is never irrational, investors must be taking this into account.

Instead, these companies are kept healthy and thriving by "parental investors." Just like how a lemonade stand keeps a kid occupied on the weekend and out of trouble, generative AI companies keep a large number of business and tech brothers off the streets. You'd much rather have them doing ZYNs and Addies in a safe office setting where they can chill with their homies and learn valuable life skills.

These investors have, from my perspective, almost infinite money. It only makes sense that they will be able to keep investing that infinite money forever, regardless of how much money the AI companies lose.

The future is bright and this economy - our economy - is invincible.



Issues you will face binding to C from Java.

by: Ethan McCue

Specifically using the Foreign Function and Memory API.

I've written about some of these issues before, but now I have a project I can use as an example.

Stick around to the end for an open challenge to the audience.

Context

Yu-Gi-Oh is a children's card game from Japan. Each player has "life points" and the goal of the game is to use your monsters, spells, and traps to reduce your opponent's life points to zero.

This game is how I first learned to read.

My father likes playing a Yu-Gi-Oh game from the early 2000s called "Power of Chaos: Yugi the Destiny," named after the character in the anime who solves a puzzle and gets possessed by the spirit of an ancient Egyptian Pharaoh.

Since he figured out it supports multiplayer and can let me play against him, he has also started playing "Power of Chaos: Joey the Passion." This is also from the early 2000s and is named after the character in the anime, a Japanese high school student who for some reason speaks with a Brooklyn accent.

Every now and then a character will die because they lost this children's card game. This got censored as them being "sent to the shadow realm," a place of eternal torture where their soul will never know peace. Much more kid friendly.

In my opinion the "Power of Chaos" series of games has by far the best visual, audio, and interaction design of any Yu-Gi-Oh game. This includes unofficial fan projects.

So something I threw on my large list of back-burner projects is to make my own Yu-Gi-Oh game that closely emulates the design of those early games.

Unfortunately the actual card game is crazy complicated. Every card is basically its own tiny program and there are decades of special case rulings to implement. It is well beyond the scope of what I am capable of as an individual.

Fortunately there is both a crowdsourced repository of Lua scripts for all the cards and a community-maintained engine for simulating the rules of the game. This engine has a relatively minimal C API and is, I think, a perfect example of the kind of thing that should have bindings written for it. The depth of expertise that went into it is unrepeatable.

Problem 1: C libraries aren't always pure C

Despite having a C API, some of the headers of this particular library make use of C++ features. This is often ignorable in the C/C++ world since the major C compilers also support C++. It is not ignorable when you use jextract to generate bindings. jextract only supports C.

You could interpret this as a one-off, but my suspicion is that it's a problem that will naturally recur for libraries written in C++ that expose a C API. Especially if they are never tested in this context.

The solutions are to either open issues/PRs upstream or to enhance jextract to support a limited subset of C++. Despite the words "limited subset" the latter is likely a nightmare pit of a task, so you should be ready to do the former.

Problem 2: You need different Java code per-platform

jextract takes as input a C header file and spits out a folder of Java classes. These classes include information about available functions as well as the memory layouts of defined structs. Unfortunately C is a protocol where memory layouts can change based on both the target operating system and the underlying CPU architecture.

This means you need different Java code for macos-aarch64 than you do for windows-x86. Pop-quiz: how do you do this in Maven? In Gradle?

You also need access to the target platform to generate this code. jextract doesn't have the zig cc magic that lets it compile for any platform from any platform. So you are left having to use services like GitHub Actions.

In the absolute best case scenario you can just swap out a source set when compiling. OCG_CardData might have a platform specific layout, but the generated methods and struct members will be the same.

If you want to have one library support all platforms you'll need to contrive a system for selecting methods to call at runtime. This can lead to some pointless duplication.

Problem 3: C libraries are often special snowflakes

Question: what is the Maven Central for C?

Answer: ha. hahaha. ha. ha. ha. ha.

While there are things like Conan and vcpkg out in the world it's far from a given that any particular library you'd want is on them.

It's likely you'll end up either adding a GitHub repo as a submodule or writing a script to download the repo or some random .zip/.tar.gz file.

From there every C library has its own build instructions and flags you may or may not need to pass.

Pop-quiz: how do you do this in Maven? In Gradle?

Problem 4: You need to figure out packaging

From here we'll assume that the C library you are binding is going into a Java library, either to be shared with the wider world or to be one module in a larger project tree.

Firstly, you probably need to make one artifact per target platform. In Maven repositories the way people share such artifacts is with a classifier scheme. Each artifact should be published with a -macos-aarch64 or similar classifier and modules depending on these libraries need to select the right one.
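For reference, consuming one of those classified artifacts from a Maven project looks roughly like this (the coordinates here are invented for illustration):

<dependency>
    <groupId>dev.mccue</groupId>
    <artifactId>ygo-bindings</artifactId>
    <version>0.0.1</version>
    <classifier>macos-aarch64</classifier>
</dependency>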

Then for the actual .dll/.so/.dylib that is the compiled artifact for the C library you have a few options.

  1. Expect the user to just have it on their machine already or install it separately.
  2. Expect the user to provide it manually with -Djava.library.path.
  3. Package that file into a jar and extract it dynamically at runtime.
  4. Package your code as a .jmod and expect that to be usable.

Option 1 sucks for obvious reasons.

Option 2 is just option 1 but without it needing to be a global system-level thing. Tools like Maven are, best I can tell, built around the concept that a "Set of Dependencies" should become a --class-path. You can't really have C shared libraries as automatically resolved dependencies.

Option 3 is what most libraries do today. LWJGL has "natives" jars which contain the actual shared libraries. It then uses a shim to extract the shared libraries to the filesystem. This is annoying firstly because it's finicky code. Extracting a file atomically to the filesystem is a whole thing. It also requires that you are deploying code in a context where you have a writable filesystem. This is not always the case.
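To give a flavor of what that shim involves, here is a minimal sketch - not LWJGL's actual code, and deliberately ignoring the atomicity concerns real implementations have to handle:

import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

final class NativeLoader {
    // Extracts a shared library bundled as a classpath resource
    // to a temp file and loads it from there.
    static void load(String name) throws IOException {
        // e.g. "ocgcore" -> "libocgcore.dylib" on macOS
        var resource = "/" + System.mapLibraryName(name);
        try (var in = NativeLoader.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new FileNotFoundException(resource);
            }
            var temp = Files.createTempFile(name, null);
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
            System.load(temp.toAbsolutePath().toString());
        }
    }
}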

But most importantly it annoys me because it is so clearly a forced move. If you could actually just put the shared library on the -Djava.library.path automatically in the same way other dependencies are automatically put on the --class-path I don't think people would be doing it. Well, that and if people weren't universally using uberjars as their deployment mechanism.

Option 4 I think has potential. It's what I've done for the Yu-Gi-Oh bindings. For those unaware, .jmod files have delineated locations for classes, legal metadata, configuration files, and shared libraries. This is how the JDK bundles the shared libraries needed for making things like Swing work.

One problem is that .jmods aren't usable at runtime. You need to link them together with jlink to produce a runtime image, then use that image. You can technically extract the contents of a .jmod and put the extracted classes folder on the --module-path and the libs on the -Djava.library.path.
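Mechanically, that extraction route looks something like this (file and directory names invented):

$ jmod extract --dir ygo-bindings ygo-bindings.jmod
$ java --module-path ygo-bindings/classes \
    -Djava.library.path=ygo-bindings/lib \
    ...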

Pop-quiz: how do you do that in Maven? In Gradle?

Problem 5: Nobody uses the --module-path

In discussions about the --module-path the focus tends to be on one of three aspects.

  1. I heard a friend of a cousin say it broke a library nearly a decade ago in Java 9.
  2. It was needed to turn off deep reflection for JDK internals.
  3. Wow, tooling does not support this well at all.

The 3rd aspect is most relevant here because you can't really expect anyone using your library to have put it on the --module-path. This is unfortunate because modules are the only mechanism to group up and hide whole packages from external consumers.

jextract will generate a large number of public classes which contain calls to fundamentally unsafe APIs. Not only that, unless you go through and manually allowlist functions and symbols with --include-function et al. it will contain a lot of stray functions you may or may not be using in your actual binding code.

What you would ideally want is to just not export those autogenerated classes and only export the code you wrote that wraps them up. This would be needed for someone writing --enable-native-access=your.specific.module to mean "I have audited or otherwise trust the producer of this module to have interacted with the native world in a way that won't crash the JVM."
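In module-info.java terms, the shape you'd want is something like this (package names invented for the example):

module dev.mccue.ygo {
    // Only the hand-written wrapper API is exported.
    exports dev.mccue.ygo;

    // The jextract output lives in an unexported package,
    // e.g. dev.mccue.ygo.internal.ffi, so module-path consumers
    // never see it.
}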

This is in contrast to --enable-native-access=ALL-UNNAMED and "yeah whatever, just let me use native stuff. Who cares where it comes from."

Even if you don't particularly care about that story, autogenerated bindings don't make for the world's best public API.

Challenges

So with that prelude, here is the challenge for the audience.

  1. Using your build tool of choice (Maven, Gradle, bld, Mill, etc.) replace the Justfile I have to build the project. I mean replace, not do a different or worse job. Produce as a final artifact a .jmod, procure the C library, run jextract, include the proper legal metadata, etc.

  2. Put it in a local maven repository and use it from a Maven project. Good luck.

  3. Bonus points, actually work on the unfinished binding code. Would be appreciated, but it's also my burden to bear and not super relevant to the rest of this post.



Clojure as a First Language

by: Ethan McCue

This might come as somewhat of a shock to my regular audience, but I don't only write Java.

The reason I focus so much on specifically Java education is because it is often the first language people are taught. This matters for a lot of reasons, not all of which I have the page space to get into, but crucially being taught first has a direct impact on a language's popularity.

I'm writing this because I think Clojure now has a real shot at becoming a first language.

Why

Of all the less-than-massively-popular languages out there, Clojure is probably hurt the least by not having widespread adoption. This is in large part because it gets to piggyback on existing libraries and ecosystems.

Clojure is a hosted language. Clojure on the JVM can use any Java library, Clojure in the browser can use any JavaScript library, and so on. It is always good when Clojure-native libraries get written, but they've never been an absolute necessity.

So why pursue popularity?

1. Paper Clipping.

A lawyer has a duty to be a zealous advocate for their client. There's a non-trivial tribal monkey aspect to wanting your programming language to be popular and, at a certain point, pursuing that end has no fundamental justification.

But we make our own meaning in life so screw it.

2. It's Better.

The value proposition of Clojure is different to most other languages. I'm not going to s*** Paul Graham's d*** or wax poetic about macros, but it's hard to deny that Clojure codebases tend to be quite different to those written in Python, JavaScript, Java, R, etc.

There are reasons to think that Clojure could be a better fit for producing certain genres of software. Popularity would therefore lead to "better" software being produced, which is reasonable to want.

I'm being vague on purpose here. The difference between closed and open aggregates alone could be its own essay.

3. Money.

The more people who use Clojure the more Clojure jobs there are. The more Clojure jobs there are the more secure Clojure experts can be, etc.

TypedClojure is a one-man-show. TypeScript is funded by Microsoft paychecks. Popularity opens the door to all sorts of support.

How

So to become popular you need to be someone's first language. To be someone's first language you need to be what they learned in school. This means convincing teachers and curriculum makers.

For CS education I think this is a lost cause. There are very well-made CS 101 courses that use lisps and face cosmically stupid pushback for doing so. It is hard to imagine winning that fight.

But people going for Computer Science degrees aren't the only people who program. When my brother got his Master's in Marine Biology he was taught R. Analyzing data, making charts, etc. is needed for a wide variety of fields.

I think Clojure could steal significant market share here.

Noj is a collection of data science libraries for Clojure. With what is in there you can analyze data, make charts, and more.

This has the makings of an extremely compelling pitch.

The biggest missing piece, as I see it, is resources tailored to people learning Clojure as a first language.

Historically the vast majority of Clojure programmers have been transplants from other languages. We have books like Clojure for the Brave and True that serve this crowd, but few-to-none for people who are starting truly from scratch.

This has also affected the way people talk about Clojure. Keep in mind that you don't need to convince people of functional programming, homoiconicity, or anything else when it's the first thing they learn. The pitches you'd use to pull in a Ruby programmer should be kept in their lane.

So that's the gap. The data science people are presumably going to keep chugging away at the things they do. If you are interested in making Clojure a "First Language" give a shot at making something that fills that gap.



A Modest Critique of Optional Handling

by: Ethan McCue

java.util.Optional is a class that breaks people's brains a bit.

It offers a way to represent a potentially missing piece of information. If your method might not produce a result you can use an empty Optional to represent that.

Optional<LocalDate> birthday(String name) {
    return switch (name) {
        case "Mr. Rodgers" -> 
                Optional.of(LocalDate.of(1928, Month.MARCH, 20));
        case "Mr. T" -> 
                Optional.of(LocalDate.of(1952, Month.MAY, 21));
        default -> 
                Optional.empty();
    };
}

void main() {
    for (var name : List.of("Mr. T", "H.R. Puffinstuff")) {
        IO.println(name + ": " + birthday(name));
    }
}

This has a few pros over the other standard option of returning null.

For one, Java's type system (currently) does not track null values as part of a type. This means that a programmer needs to infer nullability from written documentation and context clues.

LocalDate birthday(String name) {
    return switch (name) {
        case "Mr. Rodgers" -> 
                LocalDate.of(1928, Month.MARCH, 20);
        case "Mr. T" -> 
                LocalDate.of(1952, Month.MAY, 21);
        default -> 
                null;
    };
}

void main() {
    for (var name : List.of("Mr. T", "H.R. Puffinstuff")) {
        // .getMonth() is not always valid to call
        IO.println(name + ": " + birthday(name).getMonth());
    }
}

This is mitigated somewhat when an API makes use of nullability annotations. Today that requires special null-aware tooling, but null-aware types are slated to come to Java proper at some point.

// LocalDate? in the future (probably)
@Nullable LocalDate birthday(String name) {
    return switch (name) {
        case "Mr. Rodgers" -> 
                LocalDate.of(1928, Month.MARCH, 20);
        case "Mr. T" -> 
                LocalDate.of(1952, Month.MAY, 21);
        default -> 
                null;
    };
}

void main() {
    for (var name : List.of("Mr. T", "H.R. Puffinstuff")) {
        // Tooling should catch this mistake
        IO.println(name + ": " + birthday(name).getMonth());
    }
}

But null values in Java always require explicit checks to handle.

if (v == null) {
    // ...
}

// or

v == null ? ... : ...;

// or Objects.requireNonNullElse(v, ...)
// but that's just a method

This makes them ill-suited to the task Optional was originally introduced for - being part of the customary chain of method calls that form stream operations. This matters most when an Optional appears in the middle of a stack of such operations, but even when the Optional handling is at the end, keeping the chain intact is at least an aesthetic win.

Stream.of(1, 2, 3)
    .filter(x -> x < 2)
    .findFirst() // Here is where we get an optional
    .orElse(0); // And we can preserve the method chain stack 

People are far more likely to remember to write the above when Java forces them to. That's the point of it.

// (Pretend findFirst() returned a nullable Integer
// instead of an Optional<Integer>)
Integer first = Stream.of(1, 2, 3)
    .filter(x -> x < 2)
    .findFirst(); 
// Not forced to handle null usually and it's aesthetically
// inconvenient. This leads to mistakes in the context of chained methods.
if (first == null) {
    first = 0;
}

But because it "solves null" (it doesn't) it sees a lot of use. A very common pattern has been for folks to take a null-returning method and refactor it to use Optional instead.

When this works it's generally fine; what isn't fine is how callsites are often adapted.

User user = findUserById(id);
if (user != null) {
    // Logic
}

The code above is often turned into something like the following.

Optional<User> userOpt = findUserById(id);
if (userOpt.isPresent()) {
    User user = userOpt.get(); // or .orElseThrow() if civilized
    // Logic
}

Callsites refactored like this are strictly worse. Often an extra intermediate variable needs to be created, that intermediate has a stupid name, and static analysis tools will either not know what is happening or have special carve-out exceptions so they know "when .isPresent(), .get() is okay."

Those carve-out exceptions may or may not also apply to other ecosystem Optional types.

// It's hard to argue that java.util.Optional really deserves
// special treatment over vavr's Option, but because code like
// this exists it's going to get it.
Option<User> userOpt = findUserById(id);
if (userOpt.isDefined()) {
    User user = userOpt.get();
    // Logic
}

And the general sort of advice to deal with this is to use .map and .ifPresent - the methods that let you work with the value in an Optional without "unboxing" it.

findUserById(id).ifPresent(user -> {
    // Logic
});

// or keep chaining and deal with it later

findUserById(id).map(User::email);

When this works, it can be better. It just stops working when the code that involves the value might throw a checked exception - .ifPresent and friends cannot cope with that.

findUserById(id).ifPresent(user -> {
    // Logic
    if (...) {
        // Difficult to know what to do.
        // Do we wrap the exception?
        callRemoteAPI();  
    }
    // Logic
});

It also stops working when you have multiple Optionals, at least from a readability perspective.

findUserById(idA).ifPresent(userA -> {
    findUserById(idB).ifPresent(userB -> {
        findUserById(idC).ifPresent(userC -> {
            // Logic
        }); 
    });
});

And also when you want to mutate some local variables relevant to the rest of the function.

int existingUsers = 0;
findUserById(id).ifPresent(user -> {
    // Logic
    existingUsers++; // Lambda rules - does not work
    // Logic
});

And just in general, beyond the small and clean usages, I think using these methods makes code net worse.

What I suggest, and I think people don't consider this specifically because they view Optional as a replacement for null, is to use .orElse(null) and write the rest of the method like normal.

User user = findUserById(id).orElse(null);
if (user != null) {
    // Logic
}

Static analysis tools can eat "local variable known to sometimes be null" for breakfast and there are no problems handling checked exceptions, multiple optionals, or mutating local state. It's a much more seamless refactor.

int existingUsers = 0;

User userA = findUserById(idA).orElse(null);
User userB = findUserById(idB).orElse(null);
User userC = findUserById(idC).orElse(null);
if (userA != null && userB != null && userC != null) {
    // Logic
    existingUsers++;
    if (...) {
        callRemoteAPI();  
    }
    // Logic
}

The fact that it might be null also becomes explicit in the source code at every usage site, which is a strict improvement over the equivalent code with null returning methods.

int existingUsers = 0;

// Back to inferring from context or documentation
User userA = findUserById(idA);
User userB = findUserById(idB);
User userC = findUserById(idC);
if (userA != null && userB != null && userC != null) {
    // Logic
    existingUsers++;
    if (...) {
        callRemoteAPI();  
    }
    // Logic
}

There are more bike sheds to have on Optional, don't get me wrong, I just want to strongly encourage you to reconsider blanket advice to "avoid .isPresent/.get, use .map/.ifPresent" and highlight that .orElse is useful for more than valid default values.



Go's HTTP Server Patterns in Java 25

by: Ethan McCue

I'm not a professional Go programmer so feel free to correct me on anything.

The rest of this post is going to follow the example program from this piece of Go Documentation. I'm even going to straight rip a lot of the prose.

My intent is to highlight how you can do spiritually similar things in Java.

The code in this guide, god willing, will work unaltered in Java 25.

Prerequisites

  • Java.
  • Some Libraries. The repo at the end will have them all in a folder. Java dependency management is a whole adventure, for now just follow along the ride.

Getting Started

Make a new directory for this tutorial and cd into it:

$ mkdir javawiki
$ cd javawiki

Create a file named Wiki.java, open it in your favorite editor, and add the following lines:

void main() {
}

Data Structures

A wiki consists of a series of interconnected pages, each of which has a title and a body (the page content). Here, we define Page as a record with two components representing the title and body.

record Page(String title, byte[] body) {}

The Page record describes how page data will be stored in memory. But what about persistent storage? We can address that by creating a save method on Page:

record Page(String title, byte[] body) {
    void save() throws IOException {
        var filename = title + ".txt";
        Files.write(Path.of(filename), body);
    }
}

The save method throws an IOException to let the application handle it should anything go wrong while writing the file. If all goes well, Page.save() will return without throwing.

In addition to saving pages, we will want to load pages, too:

record Page(String title, byte[] body) {
    void save() throws IOException {
        var filename = title + ".txt";
        Files.write(Path.of(filename), body);
    }

    static Page load(String title) throws IOException {
        var filename = title + ".txt";
        var body = Files.readAllBytes(Path.of(filename));
        return new Page(title, body);
    }
}

The method load constructs the file name from the title parameter, reads the file's contents into a new variable body, and returns a reference to a Page object.

At this point we have a simple data structure and the ability to save to and load from a file. Let's update the main method to test what we've written:

void main() throws IOException {
    var p1 = new Page(
            "TestPage", 
            "This is a sample Page.".getBytes(StandardCharsets.UTF_8)
    );
    p1.save();
    var p2 = Page.load("TestPage");
    IO.println(new String(p2.body(), StandardCharsets.UTF_8));
}

After executing this code, a file named TestPage.txt would be created, containing the contents of p1. The file would then be read into p2, and its body component printed to the screen.

You can run the program like this:

$ java Wiki.java

Click here to view the code we've written so far.

Introducing the jdk.httpserver module (an interlude)

Here's a full working example of a simple web server:

import module jdk.httpserver;

void handler(HttpExchange exchange) throws IOException {
    exchange.sendResponseHeaders(200, 0);
    try (var os = exchange.getResponseBody()) {
        os.write(
                "Hi there, I love %s!"
                        .formatted(exchange.getRequestURI().getPath().substring(1))
                        .getBytes(StandardCharsets.UTF_8)
        );
    }
}

void main() throws IOException {
    var server = HttpServer.create(
            new InetSocketAddress(8080),
            0
    );
    server.createContext("/", this::handler);
    server.start();
}

The main method begins with a call to HttpServer.create, which creates an http server that will listen on port 8080. (Don't worry about its second parameter, 0, for now.)

Then server.createContext tells the server to handle all requests to the web root ("/") with handler.

It then calls server.start(). Unlike Go's ListenAndServe, this returns immediately; the server runs on a background thread that keeps the process alive until it is terminated.

If you run this program and access the URL:

http://localhost:8080/monkeys

the program would present a page containing:

Hi there, I love monkeys!

Introducing the dev.mccue.jdk.httpserver module (an interlude within an interlude)

The jdk.httpserver module does not see as widespread use as Go's equivalent. As such, I think there are some ways to improve the API that should be rolled into the standard library at some point.

I put the most important of these - a Body abstraction - into a library. I'm going to ignore how to procure libraries in Java for the moment, but suspend your disbelief in the meantime.

With this library the server example above becomes the following:

import module jdk.httpserver;
import module dev.mccue.jdk.httpserver;

void handler(HttpExchange exchange) throws IOException {
    HttpExchanges.sendResponse(
            exchange,
            200,
            Body.of("Hi there, I love %s!".formatted(
                    exchange.getRequestURI().getPath().substring(1)
            ))
    );
}

void main() throws IOException {
    var server = HttpServer.create(
            new InetSocketAddress(8080),
            0
    );
    server.createContext("/", this::handler);
    server.start();
}

Using jdk.httpserver to serve wiki pages

Let's create a handler, viewHandler, that will allow users to view a wiki page. It will handle URLs prefixed with "/view/".

void viewHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/view/".length());
    var p = Page.load(title);
    HttpExchanges.sendResponse(
            exchange,
            200,
            Body.of(
                    "<h1>%s</h1><div>%s</div>"
                            .formatted(
                                    p.title(), 
                                    new String(p.body(), StandardCharsets.UTF_8)
                            )
            )
    );
}

To use this handler, we rewrite our main function to initialize an HttpServer using the viewHandler to handle any requests under the path /view/.

void main() throws IOException {
    var server = HttpServer.create(
            new InetSocketAddress(8080),
            0
    );
    server.createContext("/view/", this::viewHandler);
    server.start();
}

Click here to view the code we've written so far.

Let's create some page data (as test.txt), compile our code, and try serving a wiki page.

Open test.txt in your editor, and save the string "Hello world" (without quotes) in it.

java --module-path lib --add-modules ALL-MODULE-PATH Wiki.java

With this web server running, a visit to http://localhost:8080/view/test should show a page titled "test" containing the words "Hello world".

Editing Pages

A wiki is not a wiki without the ability to edit pages. Let's create two new handlers: one named editHandler to display an 'edit page' form, and the other named saveHandler to save the data entered via the form.

First, we add them to main():

void main() throws IOException {
    var server = HttpServer.create(
            new InetSocketAddress(8080),
            0
    );
    server.createContext("/view/", this::viewHandler);
    server.createContext("/edit/", this::editHandler);
    server.createContext("/save/", this::saveHandler);
    server.start();
}

The method editHandler loads the page (or, if it doesn't exist, creates an empty Page record) and displays an HTML form.

void editHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/edit/".length());
    Page p;
    try {
        p = Page.load(title);
    } catch (NoSuchFileException e) {
        p = new Page(title, new byte[]{});
    }
    HttpExchanges.sendResponse(
            exchange,
            200,
            Body.of(
                    """
                    <h1>Editing %s</h1>
                    <form action="/save/%s" method="POST">
                        <textarea name="body">%s</textarea><br>
                        <input type="submit" value="Save">
                    </form>
                    """.formatted(
                            p.title(),
                            p.title(),
                            new String(p.body(), StandardCharsets.UTF_8)
                    )
            )
    );
}

This function will work fine, but all that hard-coded HTML is ugly. Of course, there is a better way.

(I mean honestly the bigger issue is XSS vulnerabilities, but whatever. Following along.)

The com.samskivert.jmustache module

com.samskivert.jmustache is a library available in the Java ecosystem. We can use it to keep the HTML in a separate file, allowing us to change the layout of our edit page without modifying the underlying Java code.

First, we must add com.samskivert.jmustache to the list of imports. Again, we are glossing over how you procure libraries for this tutorial.

import module jdk.httpserver;
import module dev.mccue.jdk.httpserver;
import module com.samskivert.jmustache;

Let's create a template file containing the HTML form. Open a new file named edit.html, and add the following lines:

<h1>Editing {{title}}</h1>

<form action="/save/{{title}}" method="POST">
    <div><textarea name="body" rows="20" cols="80">{{body}}</textarea></div>
    <div><input type="submit" value="Save"></div>
</form>

Modify editHandler to use the template, instead of the hard-coded HTML:

void editHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/edit/".length());
    Page p;
    try {
        p = Page.load(title);
    } catch (NoSuchFileException e) {
        p = new Page(title, new byte[]{});
    }
    
    var template = Mustache.compiler()
            .compile(Files.readString(Path.of("edit.html")));
    HttpExchanges.sendResponse(
            exchange,
            200,
            Body.of(template.execute(Map.of(
                    "title", p.title,
                    "body", new String(p.body, StandardCharsets.UTF_8)
            )))
    );
}

Since we're working with templates now, let's create a template for our viewHandler called view.html:

<h1>{{title}}</h1>

<p>[<a href="/edit/{{title}}">edit</a>]</p>

<div>{{body}}</div>

Modify viewHandler accordingly:

void viewHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/view/".length());
    var p = Page.load(title);
    var template = Mustache.compiler()
            .compile(Files.readString(Path.of("view.html")));
    HttpExchanges.sendResponse(
            exchange,
            200,
            Body.of(template.execute(Map.of(
                    "title", p.title,
                    "body", new String(p.body, StandardCharsets.UTF_8)
            )))
    );
}

Notice that we've used almost exactly the same templating code in both handlers. Let's remove this duplication by moving the templating code to its own function:

void renderTemplate(
        HttpExchange exchange,
        String tmpl,
        Page p
) throws IOException {
    // Callers pass the template name without the ".html" suffix
    var template = Mustache.compiler()
            .compile(Files.readString(Path.of(tmpl + ".html")));
    HttpExchanges.sendResponse(
            exchange,
            200,
            Body.of(template.execute(Map.of(
                    "title", p.title,
                    "body", new String(p.body, StandardCharsets.UTF_8)
            )))
    );
}

And modify the handlers to use that function:

void viewHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/view/".length());
    var p = Page.load(title);
    renderTemplate(exchange, "view", p);
}

void editHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/edit/".length());
    Page p;
    try {
        p = Page.load(title);
    } catch (NoSuchFileException e) {
        p = new Page(title, new byte[]{});
    }

    renderTemplate(exchange, "edit", p);
}

Click here to view the code we've written so far.

Handling non-existent pages

What if you visit /view/APageThatDoesntExist? You'll get no response. This is because Page.load throws an exception. Instead, if the requested Page doesn't exist, it should redirect the client to the edit Page so the content may be created:

void viewHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/view/".length());
    Page p;
    try {
        p = Page.load(title);
    } catch (NoSuchFileException _) {
        exchange.getResponseHeaders()
                .put("Location", List.of("/edit/" + title));
        HttpExchanges.sendResponse(exchange, 302, Body.empty());
        return;
    }
    renderTemplate(exchange, "view", p);
}

To do this we send an HTTP status code of 302 and include a Location header in the response.

Saving pages

The function saveHandler will handle the submission of forms located on the edit pages. To parse the form body we are going to add another library - dev.mccue.urlparameters.

import module dev.mccue.urlparameters;

void saveHandler(HttpExchange exchange) throws IOException {
    var title = exchange.getRequestURI()
            .getPath()
            .substring("/save/".length());
    var body = UrlParameters.parse(
            new String(
                    exchange.getRequestBody().readAllBytes(),
                    StandardCharsets.UTF_8
            )
    ).firstValue("body").orElseThrow();
    var p = new Page(title, body.getBytes(StandardCharsets.UTF_8));
    p.save();
    exchange.getResponseHeaders()
            .put("Location", List.of("/view/" + title));
    HttpExchanges.sendResponse(exchange, 302, Body.empty());
}

Click here to view the code we've written so far.

Etc.

There is more in the Go tutorial, including caching templates, making sure there aren't path traversal vulnerabilities (which, very important!), and some other potpourri.

But the purpose of this is just to illustrate that Java is capable of the same sort of "simple" web development that Go is known for. I'm leaving that stuff (and introducing a proper mux) as exercises for you, the reader.

And just as a note: if you dig into the jdk.httpserver module you will see some "for development only" warnings around it. This is because the actual server implementation in the JDK is mostly there to serve the jwebserver tool (jwebserver is more or less equivalent to Python's http.server). You can seamlessly swap to using a production-quality server like Jetty and others by adding the appropriate dependencies.
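
As a hedged sketch of what that swap can look like: the jdk.httpserver API sits on top of a service provider interface, and Jetty ships an implementation of it in its jetty-http-spi artifact. With that artifact (and its dependencies) on your path, something like the following should route the same handlers through Jetty. The class and property names here come from Jetty's SPI module, so double-check them against the version you pull in.

void main() throws IOException {
    // Point the SPI lookup in com.sun.net.httpserver.spi.HttpServerProvider
    // at Jetty's provider instead of the JDK's default implementation
    System.setProperty(
            "com.sun.net.httpserver.HttpServerProvider",
            "org.eclipse.jetty.http.spi.JettyHttpServerProvider"
    );
    var server = HttpServer.create(new InetSocketAddress(8080), 0);
    server.createContext("/view/", this::viewHandler);
    server.start();
}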


Tell me all about your past traumas with Java in the comments below.


<- Index

Groyper Big Balls Vibe Codes away Social Security

by: Ethan McCue

"Groyper" is the term used to describe the followers of Nick Fuentes.

Nick Fuentes is a "Hitlerist," meaning he specifically agrees with the policies of Adolf Hitler. You might also know him as the "Your Body, My Choice" guy.

"Big Balls" is the name of one of the young fascists recruited by DOGE. He gained particular notoriety for being young, very punchable, and having a head turning nickname.

Another DOGE fascist is Gavin Kliger. Gavin Kliger is very much a Groyper.

They are all Groypers.

This is why when DOGE fascist Marko Elez posted "Just for the record, I was racist before it was cool" and "Normalize Indian hate" Elon Musk went out of his way to keep him on the team.

DOGE has announced plans to rewrite the computer systems of the Social Security Administration. The timeline they give for this endeavor is several months.

This will not work out for the people who are dependent on their Social Security checks. The way in which it won't work out is t.b.d., but there are a few possibilities that come to mind.

For one, they are supposedly replacing tens of millions of lines of COBOL. That is an amount of code far beyond anyone's capacity to fully understand, let alone recreate, in a few months.

So best case scenario they miss something. More likely scenario they don't care if they "miss" something. These are fascists. They are predisposed to make things worse on purpose out of condescension, intentional malice, or some other goblin emotion.

Then there is the overarching plot-line of dismantling the social safety net. The template for those sorts of actions tends to be to take over a public service, make it worse, then privatize it as a solution. This would fit that template.

The stated plan is to accomplish this with AI. The practice of using AI to produce large chunks of code has been christened by the venture capital tech space as "Vibe Coding."

Training programs have already sprung up seeking to capitalize on the term, reinforcing the convenient smuggling it gives to obvious negligence.

As was the pattern with Big Data and "Web 3" before it, the programming and tech world seems to have few antibodies to nonsensical claims of future potential. Even skepticism will end up wrapped in the framing of "Well this is obviously the future, but."

Social Security's systems are reportedly currently in COBOL, but the plan is for them to be rewritten into Java.

There would be, in a saner time, reasons for that. Java can run on what we would call "commodity hardware" and there is a statistically younger pool of trained people to pull from to write it. It wouldn't be the first system "modernized" in such a way and doing so could make longer term maintenance simpler in a few ways.

But while COBOL Mainframes have a casual reputation of being relics of the past, they are some of the only systems in the world where you can have leader + follower replication of the results of individual CPU instructions.

IBM, the "Made the Holocaust Possible" company, has an online transaction system called CICS that I have been informed is "nuclear attack resistant." I rest easy believing that, if we ever have the privilege to suffer through a Threads, my bank account will have its correct balance.

Unless they found a well-thought-out plan tucked in the desk of a more competent person they fired, these considerations will not be approached with nuance.

I mean, presuming there is a plan at all. It could just be, intentionally or not, a distraction while they pull out more important copper wiring like the FDIC.


If I'm found dead it wasn't a suicide. If I'm disappeared find and rescue me. Leave comments wherever.


<- Index

Life Altering Postgresql Patterns

by: Ethan McCue

Believe it or not, I don't think that title is clickbait.

There is a set of things that you can do when working with a Postgres database which I have found made my and my coworkers' lives much more pleasant. Each one is small by itself, but in aggregate they have a noticeable effect.

Use UUID primary keys

UUIDs have downsides

  • Truly random UUIDs don't sort well (and this has implications for indexes)
  • They take up more space than sequential ids (space being your cheapest resource)

But I've found those to be far outweighed by the upsides

  • You don't need to coordinate with the database to produce one.
  • They are safe to share externally.

CREATE TABLE person(
    id uuid not null default gen_random_uuid() primary key,
    name text not null
);

Give everything created_at and updated_at

It's not a full history, but knowing when a record was created or last changed is a useful breadcrumb when debugging. It's also something you can't retroactively get unless you were recording it.

So just always slap a created_at and updated_at on your tables. You can maintain updated_at automatically with a trigger.

CREATE TABLE person(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    name text not null
);

CREATE FUNCTION set_current_timestamp_updated_at()
    RETURNS TRIGGER AS $$
DECLARE
_new record;
BEGIN
  _new := NEW;
  _new."updated_at" = now();
RETURN _new;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_person_updated_at
    BEFORE UPDATE ON person
    FOR EACH ROW
    EXECUTE PROCEDURE set_current_timestamp_updated_at();

You need to create the trigger for each table, but you only need to create the function once.
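
So a second table just gets its own trigger pointed at the same function:

CREATE TRIGGER set_pet_updated_at
    BEFORE UPDATE ON pet
    FOR EACH ROW
    EXECUTE PROCEDURE set_current_timestamp_updated_at();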

on update restrict on delete restrict

When you make a foreign key constraint on a table, always mark it with on update restrict on delete restrict.

This makes it so that if you try and delete the referenced row you will get an error. Storage is cheap, recovering data is a nightmare. Better to error than do something like cascade.

CREATE TABLE person(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    name text not null
);

CREATE TABLE pet(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    name text not null,
    owner_id uuid not null references person(id)
                on update restrict
                on delete restrict
);

Use schemas

By default, every table in Postgres will go into the "public" schema. This is fine, but you are missing out if you don't take advantage of your ability to make new schemas.

Schemas work as namespaces for tables, and for any moderate-to-large app you are going to have a lot of tables. You can do joins and have relationships between tables in different schemas, so there isn't much of a downside.

CREATE SCHEMA vet;

CREATE TABLE vet.person(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    name text not null
);

CREATE TABLE vet.pet(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    name text not null,
    owner_id uuid not null references vet.person(id)
                on update restrict
                on delete restrict
);

Enum Tables

There are a lot of ways to make "enums" in SQL. One is to use the actual "enum types," another is to use a check constraint.

The pattern introduced to me by Hasura was enum tables.

Have a table with some text value as a primary key and make columns in other tables reference it with a foreign key.

CREATE TABLE vet.pet_kind(
    value text not null primary key
);

INSERT INTO vet.pet_kind(value)
VALUES ('dog'), ('cat'), ('bird');

CREATE TABLE vet.pet(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    owner_id uuid not null references vet.person(id)
                on update restrict
                on delete restrict,
    kind text not null references vet.pet_kind(value)
                on update restrict
                on delete restrict
);

This way you can insert into a table to add more allowed values or attach metadata like a comment to explain what each value means.

CREATE TABLE vet.pet_kind(
    value text not null primary key,
    comment text not null default ''
);

INSERT INTO vet.pet_kind(value, comment)
VALUES 
    ('dog', 'A Canine'),
    ('cat', 'A Feline'),
    ('bird', 'A 50 Year Commitment');

CREATE TABLE vet.pet(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    owner_id uuid not null references vet.person(id)
                on update restrict
                on delete restrict,
    kind text not null references vet.pet_kind(value)
                on update restrict
                on delete restrict
);

Name your tables singularly

This isn't even Postgres-specific: just please name your tables using the singular form of a noun.

SELECT * FROM pets might seem nicer than SELECT * FROM pet but the moment you start doing anything more interesting with your queries you will notice that they are actually working in terms of individual rows.

SELECT *
FROM pet
-- It's a cruel coincidence that in English an "s"
-- suffix can sometimes work both as a plural
-- and a possessive, but notice how the where clause
-- is asserting a condition about a single row.
WHERE pet.name = 'sally' 

The deeper you dig the more annoying edge cases you'll run into with plural table names. Just name your tables the same as what an individual row in that table represents.

Mechanically name join tables

Sometimes there are sensible names to give "join tables" - tables which form the basis for "many to many" relationships between data - but often there aren't. In those cases don't hesitate to just concatenate the names of the tables you are joining between.

CREATE TABLE vet.person(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now()
);

CREATE TABLE vet.pet(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now()
);

-- pet_owner would work in this context, but
-- I just want to demonstrate the table_a_table_b naming scheme
CREATE TABLE vet.person_pet(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    person_id uuid not null references vet.person(id)
                on update restrict
                on delete restrict,
    pet_id uuid not null references vet.pet(id)
                on update restrict
                on delete restrict
);

CREATE UNIQUE INDEX ON vet.person_pet(person_id, pet_id);

Almost always soft delete

I will reiterate that storage is cheap and recovering data is a nightmare.

If you have some domain specific need to delete (or otherwise mark as irrelevant) some data, use a nullable timestamptz column. If there is a timestamp filled in, that's when it was deleted. If there is no timestamp, it isn't deleted yet.

CREATE TABLE vet.prescription(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    pet_id uuid not null references vet.pet(id)
             on update restrict
             on delete restrict,
    issued_at timestamptz not null,
    -- Instead of deleting a prescription,
    -- explicitly mark when it was revoked
    revoked_at timestamptz
);
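
Reads then just filter on that column:

-- Active prescriptions are the ones that haven't been revoked
SELECT *
FROM vet.prescription
WHERE revoked_at IS NULL;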

Even outside the context of a soft delete, timestamps are usually more useful than a boolean. If you want to know whether something happened, you generally also want to know when it happened.
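
For example, instead of a verified boolean you can record the moment of verification. (The account table here is invented for illustration.)

CREATE TABLE vet.account(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    email text not null,
    -- NULL means "not verified yet," otherwise you know when it happened
    email_verified_at timestamptz
);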

Represent statuses as a log

It is very tempting to represent the status of something as a single column. You submit some paperwork and it has a status of submitted. Someone starts to look at it and it transitions to in_review. From there maybe it's rejected or approved.

There are two problems with this

  1. You might actually care about when it was approved, or by whom.
  2. You might receive this information out-of-order.

Webhooks are a prime example of the 2nd situation. There's no way in the laws of physics to be sure you'll get events in exactly the right order.

To handle this you should have a table where each row represents the status of the thing at a given point in time. Instead of overloading created_at or updated_at for this, have an explicit valid_at which says the point in time that information is valid for.

CREATE TABLE vet.adoption_approval_status(
    value text not null primary key
);

INSERT INTO vet.adoption_approval_status(value)
VALUES ('submitted'), ('in_review'), ('rejected'), ('approved');

CREATE TABLE vet.adoption_approval(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    person_id uuid not null references vet.person(id)
                on update restrict
                on delete restrict,
    status text not null references vet.adoption_approval_status(value)
                on update restrict
                on delete restrict,
    valid_at timestamptz not null
);

CREATE INDEX ON vet.adoption_approval(person_id, valid_at DESC);

Just having an index on valid_at can work for a while, but eventually your queries will get too slow. There are a lot of ways to handle this, but the one we've found that works best is to have an explicit latest column with a cheeky unique index and a trigger to make sure that only the row with the newest valid_at is the latest one.

CREATE TABLE vet.adoption_approval(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    person_id uuid not null references vet.person(id)
                on update restrict
                on delete restrict,
    status text not null references vet.adoption_approval_status(value)
                on update restrict
                on delete restrict,
    valid_at timestamptz not null,
    latest boolean default false
);

CREATE INDEX ON vet.adoption_approval(person_id, valid_at DESC);

-- Conditional unique index makes sure we only have one latest
CREATE UNIQUE INDEX ON vet.adoption_approval(person_id, latest)
WHERE latest = true;

-- Then a trigger to keep latest up to date
CREATE OR REPLACE FUNCTION vet.set_adoption_approval_latest()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    UPDATE vet.adoption_approval
    SET latest = false
    WHERE latest = true and person_id = NEW.person_id;

    UPDATE vet.adoption_approval
    SET latest = true
    WHERE id = (
        SELECT id 
        FROM vet.adoption_approval 
        WHERE person_id = NEW.person_id
        ORDER BY valid_at DESC 
        LIMIT 1
    );

    RETURN null;
END;
$function$;

CREATE TRIGGER adoption_approval_insert_trigger
    AFTER INSERT ON vet.adoption_approval
    FOR EACH ROW
    EXECUTE FUNCTION vet.set_adoption_approval_latest();
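
With that bookkeeping in place, reading the current status is a cheap lookup that the conditional unique index can serve ($1 standing in for whatever person id you're querying):

SELECT status
FROM vet.adoption_approval
WHERE person_id = $1
  AND latest = true;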

Mark special rows with a system_id

It's not uncommon to end up with "special rows." By this I mean rows whose presence the rest of your system relies on to build up behavior.

All rows in an enum table are like this, but you will also end up with special rows in tables that otherwise hold normal, "generated during the course of normal use" rows. Give these a special system_id.

Unique indexes don't mind multiple rows with null values, so you can make a unique index on this system_id and look up your special rows later as you need to.

CREATE TABLE vet.contact_info(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    person_id uuid references vet.person(id)
                on update restrict
                on delete restrict,
    mailing_address text not null,
    system_id text
);

CREATE UNIQUE INDEX ON vet.contact_info(system_id);

-- Not hard to imagine wanting to build functionality that
-- automatically contacts the CDC for cases of rabies or similar,
-- but maybe every other bit of contact_info in the system is
-- for more "normal" purposes
INSERT INTO vet.contact_info(system_id, mailing_address)
VALUES ('cdc', '4770 Buford Highway, NE');
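
Code that relies on that special row can then look it up by its stable identifier:

SELECT *
FROM vet.contact_info
WHERE system_id = 'cdc';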

Use views sparingly

Views are amazing and terrible.

They are amazing in their ability to wrap up a relatively complex or error-prone query into something that looks basically like a table.

They are terrible in that removing obsolete columns requires a drop and recreation, which can become a nightmare when you build views on views. The query planner also seems to have trouble seeing through them in general.

So do use views, but only as many as you need and be very wary of building views on views.

CREATE TABLE vet.prescription(
    id uuid not null default gen_random_uuid() primary key,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now(),
    pet_id uuid not null references vet.pet(id)
             on update restrict
             on delete restrict,
    issued_at timestamptz not null,
    -- Instead of deleting a prescription,
    -- explicitly mark when it was revoked
    revoked_at timestamptz
);

CREATE INDEX ON vet.prescription(revoked_at);

-- There are pros and cons to having this view
CREATE VIEW vet.active_prescription AS
    SELECT
        vet.prescription.id,
        vet.prescription.created_at,
        vet.prescription.updated_at,
        vet.prescription.pet_id,
        vet.prescription.issued_at
    FROM
        vet.prescription
    WHERE 
        vet.prescription.revoked_at IS NULL;

JSON Queries

You might have heard that Postgres "supports JSON." This is true, but I had mostly heard it in the context of storing and querying JSON. If you want a table with some blob of info, slap a jsonb column on one of your tables.
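
So, for example (the metadata column is invented for illustration):

ALTER TABLE vet.pet
    ADD COLUMN metadata jsonb not null default '{}'::jsonb;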

That is neat, but I've gotten way more mileage out of using JSON as the result of a query. This has definite downsides like losing type information, needing to realize your results all at once, and the overhead of writing into json.

But the giant upside is that you can get all the information you want from the database in one trip, no cartesian product nightmares or N+1 problems in sight.

SELECT jsonb_build_object(
  'id', vet.person.id,
  'name', vet.person.name,
  'pets', array(
    SELECT jsonb_build_object(
      'id', vet.pet.id,
      'name', vet.pet.name,
      'prescriptions', array(
        SELECT jsonb_build_object(
          'issued_at', vet.prescription.issued_at
        )
        FROM vet.prescription
        WHERE vet.prescription.pet_id = vet.pet.id
      )
    )
    FROM vet.person_pet
    LEFT JOIN vet.pet 
      ON vet.pet.id = vet.person_pet.pet_id
    WHERE vet.person_pet.person_id = vet.person.id
  ),
  'contact_infos', array(
    SELECT jsonb_build_object(
      'mailing_address', vet.contact_info.mailing_address
    )
    FROM vet.contact_info
    WHERE vet.contact_info.person_id = vet.person.id
  )
) 
FROM vet.person
WHERE id = '29168a93-cd14-478f-8c70-a2b7a782c714';

Which can net you something like the following.

{
  "id": "29168a93-cd14-478f-8c70-a2b7a782c714",
  "name": "Jeff Computers",
  "pets": [
    {
      "id": "3e5557c0-c628-44ef-b4d1-86012c5f48bf",
      "name": "Rhodie",
      "prescriptions": [
        {
          "issued_at": "2025-03-11T23:46:18.345146+00:00"
        }
      ]
    },
    {
      "id": "ed63ca7d-3368-4353-9747-6b6b2fa6657a",
      "name": "Jenny",
      "prescriptions": []
    }
  ],
  "contact_infos": [
    {
      "mailing_address": "123 Sesame St."
    }
  ]
}

You can find all the setup you'd need to do for that query here. You can try it out on https://onecompiler.com/postgresql if setting up a local postgres is a bit much.


If there is something I missed or got wrong, tell me very loudly in person or here on the internet.


<- Index

New build tool in Java!

by: Ethan McCue

The title is a ? -> ! of this recent thread on Reddit and really a continuation of this series of posts (1, 2, 3, 4).

So first I am going to go over a few of the Maven/Gradle alternatives listed in that Reddit thread and explain why they don't really scratch the itch I think we need scratched, give a small status update on what I've been working on, and end with something that you the reader (yes, you!) could help me with.

I am also going to be terse, biased, and perhaps a bit too harsh. Otherwise, this would drag on and not be an entertaining read.

What do I want?

I want there to be a smooth on-ramp from java src/Main.java to running that program with dependencies, packaging it up to share with other people, and making use of the tools that come with Java (jlink, jpackage, etc.) and available in the Java ecosystem (junit, mapstruct, etc.) Ideally there should be a way to split dependencies over the --class-path, --module-path, and all the other paths as well.
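
Concretely, I want dependency resolution to hand me folders I can use like this. The folder names are hypothetical, and quoting the class path wildcard lets java itself expand it instead of the shell.

java \
    --module-path mods \
    --add-modules ALL-MODULE-PATH \
    --class-path "libs/*" \
    src/Main.java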

Importantly I don't care, and I think the people who most need this on-ramp don't care, about maximal efficiency. Whoopty doo basil, you can incrementally compile a 100 million line codebase. Fart noises.

Ant

Easy one out-of-the-way first, Ant. Ant is a cross-platform scripting language with targets. It's just make, but you write XML.

By itself it doesn't handle dependencies at all and outsources that to Ivy. This is a second XML file and, once you've filled it out, it downloads jars to a folder.

Even if you are an Ant-head there is a problem in that to use Ivy you are expected to be using Ant. Ant makes sense if you start by compiling code with javac, but that's no longer a thing you need to do. So the path from java src/Main.java isn't great.

One thing Ant has going for it is that it is clear how to use built-in tools. It loses points in my book though because the arguments you'd give to its tasks don't match up that closely to the arguments you would pass to the command line tools. For example, compare the javac task to the man page for javac itself.

Ivy also dumps jars into a single folder. This means it isn't exactly easy to put a few dependencies on the --class-path and others on the --module-path.

Scorecard

  • 🔴 Clear path from java src/Main.java
  • 🟡 Clear path to making use of other tools
  • 🔴 Split dependencies across different paths

Mill

Mill is a Scala build tool. Scala is a programming language with a history of breaking changes and higher kinded types.

This is the example Mill program for a "simple Java program."

package build
import mill._, javalib._

object foo extends JavaModule {
  def ivyDeps = Agg(
    ivy"net.sourceforge.argparse4j:argparse4j:0.9.0",
    ivy"org.thymeleaf:thymeleaf:3.1.1.RELEASE"
  )

  object test extends JavaTests with TestModule.Junit4 {
    def ivyDeps = super.ivyDeps() ++ Agg(
      ivy"com.google.guava:guava:33.3.0-jre"
    )
  }
}

So very similarly to Ant, this jumps straight into building code. To understand what is happening here you need to understand Scala - an entirely separate programming language from Java. It uses a Task monad, which is like a burrito, to track what tasks depend on other tasks.

It is also very fast, a property I will reiterate I do not care about.

For built-in tools Mill bundles modules, like its JlinkModule, which implicitly give access to the functionality provided. This has a very similar problem to maven plugins in that the distance between "what command is actually run and when" and "what you actually specify" is relatively large.

Let's say you wanted to compile some .xsd files into Java classes with xjc. Perhaps a little niche nowadays, but how would you do it? Are you falling back to ProcessBuilder? Are you learning how the Task monad works?

As for splitting things over multiple paths, a quick look at that JlinkModule shows that it's just reusing the --class-path for both the --class-path and --module-path.

val classPath = jars.map(_.toString).mkString(sys.props("path.separator"))
val args = {
  val baseArgs = Seq(
    Jvm.jdkTool("jmod", this.zincWorker().javaHome().map(_.path)),
    "create",
    "--class-path",
    classPath.toString,
    "--main-class",
    mainClass,
    "--module-path",
    classPath.toString,
    outputPath.toString
  )

  val versionArgs = jlinkModuleVersion().toSeq.flatMap { version =>
    Seq("--module-version", version)
  }

  baseArgs ++ versionArgs
}

Scorecard

  • 🔴 Clear path from java src/Main.java
  • 🔴 Clear path to making use of other tools
  • 🔴 Split dependencies across different paths

bld

bld is the build tool from the Rife2 people. I've written promo material for it before and like to think I appreciate it for what it is. That being said -

One problem is that it copies the maven directory layout and, to an extent, conceptual model. The education path gap issues that exist with Maven mandating src/Main.java -> src/main/java/Main.java are equally present. Same goes for the artifact focus, scopes, etc.

It very much is a build tool and, while that is in its tech-startup-de-voweled name, it's not the aspect we care about.

// User model is pretty similar to maven, just with Java instead of XML
public class MyAppBuild extends Project {
    public MyAppBuild() {
        pkg = "com.example";
        name = "my-app";
        mainClass = "com.example.MyApp";
        version = version(0,1,0);

        downloadSources = true;
        repositories = List.of(MAVEN_CENTRAL, RIFE2_RELEASES);
        scope(test)
            .include(dependency("org.junit.jupiter", "junit-jupiter", version(5,11,4)))
            .include(dependency("org.junit.platform", "junit-platform-console-standalone", version(1,11,4)));
    }

    public static void main(String[] args) {
        new MyAppBuild().start(args);
    }
}

But unlike Mill there is no overarching Task abstraction. bld Operations are almost entirely ignorable if you want to run a tool separately. I'd count that as a clear path. Sure, you're using ProcessBuilder or AbstractProcessOperation, but at least you can "just" call the xjcs of the world and move on with your life.

Like Ivy, dependencies are dumped as just jars in folders. Unlike Ivy, there is explicit support for modules. You can opt to have a dependency be put on the module path and that just gets put in a different folder.

public class ProjectBuild extends Project {
    public ProjectBuild() {
        pkg = "project";
        name = "project";
        mainClass = "example.Main";
        version = version(0,1,0);

        downloadSources = true;
        repositories = List.of(MAVEN_CENTRAL, RIFE2_RELEASES);
        scope(compile)
                // Will be on the --class-path
                .include(dependency("commons-io", "commons-io", "2.18.0"))
                // Will be on the --module-path
                .include(module("com.fasterxml.jackson.core", "jackson-databind", "2.18.3"));

        // Takes scopes from maven too
        scope(test)
            .include(dependency("org.junit.jupiter", "junit-jupiter", version(5,11,4)))
            .include(dependency("org.junit.platform", "junit-platform-console-standalone", version(1,11,4)));
    }

    public static void main(String[] args) {
        new ProjectBuild().start(args);
    }
}

But just because it does better than the other entries on this list doesn't mean I'll give it full points. It still doesn't have an obvious way of filling in any of the other paths you might want when executing a tool such as --processor-module-path, --module-path, -Dsystem.library.path, etc. You can at least get at the command line options (and the JaCoCo extension makes use of that) but it's not trivial to fill them in via dependency resolution.

Scorecard

  • 🔴 Clear path from java src/Main.java
  • 🟢 Clear path to making use of other tools
  • 🟡 Split dependencies across different paths

bach

Bach is a build tool built entirely around the assumption that you will be using Java modules. This means the path from java src/Main.java has to include a pit-stop where you add a module-info.java and put your code in a package like java src/somepackage/Main.java. That's at least a path, but it's not the best one.

Unlike the others on the list there is a focus explicitly on the JDK tools. So running different CLI tools is pretty directly supported.

It more or less abdicates getting dependencies to other tools, so it sorta just doesn't solve the core problem I care about. It doesn't have a solution that gets in the way, but it doesn't have a solution at all.

Also bach is lucky I'm not grading on documentation because "what documentation?" It's still a WIP from the author, and it's not like I do any better, but still.

Scorecard (Disqualified)

  • 🟡 Clear path from java src/Main.java
  • 🟢 Clear path to making use of other tools
  • 🔴 Split dependencies across different paths

pottery

So pottery - it's another "maven, but." This time the "but" is that it has a set of defaults more suited to the author's preferences and it uses a yaml file for config.

parameters:
  junit.version: "5.9.1"
  snakeyaml.version: "1.33"
  chalk.version: "1.0.2"
  picocli.version: "4.7.0"
  junit.platform.version: "1.9.0"
  junit.engine.version: "5.9.1"

artifact:
  group:    "cat.pottery"
  id:       "pottery"
  version:  "0.3.2"

  platform:
    version: "17"
    produces: "fatjar"

  manifest:
    main-class: "cat.pottery.ui.cli.Bootstrap"

  dependencies:
    - production: "org.yaml:snakeyaml:${snakeyaml.version}"
    - production: "com.github.tomas-langer:chalk:${chalk.version}"
    - production: "info.picocli:picocli:${picocli.version}"
    - production: "org.junit.platform:junit-platform-launcher:${junit.platform.version}"
    - production: "org.junit.jupiter:junit-jupiter-engine:${junit.engine.version}"
    - test: "org.junit.jupiter:junit-jupiter-api:${junit.version}"

You can make an uberjar, docker image, and native image out of the box. You can't put things on the --module-path though. It's also pretty tightly bundled, so it's unclear where you would put a hypothetical call to xjc. It's also taking over everything wholesale, so there's no easy path from java src/Main.java.

Scorecard

  • 🔴 Clear path from java src/Main.java
  • 🔴 Clear path to making use of other tools
  • 🔴 Split dependencies across different paths

jbang

jbang got extremely lucky that nobody was already using the name for a porn site. That's worth celebrating.

The way jbang works is that you put a comment at the top of a file with the dependencies you want, then if you launch the program with jbang it will automatically download those dependencies and use them.

///usr/bin/env jbang "$0" "$@" ; exit $?
//DEPS ch.qos.reload4j:reload4j:1.2.19

import static java.lang.System.out;

import org.apache.log4j.Logger;
import org.apache.log4j.BasicConfigurator;

import java.util.Arrays;

class classpath_example {

    static final Logger logger = Logger.getLogger(classpath_example.class);

    public static void main(String[] args) {
        BasicConfigurator.configure(); 
        logger.info("Welcome to jbang");

        Arrays.asList(args).forEach(arg -> logger.warn("arg: " + arg));
        logger.info("Hello from Java!");
    }
}

So going from java src/Main.java to jbang src/Main.java is no trouble at all. Gold star.

Problems start when you want to put something on the --module-path. Unless I missed something in the documentation (which is possible), that is not supported. It has special carve-out support for JavaFX, but it's not doable in general.

If you want to use these dependencies in other tooling, like jlink, you are equally out of luck.

  • 🟢 Clear path from java src/Main.java
  • 🔴 Clear path to making use of other tools
  • 🔴 Split dependencies across different paths

java-jpm

java-jpm is kinda what it says it is - it takes the same approach as npm and just downloads dependencies into a folder. Well, it downloads dependencies and then symlinks them into a folder, but that's spiritually the same thing.

It just dumps into one folder, so no support for different paths. java src/Main.java -> java -cp deps/* src/Main.java is pretty straightforward as well.

Where I have to dock it points is in the platform-specificness. -cp deps/* isn't something that will work consistently across bash, powershell, fish, etc. Maybe that's unfair, but it's a real problem, and none of the other options have it on account of building their CLI arguments outside of any specific shell.

  • 🟡 Clear path from java src/Main.java
  • 🟡 Clear path to making use of other tools
  • 🔴 Split dependencies across different paths

What I've been working on

So for a while now I've had the jresolve tool in my back pocket. It still has much the same deficiencies it had when I first shared it, sans a few bug fixes.

The tl;dr is that you could run

jresolve pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.3

And it would print to standard output a path with all the dependencies on it, separated by the requisite platform specific path separator.

/Users/emccue/.jresolve/cache/https/repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.18.3/jackson-databind-2.18.3.jar:/Users/emccue/.jresolve/cache/https/repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.18.3/jackson-core-2.18.3.jar:/Users/emccue/.jresolve/cache/https/repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.18.3/jackson-annotations-2.18.3.jar

Nesting commands is very platform specific

# only works in bash
java -cp $(jresolve pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.3) src/Main.java

So I first added the --output-file argument, which would let you dump the path to a file without anything overly platform specific. You could then use the "argfile" syntax - @filename - to use the resolved path in future commands.

jresolve --output-file libs pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.3
java -cp @libs src/Main.java

But what became clear very quickly is that I had no clue how to teach an IDE how to read dependencies from a file with a path in it.

So that's when I added --output-directory. You can at least tell an IDE to use all the jars in a folder.

jresolve --output-directory libs pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.3
java -cp libs/* src/Main.java

But, as is the issue with java-jpm, the libs/* syntax is platform specific. It was then I started becoming biased towards using the --module-path. With that you could at least just point to a folder.

jresolve --output-directory libs pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.3
java --module-path libs --add-modules ALL-MODULE-PATH src/Main.java

And when the list of dependencies got too big I could put them in an argfile of their own.

jresolve --output-directory libs @libs.txt
java --module-path libs --add-modules ALL-MODULE-PATH src/Main.java

And that's where jresolve sat for a bit over a year. I'd keep using it for one-off projects and bashing my head against the "--module-path or --class-path, pick one" problem every time I did.

Even in that state you can bootstrap build programs and I wrote libraries to support that goal. With a tasteful enough use of argfiles and picocli you could get it down to

jresolve @bootstrap
java @project compile
java @project test

And that felt good, but problems remain. For one, the split path issue. For two, those libs.txt files kinda suck. It got me thinking about how Python started with requirements.txt and eventually that mutated into pyproject.toml.

So that's where I went with it. In the newest version of jresolve, if you run jresolve install, it will look for a jproject.toml

[project]
defaultUsage="--class-path"

[[project.dependencies]]
coordinate="pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.2"

[[project.dependencies]]
coordinate="pkg:maven/org.junit.jupiter/junit-jupiter-api@5.11.4"
dependencySets=["test"]

[project.dependencySets.test]
extends="default"

And this will dump out argfiles in a predictable structure. So with the jproject.toml above you will end up with

dependencySets/
  default 
  test

Where each argument file has not only the paths, but the --class-path preceding them. This finally gives a place to specify "this dependency goes on the --module-path."
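
For illustration, that makes the generated dependencySets/default an argfile shaped roughly like this (paths shortened, separator platform specific):

--class-path /Users/emccue/.jresolve/cache/.../jackson-databind-2.18.2.jar:/Users/emccue/.jresolve/cache/.../jackson-core-2.18.2.jar:/Users/emccue/.jresolve/cache/.../jackson-annotations-2.18.2.jar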

[project]
defaultUsage="--class-path"

[[project.dependencies]]
coordinate="pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.2"

[[project.dependencies]]
coordinate="pkg:maven/commons-io/commons-io@2.18.0"
usage="--module-path"

[[project.dependencies]]
coordinate="pkg:maven/org.junit.jupiter/junit-jupiter-api@5.11.4"
dependencySets=["test"]

[project.dependencySets.test]
extends="default"

It would even let you do exotic things like put libraries on the -Dsystem.library.path

[project]
defaultUsage="--class-path"

[[project.dependencies]]
coordinate="pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.2"

[[project.dependencies]]
coordinate="pkg:maven/commons-io/commons-io@2.18.0"
usage="--module-path"

[[project.dependencies]]
coordinate="SDL/build"
usage="-Dsystem.library.path"

[[project.dependencies]]
coordinate="pkg:maven/org.junit.jupiter/junit-jupiter-api@5.11.4"
dependencySets=["test"]

[project.dependencySets.test]
extends="default"

And running your program goes from java src/Main.java to java @dependencySets/default src/Main.java, perhaps with a --add-modules ALL-MODULE-PATH.

Now all I wish is that the java launcher itself accepted "nested" argfiles. If it did, we could put @dependencySets/default src/Main.java into a file named run and get java @run. Or have a dependency set for your build program and get java @project build. In that last situation you could slot in whatever build program you want or need. But that's where I am going to pause for the moment.

If you want to try any of that out you can run this command on Mac/Linux to get the tool.

bash < <(curl -s  https://raw.githubusercontent.com/bowbahdoe/jresolve-cli/main/install)

If you are on Windows then you can download the jar from the latest release and use java -jar jresolve.jar.

What I need help with

I still have no clue how to teach an IDE how to read dependencies from an argument file. I've had several false starts on making an IntelliJ plugin to do so. The closest I've gotten is with the Oracle extension for VSCode, but that has issues of its own.

If you or someone you know has the knowledge required to pull that off, please do.

Separately, if you can think of a convincing reason for the java launcher to support nested argfiles beyond it just making me happy file an issue in support of that. Also, there are tools like jshell which don't support argfiles for no reason in particular.

And one thing that seems to have happened in the Python world after the introduction of pyproject.toml is that a whole universe of tools started looking in that file for their own config. I don't know which is the chicken and which is the egg, but whatever goes in a jproject.toml, it might be nice for a variety of tools to use it for their config.


I also wrote this in a daze on a Sunday afternoon, so there are likely things I missed or described wrong. I'll try to respond to comments below or wherever it ends up being shared.

EDIT: I've made some edits to this since I originally shared it. The older, somewhat meaner, text is on the wayback machine.


<- Index

The Ultimate Guide to Data Structures and Algorithms (DSA)

by: Ethan McCue

This guide has two sections, one for employers and one for prospective employees. We'll start with employees.

For Employees

Step 1.

Obtain a copy of "Algorithm Design" by Jon Kleinberg and Éva Tardos. Any edition is fine.

Step 2.

Follow structured coursework on that book or otherwise go through it in a way that you can manage. Slowly work your way through while living your life.

Here are some slides and here is an edX course from Stanford.

If this does not satisfy you, seek out a book of similar character.

Step 3.

Branch out. I understand why you are stressed about "Data Structures and Algorithms," but I guarantee that time spent elsewhere is going to be more valuable for you.

If you've done steps 1 and 2 you have enough background info to begin to tackle whatever hard problems you run into in practice. Focus instead on getting practical experience building things.

It doesn't matter if those things are trivial either: hastily writing a jank calorie tracker website or virtual shrine to Edward Cullen will provide more long term value to you than spending your evenings grinding Leetcode.

Only do "Competitive Programming" if it is truly an interest of yours. It's fine if it is, but you don't need to do that kind of stuff to be qualified to perform most software jobs. Even the "prestigious" ones.

For Employers

Step 1.

Please, for the love of god, cut the shit.

If the position you are hiring someone for does not require implementing or understanding Gale-Shapley or reversing a linked list, do not make testing someone on that part of the interview process.

I do not care if you fancy yourself an "elite institution" and want to "uphold standards." It's the TSA of hiring practices. All theater, no usefulness. When people know that's what they'll be tested on they spend time practicing for the test and not their actual responsibilities. This helps no one.

Step 2.

Rework your hiring processes to be more holistic. This will cost time and energy, but it will cost orders of magnitude less time and energy than having employees who can balance a trie but store passwords in plaintext.

If you don't know where to start, give your technical interviewers this mind map and the explanation following it. PDF link here.

Mind map of topics to cover in an interview

It's mostly a memory aid for making sure I cover topics in an interview but it's also an ordered guide. I start at the top right and work my way down that side. I drill into the branches if a) the candidate is enthusiastic about a topic or b) they are very reticent. Once I get down the right hand side (mostly "soft" skills stuff), I move to the top left and work my way down that side (more technical/process skills stuff). I skip any topic they've already covered. I print out a fresh map for each candidate and make brief notes around the edge of the page.

The "conflict resolution" branch tends to come up in the "worst project" area but it's there as a reminder to make sure it's covered if they haven't mentioned it elsewhere.

Overall, it was written as a general guide, not specific to any particular programming language, so you have to play it by ear somewhat if you're interviewing for a senior FP role and you're only interested in FP, for example, or if you're interviewing for, say, a scrum master, or an ops role -- anything really specialized.

Prior to the interview, I'll also use the map as a guide for highlighting things on their resume/CV that I want to dig into during the interview -- I may highlight parts of the resume or add "pre-notes" to their map, and use both side-by-side.

I try to couch all of it in "tell me about ..." open-ended questions and avoid quizzing them on specific technology as much as possible. If they don't mention some of the specific tech organically that I want to hear about, I will "guide" them back to that at appropriate points.

The diagram and explanation come from Sean Corfield. He does not know I am writing this or quoting him.

Step 3.

Stop paying Leetcode and companies like them for candidates. It is in their best interests to feed slop into the slop trough. Do not gobble their slop.


<- Index

How to use SDL3 from Java

by: Ethan McCue

Using native code libraries from Java is both easier than it's ever been and still a little bit frustrating.

To understand that duality I wrote this pretty basic tutorial on how to use the newly-ish released SDL3 from Java.

This should be useful both for those invested in the Java ecosystem and those who have a more practical desire to use SDL3 for their own projects. While all I am going to do will be focused on Java, feel free to generalize to Clojure, Kotlin, Scala, or flix1.

Prerequisites

  • Git
  • SDKMan (curl -s "https://get.sdkman.io" | bash)
  • JExtract (sdk install jextract)
  • Java 22+
  • Linux-like system. (Just to make this easier for me to write and test.)
  • Just (Or just run the commands. I like using this.)

Tutorial

0. Make a Hello World project

src/
    Main.java

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello, world");
    }
}

As part of this, make a Justfile with a recipe to run the project.

help:
    just --list
    
run:
    java src/Main.java

1. Clone and Build SDL

SDL is one of those dependencies you still get from source. There are platform specific ways to get this and other native libraries, but those are all nightmares in their own right.

You can find the build instructions for your platform here.

help:
    just --list

# Clone and Build SDL    
sdl:
    rm -rf SDL
    git clone https://github.com/libsdl-org/SDL
    cd SDL && mkdir build
    cd SDL/build && cmake -DCMAKE_BUILD_TYPE=Release ..
    cd SDL/build && cmake --build . --config Release --parallel
    cd SDL/build && sudo cmake --install . --config Release
    
run:
    java src/Main.java

Whether you want to keep SDL as a git submodule, clone it fresh every time, or something else is up to you.

On my machine (M1 Mac) running the build outputs some warnings, which leads to a non-zero exit code. While annoying, it does manage to finish the build, so whatever. So long as you don't see errors you should be fine.

2. Generate Java Bindings

To do this we will use jextract.

jextract \
      --include-dir SDL/include \
      --dump-includes includes.txt \
      SDL/include/SDL3/SDL.h

This will create a file with the command-line flags needed to include every symbol. This is useful so you can trim down the functions Java code will be generated for. Basically just go through this file and remove everything that doesn't start with SDL_ or similar.
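
For reference, the lines in includes.txt look roughly like this, so the filtering is a job for your editor or grep. Exact symbols and header paths will vary by platform and SDL version.

--include-function SDL_Init          # header: SDL/include/SDL3/SDL_init.h
--include-constant SDL_INIT_VIDEO    # header: SDL/include/SDL3/SDL_init.h
--include-struct SDL_FRect           # header: SDL/include/SDL3/SDL_rect.h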

Then, using that list of symbols to include, generate Java code.

As part of this you should use the --use-system-load-library flag. This will generate the code such that it will pull libsdl3 from the directly configurable java.library.path.

jextract \
      --include-dir SDL/include \
      --output src \
      --target-package bindings.sdl \
      --library SDL3 \
      --use-system-load-library \
      @includes.txt \
      SDL/include/SDL3/SDL.h

3. Update Run Configuration

In order to call into native code you need to pass a flag to enable native access. This is because, in general, calling arbitrary C code can crash or otherwise bork the JVM.

Native access permissions are given per-module. By default (unless you make a module-info.java) your code will be on the unnamed module, so we will use ALL-UNNAMED.

java --enable-native-access=ALL-UNNAMED src/Main.java

A known quirk of using a library like SDL on Mac is that you need to also pass -XstartOnFirstThread. On non-mac platforms I think you can leave this off.

java \
    -XstartOnFirstThread \
    --enable-native-access=ALL-UNNAMED \
    src/Main.java

And then we need to pass the path of our built SDL shared library.

java \
    -XstartOnFirstThread \
    --enable-native-access=ALL-UNNAMED \
    -Djava.library.path=SDL/build \
    src/Main.java

If you are unfamiliar with -Djava.library.path - isn't that crazy? Consequence of build tools only caring about --class-path I think.

4. Make some calls to SDL

The following is translated from one of the SDL examples. Note that while the C example it's based on has some callbacks, in Java you need to manually implement those lifecycle bits.

Also note the uses of try/finally. One big difference between C and Java is that Java has exceptions. If you want cleanup code to always run (such as SDL_DestroyWindow) you need to account for exceptions.

import bindings.sdl.SDL_Event;
import bindings.sdl.SDL_FPoint;
import bindings.sdl.SDL_FRect;

import java.lang.foreign.Arena;

import static bindings.sdl.SDL_h.*;

public class Main {
    public static void main(String[] args) {
        try (var arena = Arena.ofConfined()) {
            SDL_SetAppMetadata(
                    arena.allocateFrom("Example Renderer Primitives"),
                    arena.allocateFrom("1.0"),
                    arena.allocateFrom("com.example.renderer-primitives")
            );

            if (!SDL_Init(SDL_INIT_VIDEO())) {
                System.err.println(
                        "Couldn't initialize SDL: "
                                + SDL_GetError().getString(0));
                return;
            }


            var windowPtr = arena.allocate(C_POINTER);
            var rendererPtr = arena.allocate(C_POINTER);
            if (!SDL_CreateWindowAndRenderer(
                    arena.allocateFrom("examples/renderer/clear"),
                    640,
                    480,
                    0,
                    windowPtr,
                    rendererPtr
            )) {
                System.err.println(
                        "Couldn't create window/renderer: "
                                + SDL_GetError().getString(0));
                return;
            }

            var window = windowPtr.get(C_POINTER, 0);
            var renderer = rendererPtr.get(C_POINTER, 0);
            try {

                int numberOfPoints = 500;
                var points = SDL_FPoint.allocateArray(numberOfPoints, arena);
                for (int i = 0; i < numberOfPoints; i++) {
                    var point = SDL_FPoint.asSlice(points, i);
                    SDL_FPoint.x(
                            point,
                            (SDL_randf() * 440.0f) + 100.0f
                    );
                    SDL_FPoint.y(
                            point,
                            (SDL_randf() * 280.0f) + 100.0f
                    );
                }

                var event = SDL_Event.allocate(arena);
                var rect = SDL_FRect.allocate(arena);

                program:
                while (true) {
                    while (SDL_PollEvent(event)) {
                        var type = SDL_Event.type(event);
                        if (type == SDL_EVENT_QUIT()) {
                            System.err.println("Quitting");
                            break program;
                        }
                    }

                    /* as you can see from this, rendering draws over whatever was drawn before it. */
                    SDL_SetRenderDrawColor(
                            renderer,
                            (byte) 33, (byte) 33, (byte) 33, (byte) SDL_ALPHA_OPAQUE()
                    );  /* dark gray, full alpha */
                    SDL_RenderClear(renderer);  /* start with a blank canvas. */

                    /* draw a filled rectangle in the middle of the canvas. */
                    SDL_SetRenderDrawColor(
                            renderer,
                            (byte) 0, (byte) 0, (byte) 255, (byte) SDL_ALPHA_OPAQUE()
                    );  /* blue, full alpha */
                    SDL_FRect.x(rect, 100);
                    SDL_FRect.y(rect, 100);
                    SDL_FRect.w(rect, 440);
                    SDL_FRect.h(rect, 280);

                    SDL_RenderFillRect(renderer, rect);

                    /* draw some points across the canvas. */
                    SDL_SetRenderDrawColor(
                            renderer,
                            (byte) 255, (byte) 0, (byte) 0, (byte) SDL_ALPHA_OPAQUE()
                    );  /* red, full alpha */
                    SDL_RenderPoints(renderer, points, numberOfPoints);

                    /* draw an unfilled rectangle in-set a little bit. */
                    SDL_SetRenderDrawColor(
                            renderer,
                            (byte) 0, (byte) 255, (byte) 0, (byte) SDL_ALPHA_OPAQUE()
                    );  /* green, full alpha */
                    SDL_FRect.x(
                            rect,
                            SDL_FRect.x(rect) + 30
                    );
                    SDL_FRect.y(
                            rect,
                            SDL_FRect.y(rect) + 30
                    );
                    SDL_FRect.w(
                            rect,
                            SDL_FRect.w(rect) - 60
                    );
                    SDL_FRect.h(
                            rect,
                            SDL_FRect.h(rect) - 60
                    );
                    SDL_RenderRect(renderer, rect);

                    /* draw two lines in an X across the whole canvas. */
                    SDL_SetRenderDrawColor(
                            renderer,
                            (byte) 255, (byte) 255, (byte) 0, (byte) SDL_ALPHA_OPAQUE()
                    );  /* yellow, full alpha */
                    SDL_RenderLine(renderer, 0, 0, 640, 480);
                    SDL_RenderLine(renderer, 0, 480, 640, 0);

                    SDL_RenderPresent(renderer);  /* put it all on the screen! */
                }

            } finally {
                SDL_DestroyRenderer(renderer);
                SDL_DestroyWindow(window);
                SDL_Quit();
            }
        }
    }
}

Run the code with the flags discussed above. You should see a window pop up with a rectangle and some dots.

Annoyances

So while this is easy from top to bottom, there are some interesting properties you should be aware of.

1. SDL_h doesn't actually have all the functions.

I assume because of limits on class file size, with a library the size of SDL the jextract-generated binding code is split over multiple files: SDL_h, SDL_h_1, SDL_h_2, etc. This isn't an issue normally since you can just add more static imports, but it can be an issue for binary compatibility. If you end up directly accessing the static properties of SDL_h_2 you might be in for a bad surprise if those symbols end up in a different class file when you next update.

It's not a problem when the generated binding code is just part of your build, but it is an issue if you want to make a stable sdl artifact to share with other people.

2. The generated Java code is per-platform.

Java doesn't actually have a C api - it has a "foreign function and memory" api. This means that the descriptions of native memory layouts include platform specific padding. jextract uses clang to figure out what the memory layouts for structs are and dumps those as part of its generated code.

This means that if you want to use jextract generated code across different platforms you need to either make distinct artifacts per-platform or handle things dynamically at runtime.

An easy way to get access to different platforms (which you should think of as target triples - (operating system, architecture, libc)) is via GitHub Actions. Integrating that is an exercise for the reader, though I have one example.

3. It would be work to share this

Distributing a library which uses C code over the usual Java library channels like Maven Central can be annoying. The standard build tools (maven, gradle, etc.) do not provide an easy way to set java.library.path or get dependencies that should go there. What most people have historically done is to embed one or multiple shared libraries in a jar and extract them at runtime.2

This is some bunk, but kinda the lowest common denominator approach. You can read some of that nonsense here.

The binary compatibility issues alluded to above would also be something to consider.

All that is to say - don't go just publishing libraries for every C library you want bindings for. At least at the moment there are some caveats and the ecosystem isn't super ready for it. If you are going to do that, provide a layer on top of the auto-generated jextract code. Add some value that makes it worth the time investment.

4. You need to care about memory lifetimes

In the example code there is only one Arena and it lives for the entire program. If you need to allocate memory for a shorter timespan you'll need to make at least one more arena. While this is better than directly dealing with malloc/free, it is still more responsibility than you usually have in Java code.
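
For example, memory that only needs to live for a short stretch of work can come from its own arena. A minimal sketch, reusing the jextract-generated SDL_FRect bindings from the example above (assumes renderer is still in scope):

try (var frameArena = Arena.ofConfined()) {
    // Memory allocated from frameArena is only valid inside this block
    var tempRect = SDL_FRect.allocate(frameArena);
    SDL_FRect.x(tempRect, 10);
    SDL_FRect.y(tempRect, 10);
    SDL_FRect.w(tempRect, 20);
    SDL_FRect.h(tempRect, 20);
    SDL_RenderFillRect(renderer, tempRect);
} // The memory backing tempRect is freed here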

It can be tempting to build APIs that use foreign memory the same as if they did not, but unless you have a really clear seam behind which to hide the memory shenanigans it is probably going to backfire.

Conclusion

If you want to make a game or game engine, this should be a good start. You can pivot to more C oriented SDL tutorials and translate the calls needed to open windows, render graphics, etc.

If you want to do something similar for another native library, this should serve as a decent starting point.

You can find the code for this demo here.

1: Sidenote, but I think flix could kill Scala as the JVM language for Haskell-likin' types. Other than implicits + some basic type level stuff, Java has already taken or will take most of what made Scala interesting. "Stratified Negation," "Lattice Semantics," and "Associated Effects" intimidate me in a way I haven't felt in a while.

2: There is one method of distribution that doesn't have these problems: .jmod. JMods have a special place for shared libraries and will merge them into the JDK they're linked with. I'm investigating the possibilities of that on the side.


<- Index

Java Build Scripts

by: Ethan McCue

I've written before about how I think that, while they need some bolstering (here), using the CLI tools to build Java code is more practical than you might think (here and here).

What I didn't talk about, or tip-toed around, is that writing build scripts in bash, PowerShell, cmd.exe, etc. is not very cross-platform.

You can install bash on Windows and run it in WSL, but that feels unideal. An extra setup step is one thing, but needing to ask students who just learned that the command line exists to also make sure they aren't accidentally running in PowerShell is painful.

You could also just ignore the problem. "Real" developers use Mac or Linux, right? Well those same developers sometimes pick an extra special shell for themselves like zsh, nushell, or fish. You have a similar, if less serious, problem.

Using any of those shells for experimenting or testing out commands is fine. Until you write $() or a file path they are all more or less the same. What we really need is some way of writing out commands that will work on Windows, Mac, and Linux, regardless of whether someone is using bash, PowerShell, cmd.exe, zsh, nushell, or fish.

If only we had a language that could be written once and then run anywhere.

just

Since Java 22 we've been able to write java Main.java and execute a potentially multi-file program. Before that there was a significant bootstrap problem. If you write Java code to compile Java code, who compiles that Java code?

Now that that's there, we can start to consider what it would look like to run a CLI tool from Java code and compare that to the alternatives.

The alternative I've been using is just. just is a command runner similar to make but without any of the caching make does or the wild syntax and history make is burdened with.

You write the name of a "recipe", a :, then an indented list of commands to run.

demo:
    javac --version
    jlink --version

For this, if you run just demo it will run each command in sequence.

$ just demo
javac --version
javac 22.0.1
jlink --version
22.0.1

If any command gives you a non-zero exit code it fails immediately.

javac --v
error: invalid flag: --v
Usage: javac <options> <source files>
use --help for a list of possible options

And, by default, it will echo the command it's about to run before it runs it.

All of these properties are useful for different reasons.

  • Printing the command before it's run is useful when something fails. You can usually just copy-paste the command and tweak it until it works. Then you can just copy-paste the working command back in place.
  • Failing immediately on a non-zero exit code is a good default for what I feel are obvious reasons.
  • Being able to refer to a command or group of commands by an alias is almost required to do anything interesting. just compile is much more ergonomic to use than javac -d build --module-source-path "./*/src" --module example.

Also, and it feels small but isn't, you can get a list of all the commands + a comment on how to use them with just --list.

$ just --list
Available recipes:
    demo

Run commands in Java

I want all of these properties so let's see how we can get them in Java.

To run a command we can use the ProcessBuilder.

import java.util.List;

public class Project {
    public static void main(String[] args) {
        var cmd = List.of("javac", "--version");

        var pb = new ProcessBuilder(cmd);
    }
}

Then all we need to do is start the command, wait for it to finish, and record the exit status. If it's non-zero, throw.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class Project {
    public static void main(String[] args) {
        var cmd = List.of("javac", "--version");

        var pb = new ProcessBuilder(cmd);
        try {
            int exitStatus = pb.inheritIO().start().waitFor();
            if (exitStatus != 0) {
                throw new RuntimeException(
                        "Non-zero exit status: " + exitStatus
                );
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

Yeesh. So that sucks. We haven't even gotten to printing out the command or labelling groups of commands yet.
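
You could, of course, wrap all of that in a small helper yourself. A minimal sketch (the run helper here is hypothetical, not from any library):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class Project {
    static void run(String... cmd) {
        // Echo the command before running it, like just does
        System.err.println(String.join(" ", cmd));
        try {
            int exitStatus = new ProcessBuilder(List.of(cmd))
                    .inheritIO()
                    .start()
                    .waitFor();
            if (exitStatus != 0) {
                throw new RuntimeException(
                        "Non-zero exit status: " + exitStatus
                );
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        run("javac", "--version");
        run("jlink", "--version");
    }
}

Every build script ends up wanting a helper like this, which is exactly the itch the API below scratches.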

🚧Construction Zone 🚧

I've been noodling on this for a time and this weekend2 I think I finally came up with a half-decent API.

import dev.mccue.tools.ExitStatusException;
import dev.mccue.tools.Tool;

public class Project {
    public static void main(String[] args)
            throws ExitStatusException {
        Tool.ofSubprocess("javac")
                .run("--version");
    }
}

This will print out the command to System.err, run it, throw an exception on a non-zero exit status, and pipe output to System.out/System.err as needed.

Problem is that now we've introduced a dependency.

Hand-waving how you get the dependency1, the command you need to run the code changes from java scripts/Main.java to java --module-path scripts/libs --add-modules ALL-MODULE-PATH scripts/Main.java.

That's simply too much to remember.

Argument Files

To deal with this we can use argument files.

If we make a file called project at the top level of our project with the following contents.

--module-path scripts/libs --add-modules ALL-MODULE-PATH

Now we can run the script with just java @project. This works as if all the arguments in the file were applied inline in the invocation. Most tools that come with Java, including the java launcher itself, support this.

picocli

As for identifying groups of commands, there is a solution there too. Now that we've opened the floodgates on our Java build script having dependencies, what's one more?

import dev.mccue.tools.ExitStatusException;
import dev.mccue.tools.Tool;
import picocli.CommandLine;

@CommandLine.Command(
        name = "project"
)
public final class Project {
    public static void main(String[] args) {
        new CommandLine(new Project()).execute(args);
    }

    @CommandLine.Command(name = "demo")
    public void demo() throws ExitStatusException {
        Tool.ofSubprocess("javac")
                .run("--version");
    }
}

If we use picocli, then it's trivial. Our build script is a CLI program like any other, why not use normal CLI libraries?

We get the ability to run commands by name.

$ java @project demo  
javac --version
javac 22.0.1

And we even get to list commands in a way.

$ java @project                                                               
Missing required subcommand
Usage: project [COMMAND]
Commands:
  demo

Yay!

Tool Tailored APIs

While Tool.ofSubprocess("javac").run("--version"); is complete in a sense, it's not very fun to use.

What we generally want from a Java API is method-level autocomplete. Having to separately reference a man page doesn't spark joy in me, and it shouldn't spark joy in you.

I started this particular adventure wanting to translate options from the CLI more or less 1-1. This is for two broad reasons.

  1. I think learning how to use the CLI from Java should be transferable knowledge when writing commands the old fashioned way and vice-versa.
  2. There are a lot of CLI tools and coming up with a creative name for every argument that can only be specified as -g is painful and a lot of work.

The transform I started with was: for every argument that --looks-like-this, add a method to an arguments object that looksLikeThis.

Javadoc.run(arguments -> {
    arguments
        .moduleSourcePath("./modules/*/src")
        .d(Path.of("build/javadoc"))
        .module("dev.mccue.tools")
});

This works pretty well, but look at these two options from javadoc.

javadoc --help
...
    --version     Print version information
...
    -version      Include @version paragraphs
...

It has both -version and --version and they do wildly different things. Great. Awesome.

This is a one-off example, but CLI tools are fundamentally textual apis. --some-thing to someThing isn't just a stylistic change, it's a lossy transformation.

So I gave up. My only strategy now is to take arguments that --look-like-this and turn them into ones that __look_like_this. It might be ugly, but at least I don't run into strange problems anymore. As a side benefit, it does now look a lot more 1-1 with the CLI api.

Javadoc.run(arguments -> {
    arguments
        .__module_source_path("./modules/*/src")
        ._d(Path.of("build/javadoc"))
        .__module("dev.mccue.tools")
});

Conclusion

I translated the spring demo repo I was using for some previous posts to use this approach. It includes running junit tests and managing multiple modules. You can find it here and the build script specifically here.

Note that the libraries I referenced, save for picocli, are very likely to change in backwards-incompatible ways as I iterate on them. Don't use them for anything serious yet, but you can find them here.

There are still some bootstrap and polish issues, but I think this approach is becoming more and more viable as it's chipped away at.


Share thoughts, design feedback, etc. in the comments below.

1: jresolve --output-directory scripts/libs pkg:maven/dev.mccue/tools-jdk@2024.08.25.5.

2: This approach/outlook on tooling has a lot of similarities to bach and the work Christian Stein has been doing. Will likely elaborate more on the difference between this approach, bach's approach, bld's approach, etc. when I personally have more mental clarity on it.


<- Index

C Growable Arrays: In Depth

by: Ethan McCue

An extremely common question people have when using C for the first time is how to make an array of elements that can grow over time.

I know this is a common question because one older post on this website where I explained the concept (badly) gets tons of organic traffic.

It's not a bad question either. Nearly every language you might be coming at C from has an equivalent.

  • Python: list
  • JavaScript: Array
  • Java: ArrayList
  • ... etc.

And, ignoring primacy, C classes often have students make data structures for their assignments.

So I figure it might be useful to at least one person to give a walkthrough of how that data structure works in the C world.

Just keep in mind that I am not a professional C programmer. If I get anything wrong or there is something you wish I mentioned, feel free to mention it in the comments below or wherever. I'll make corrections.

Arrays

An array is a fixed size collection of data.

int numbers[] = {
    1,
    2,
    3
};

Being fixed size means that if an array starts out with enough space for 3 elements, there is no way to make space for a 4th.

C arrays are also, more or less, equivalent to pointers that just so happen to point to the start of a chunk of memory. So whenever you see something like int[] in code you can mentally translate that to int*.
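
For example, an array can be assigned straight to a pointer variable:

#include <stdio.h>

int main()
{
    int numbers[] = { 1, 2, 3 };
    // The array "decays" into a pointer to its first element
    int* pointer = numbers;
    printf("%d\n", pointer[1]); // prints 2
    return 0;
}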

Most languages' array-equivalents can have their size queried at runtime. C is a bit special in that there is no way to recover the number of elements in an array after you make it. It is just a pointer to a chunk of memory after all.

This means you have two basic options for being able to figure out the size of an array.

1. Have a sentinel terminate the array

One way to be able to figure out the size of an array is to put a special sentinel value as its last element. Code working with the array can then proceed forward until that special value is reached.

This may or may not be an option depending on the kind of data being stored in an array. The most common use of this actually comes from how C stores strings.

#include <stdio.h>

int main()
{
    char* hello1 = "Hello";
    char hello2[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
    printf("%s\n", hello1);
    printf("%s\n", hello2);

    return 0;
}

"C style strings" are an array of characters terminated by a null character. If you want to find the length of something like this, just keep looping until you get to that terminator.

#include <stdio.h>

int main()
{
    char* hello = "Hello";
    
    int i = 0;
    while (hello[i] != '\0') {
        printf("%c\n", hello[i]);
        i++;
    }
    return 0;
}

An upside to this approach is that it's simple to understand. A downside is that you need to go through every element in the array to find its size, which is a pain.

2. Store it when you make the Array

If we don't want to have the null terminator we need to store a number.

One way to do this is to just manually count out how big an array is.

#include <stdio.h>

int main()
{
    int numbers[] = {
        6,
        4,
        7
    };
    
    // I counted it with my eyes
    int numbers_size = 3;
    
    int i = 0;
    while (i < numbers_size) {
        printf("%d\n", numbers[i]);
        i++;
    }
    return 0;
}

If when you initialize your array you write the number of elements directly, you can make use of sizeof to calculate the size.

#include <stdio.h>

int main()
{
    int numbers[3] = {
        6,
        4,
        7
    };
    
    // Because of the literal [3] above, C can figure out
    // how many elements there are by dividing the total
    // size of the array by the size of an individual
    // element.
    int numbers_size = sizeof(numbers) / sizeof(numbers[0]);
    
    int i = 0;
    while (i < numbers_size) {
        printf("%d\n", numbers[i]);
        i++;
    }
    return 0;
}

You then need to handle passing that size around with the array whenever you give it to a function that takes an array as an argument. A good deal of the C standard library works like this.
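
For example, a function that sums an array has to take the size as a separate argument. A sketch (the sum_numbers name is made up):

int sum_numbers(int* numbers, int numbers_size)
{
    int total = 0;
    int i = 0;
    while (i < numbers_size) {
        total = total + numbers[i];
        i++;
    }
    return total;
}

Called as sum_numbers(numbers, 3) with the arrays from above.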

size_t

Small digression. When you make a variable that stores an index into an array or the size of an array, you are intended to use size_t.

If you don't, it seems like it's usually "fine," but I wouldn't risk the wrath of the undefined behavior demons.

To have size_t be available you should put #include <stddef.h> at the top of your program.

#include <stddef.h>
#include <stdio.h>

int main()
{
    int numbers[3] = {
        6,
        4,
        7
    };
    
    size_t numbers_size = sizeof(numbers) / sizeof(numbers[0]);
    
    size_t i = 0;
    while (i < numbers_size) {
        printf("%d\n", numbers[i]);
        i++;
    }
    return 0;
}

Heap Allocation

At runtime, you can get an arbitrarily large block of memory in various ways.

The most commonly known is malloc. You give it a size then it gives you a pointer to the start of that memory. You need #include <stdlib.h> to use it.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> 

int main()
{
    size_t numbers_size = 3;
    int* numbers = malloc(sizeof(int) * numbers_size);
    numbers[0] = 6;
    numbers[1] = 4;
    numbers[2] = 7;
    
    
    size_t i = 0;
    while (i < numbers_size) {
        printf("%d\n", numbers[i]);
        i++;
    }
    return 0;
}

The one we will use is calloc. It works mostly the same as its cousin malloc with two major differences.

The first is that you don't give it the full size of the array you want. You give it the size of each element and the number of elements you want separately.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> 

int main()
{
    size_t numbers_size = 3;
    int* numbers = calloc(sizeof(int), numbers_size);
    numbers[0] = 6;
    numbers[1] = 4;
    numbers[2] = 7;
    
    
    size_t i = 0;
    while (i < numbers_size) {
        printf("%d\n", numbers[i]);
        i++;
    }
    return 0;
}

The second is that the memory returned is already "zeroed." This means that you know that every element is in its zero-valued state. So for int it will literally be 0, _Bools will be false, pointers will be NULL, etc.

Often that doesn't matter but, because there isn't "uninitialized" memory with random data in it, it feels more predictable.

For both approaches you need to later free that allocated memory. You will be technically exempt from needing to do this if your program doesn't run for long enough to run out of memory. I think it is best to be a "good citizen" and free your memory regardless.

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> 

int main()
{
    size_t numbers_size = 3;
    int* numbers = calloc(sizeof(int), numbers_size);
    numbers[0] = 6;
    numbers[1] = 4;
    numbers[2] = 7;
    
    
    size_t i = 0;
    while (i < numbers_size) {
        printf("%d\n", numbers[i]);
        i++;
    }
    
    free(numbers);
    
    return 0;
}

Technically speaking malloc, calloc, etc. can fail if the system is out of memory. We are going to ignore that possibility for the rest of this, but the lower level the software you write, the larger the chance you will need to care about very limited memory scenarios.
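
If you did want to handle it, the convention is to check whether the returned pointer is NULL. A sketch:

int* numbers = calloc(sizeof(int), numbers_size);
if (numbers == NULL) {
    // Allocation failed. Report it and bail out.
    fprintf(stderr, "Out of memory\n");
    return 1;
}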

Single Type Growable Arrays

The basic concept of a growable array is to group three pieces of information. A pointer to an array of things, the number of elements allocated for that array, and the number of elements "actually" in the array.

struct GrowableIntArray {
    int* data;
    size_t allocated;
    size_t size;
};

So for a new array with nothing in it, the data pointer would be null and both numbers would be 0.

struct GrowableIntArray growable_int_array_empty() {
    struct GrowableIntArray empty = {
       .data = NULL,
       .allocated = 0,
       .size = 0
    };
    
    return empty;
}

To add an element to the array, you check if adding the element would make the size of the array larger than what was allocated.

If it won't, you set an element in your data array and bump the size.

If it will, you need to make a new array that is bigger than the last one. How much bigger is more art than science, but generally people find success allocating around twice as many elements as were there before.

That sounds crazy, but at worst you are only wasting half your memory. That's not that bad in the grand scheme of things.

Then you copy over all the elements from the last array and free the old one.

void growable_int_array_add(
    struct GrowableIntArray* array, 
    int value
) {
    // If we wouldn't have enough room
    if (array->size + 1 > array->allocated) {
       // Double the size of the last array
       size_t new_allocated;
       if (array->size == 0) {
           new_allocated = 2;
       }
       else {
           new_allocated = array->size * 2;
       }
       
       // Make a new array that size
       int* new_data = calloc(sizeof(int), new_allocated);
       int* old_data = array->data;
       
       // Copy all the old elements to it
       if (old_data != NULL) {
           for (size_t i = 0; i < array->size; i++) {
               new_data[i] = old_data[i];
           }
       }
       
       // Then free the old array
       free(old_data);
       
       // And patch up the pointers 
       array->data = new_data;
       array->allocated = new_allocated;
    }
    
    // And put in the new element
    array->data[array->size] = value;
    array->size++;
}

Which is a chunky function, but now you should be good to go on making something which you can use as an array but which dynamically grows as elements are added.

int main()
{
    struct GrowableIntArray numbers 
        = growable_int_array_empty();
        
    growable_int_array_add(&numbers, 6);
    growable_int_array_add(&numbers, 4);
    growable_int_array_add(&numbers, 7);

    
    size_t i = 0;
    while (i < numbers.size) {
        printf("%d\n", numbers.data[i]);
        i++;
    }
    return 0;
}

From there it's all a matter of personal taste. Many would want to implement their own growable_int_array_size and growable_int_array_get. Both of these are relatively straightforward and useful if your goal is to avoid accessing struct members directly.

size_t growable_int_array_size(
    struct GrowableIntArray* array
) {
    return array->size;
}


int growable_int_array_get(
    struct GrowableIntArray* array, 
    size_t i
) {
    // You can do precondition checks and crash early if someone
    // tries to out of bounds if you want.
    return array->data[i];
}

int main()
{
    struct GrowableIntArray numbers 
        = growable_int_array_empty();
        
    growable_int_array_add(&numbers, 6);
    growable_int_array_add(&numbers, 4);
    growable_int_array_add(&numbers, 7);

    
    size_t i = 0;
    while (i < growable_int_array_size(&numbers)) {
        printf("%d\n", growable_int_array_get(&numbers, i));
        i++;
    }
    return 0;
}

But all of this has a major flaw. Do you see it?

It only works with ints! If you want to have a growable array of longs or Positions or whatever, you need to copy and paste all of this code, change the types around, and make brand-new functions.

What we want is the ability to write code for a growable array once and then have it work for any kind of data we want to store. That leaves us with two options.

  1. Make a growable array that can be used for anything at runtime
  2. Make a growable array that can be specialized for anything at compile-time.

Runtime Generic Growable Arrays

What do an int, a char and a struct Position have in common? Nothing. Save some really strange layout choices by a compiler, all of these data types require different amounts of memory.

What do an int*, a char*, and a struct Position* have in common? Turns out all of them can be safely converted to and from a void*.

#include <stdio.h>

int main()
{
    int eight = 8;
    
    int* eightPointer = &eight;
    void* voidPointer = (void*) eightPointer;
    eightPointer = (int*) voidPointer;
    
    printf("%d\n", *eightPointer);

    return 0;
}

A void* is a pointer to "something." The C compiler forgets what kind of information is actually stored in it. All pointers in C have the same size, so now we have our way of storing anything.

struct GrowableArray {
    void** data;
    size_t allocated;
    size_t size;
};

At first, it might seem like we can just do that and find+replace int with void* in the code from before. And you'd be right. Just be aware that things which were once int* will become void**. A pointer to an array of void pointers.

void growable_array_add(
    struct GrowableArray* array, 
    void* value
) {
    // If we wouldn't have enough room
    if (array->size + 1 > array->allocated) {
       // Double the size of the last array
       size_t new_allocated;
       if (array->size == 0) {
           new_allocated = 2;
       }
       else {
           new_allocated = array->size * 2;
       }
       
       // Make a new array that size
       void** new_data = (void**) calloc(
            sizeof(void*), 
            new_allocated
       );
       void** old_data = array->data;
       
       // Copy all the old elements to it
       if (old_data != NULL) {
           for (size_t i = 0; i < array->size; i++) {
               new_data[i] = old_data[i];
           }
       }
       
       // Then free the old array
       free(old_data);
       
       // And patch up the pointers 
       array->data = new_data;
       array->allocated = new_allocated;
    }
    
    // And put in the new element
    array->data[array->size] = value;
    array->size++;
}

Usability

The first problems that will arise are around usability.

To pass a pointer in, it can't be an rvalue. An rvalue is something that should go on the right hand side of an equals sign. That's where the r comes from.

This means that you can't just directly pass in a pointer to an int.

growable_array_add(&numbers, &6);

&6 doesn't have a meaning to C. You need to have constant values first assigned to a variable.

int n = 6;
growable_array_add(&numbers, &n);

This can be annoying to write out, but you might get used to it. Even harder to come to terms with is needing to recover the type of a pointer whenever you get it out.

You need to both convert the void* to an int* or whatever actual type you stored and, if it's something like int, dereference that pointer to get at the actual value.

int value = *((int*) growable_array_get(&numbers, i));

The C compiler doesn't take kindly to mishandled void*s. If you get this wrong you get teleported to Florida.

int main()
{
    struct GrowableArray numbers 
        = growable_array_empty();
      
    int a = 6;
    int b = 4;
    int c = 7;
    
    growable_array_add(&numbers, &a);
    growable_array_add(&numbers, &b);
    growable_array_add(&numbers, &c);

    
    size_t i = 0;
    while (i < growable_array_size(&numbers)) {
        printf("%d\n", *((int*) growable_array_get(&numbers, i)));
        i++;
    }
    return 0;
}

Pointer Lifetimes

Pointers don't all "live" the same amount of time. You can take a pointer to a local variable, but that pointer is only valid so long as you are still within that function.

int* example() {
   int x = 5;
   int* xPointer = &x;
   
   // Can use xPointer freely
   
   // But if you return the pointer out it won't
   // be valid
   return xPointer;
}

You can make pointers that live longer with calloc, but you later need to call free on them.

int* example() {
   int* xPointer = calloc(sizeof(int), 1);
   *xPointer = 5;
   
   // Valid to return, but something eventually
   // should free it.
   return xPointer;
}

This presents a problem for our array of void*s. If all the pointers are pointing to local variables on the stack then your cleanup should just be to call free on array.data.

void growable_array_cleanup(struct GrowableArray array) {
    free(array.data);
}

But if the pointers are pointing to heap allocated memory then someone needs to clean them up later.

void growable_array_cleanup(struct GrowableArray* array) {
    for (size_t i = 0; i < array->size; i++) {
        free(array->data[i]);
    }
    free(array->data);
}

Even worse than that, some things don't just need to be free-ed. They might have been allocated outside the calloc/free system or, like our GrowableArray, they might have some custom cleanup process.

To deal with the sheer variety of situations we need to store how we want to clean up elements in the array itself.

struct GrowableArray {
    void** data;
    size_t allocated;
    size_t size;
    void (*cleanup)(void*);
};

This syntax - void (*cleanup)(void*) - is how you declare a pointer to a function. In this case a function whose return type is void and whose sole argument is a void*. If it looks confusing to you don't worry. It confuses me too.

// C has no overloading, so a "no cleanup" variant needs its own name
struct GrowableArray growable_array_empty_no_cleanup() {
    return growable_array_empty(NULL);
}

struct GrowableArray growable_array_empty(
    void (*cleanup)(void*)
) {
    struct GrowableArray empty = {
       .data = NULL,
       .allocated = 0,
       .size = 0,
       .cleanup = cleanup
    };
    
    return empty;
}

Once you have the cleanup function stored you can make a general cleanup function for the growable array itself.

void growable_array_cleanup(struct GrowableArray* array) {
    if (array->cleanup != NULL) {
        for (size_t i = 0; i < array->size; i++) {
            array->cleanup(array->data[i]);
        }
    }

    free(array->data);
}

If the elements' data is on the stack, you pass NULL and skip trying to free it. If it needs to be free-ed, that can be done the same as if you need to call special_framework_destroy.
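
Put together, usage with heap allocated elements looks something like this. A sketch, assuming the functions above:

struct GrowableArray numbers = growable_array_empty(free);

int* a = calloc(sizeof(int), 1);
*a = 6;
growable_array_add(&numbers, a);

// ... use the array ...

// Calls free on each element, then frees the data array itself
growable_array_cleanup(&numbers);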

While this might seem like we've solved the problem, notice that we now need to track when to call a special growable_array_cleanup, each array carries an extra pointer of memory, and cleanup has to check array->cleanup != NULL. Everything comes at a cost.

Memory Locality

Following the same theme, this sort of structure is forced to have subpar memory locality.

If you were to make an int array, the memory would be laid out like this with each int being directly next to the others.

-------------
| 5 | 4 | 3 |
-------------

When we make an array of int pointers the memory layout looks like this.

-------------------
| ptr | ptr | ptr |
---|-----|-----|---
   V     |     |
   5     V     |
         4     V
               3

Modern CPUs love going through arrays in order. They hate following pointers. This memory layout is almost guaranteed to lead to subpar performance compared to the tightly packed array.

Notice also that we didn't choose this memory layout because we wanted to. We chose it because we didn't want to write out the data structure more than once.

Compile-Time Generic Growable Arrays

If we don't want everything behind a void* we need a perfect vinaigrette of clever and stupid.

Template Headers

The only things that need to change between a growable int array and a growable struct Position array are the struct names, function names, return types, and arguments.

struct GrowableIntArray {
    int* data;
    size_t allocated;
    size_t size;
};

struct GrowablePositionArray {
    struct Position* data;
    size_t allocated;
    size_t size;
};

But we don't need to do that by hand. We have the C preprocessor.

#define GROWABLE_ARRAY_STRUCT struct GrowableIntArray
#define GROWABLE_ARRAY_DATA_POINTER int*

GROWABLE_ARRAY_STRUCT {
    GROWABLE_ARRAY_DATA_POINTER data;
    size_t allocated;
    size_t size;
};

If you've made a C header file before you've probably seen a prelude like this.

#ifndef SOME_FILE_H
#define SOME_FILE_H

// ... CODE FOR HEADER HERE ...

#endif

The purpose of this is so that if more than one file includes the header the code for it only shows up once.

Here we don't want to do that. We want it to be able to be included multiple times in one compilation.

In our growable_array.h we want to assume that GROWABLE_ARRAY_STRUCT, GROWABLE_ARRAY_DATA_POINTER, and whatever else we need defined are already defined by whatever code is including the header. #ifndef and #error can give some basic guardrails for that.

#ifndef GROWABLE_ARRAY_STRUCT
#error "GROWABLE_ARRAY_STRUCT not defined"
#endif

#ifndef GROWABLE_ARRAY_DATA_POINTER
#error "GROWABLE_ARRAY_DATA_POINTER not defined"
#endif

GROWABLE_ARRAY_STRUCT {
    GROWABLE_ARRAY_DATA_POINTER data;
    size_t allocated;
    size_t size;
};

Then we make one header file for each "specialization" we want of the growable array. So for ints we would make growable_int_array.h and put something like the following in it.

#ifndef GROWABLE_INT_ARRAY_H
#define GROWABLE_INT_ARRAY_H

#define GROWABLE_ARRAY_STRUCT struct GrowableIntArray
#define GROWABLE_ARRAY_DATA_POINTER int*

#include "growable_array.h"

#undef GROWABLE_ARRAY_STRUCT
#undef GROWABLE_ARRAY_DATA_POINTER

#endif

First, the normal header prelude. We don't want the growable int array to be defined more than once. Then we define the variables needed for our template header, include that header, and #undef those variables afterward.

The reason we bother with #undef is the same reason this works in the first place. The C preprocessor just does text replacements. When we include growable_array.h it literally spits the contents of that file in place. If we don't #undef a variable we defined, it can lead to some head-scratchers when compiling some other file.

But now all other code needs to do is include growable_int_array.h to get a growable array for ints. All we need to do to get a growable array for a specific type is do some #defines. Rinse and repeat for any other kind of growable array we want.
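
A growable array of struct Positions, for example, is just one more small header. A sketch, assuming struct Position is defined somewhere the including code can see:

#ifndef GROWABLE_POSITION_ARRAY_H
#define GROWABLE_POSITION_ARRAY_H

#define GROWABLE_ARRAY_STRUCT struct GrowablePositionArray
#define GROWABLE_ARRAY_DATA_POINTER struct Position*

#include "growable_array.h"

#undef GROWABLE_ARRAY_STRUCT
#undef GROWABLE_ARRAY_DATA_POINTER

#endif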

Pointer Lifetimes

Using void* forced us to always handle the lifetimes of those pointers, while using int or whatever else without indirection lets us skip over memory management.

Unfortunately memory management is a fact of life in C. If the kind of thing we are storing needs to be cleaned up we need to track that.

#include <stddef.h>
#include <stdlib.h> 

#ifndef GROWABLE_ARRAY_STRUCT
#error "GROWABLE_ARRAY_STRUCT not defined"
#endif

#ifndef GROWABLE_ARRAY_DATA_POINTER
#error "GROWABLE_ARRAY_DATA_POINTER not defined"
#endif

#ifndef GROWABLE_ARRAY_STRUCT_POINTER
#error "GROWABLE_ARRAY_STRUCT_POINTER not defined"
#endif

#ifndef GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME
#error "GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME not defined"
#endif

GROWABLE_ARRAY_STRUCT {
    GROWABLE_ARRAY_DATA_POINTER data;
    size_t allocated;
    size_t size;
};

void GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME(
    GROWABLE_ARRAY_STRUCT_POINTER array
) {
    #ifdef GROWABLE_ARRAY_ITEM_CLEANUP_FUNCTION_NAME
        for (size_t i = 0; i < array->size; i++) {
            GROWABLE_ARRAY_ITEM_CLEANUP_FUNCTION_NAME(array->data[i]);
        }
    #endif

    free(array->data);
}

The good news is that with the template approach that tracking doesn't need to happen at runtime. The bad news is that it needs to happen in the C preprocessor.

Implementor Experience

You might notice that there wouldn't be a warning if GROWABLE_ARRAY_ITEM_CLEANUP_FUNCTION_NAME was not defined. There also is a dearth of good names to give these things. It's understandable to get tripped up by the difference between GROWABLE_ARRAY_ITEM_CLEANUP_FUNCTION_NAME and GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME.

Best case scenario if you fill in one of these preprocessor defines wrongly is that your code doesn't compile. Worst case is that you get some insane and hard to debug behavior.

There will also end up being more than a few #defines you need to make. I'm not making use of concatenation for clarity, but even that doesn't trim the number down that far. If we write out some of the other functions you will see how this can be burdensome.

#ifndef GROWABLE_ARRAY_STRUCT
#error "GROWABLE_ARRAY_STRUCT not defined"
#endif

#ifndef GROWABLE_ARRAY_STRUCT_POINTER
#error "GROWABLE_ARRAY_STRUCT_POINTER not defined"
#endif

#ifndef GROWABLE_ARRAY_DATA
#error "GROWABLE_ARRAY_DATA not defined"
#endif

#ifndef GROWABLE_ARRAY_DATA_POINTER
#error "GROWABLE_ARRAY_DATA_POINTER not defined"
#endif

#ifndef GROWABLE_ARRAY_EMPTY_FUNCTION_NAME
#error "GROWABLE_ARRAY_EMPTY_FUNCTION_NAME not defined"
#endif

#ifndef GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME
#error "GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME not defined"
#endif

#ifndef GROWABLE_ARRAY_ADD_FUNCTION_NAME
#error "GROWABLE_ARRAY_ADD_FUNCTION_NAME not defined"
#endif

GROWABLE_ARRAY_STRUCT {
    GROWABLE_ARRAY_DATA_POINTER data;
    size_t allocated;
    size_t size;
};

GROWABLE_ARRAY_STRUCT GROWABLE_ARRAY_EMPTY_FUNCTION_NAME() {
    GROWABLE_ARRAY_STRUCT empty = {
       .data = NULL,
       .allocated = 0,
       .size = 0
    };
    
    return empty;
}

void GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME(
    GROWABLE_ARRAY_STRUCT_POINTER array
) {
    #ifdef GROWABLE_ARRAY_ITEM_CLEANUP_FUNCTION_NAME
        for (size_t i = 0; i < array->size; i++) {
            GROWABLE_ARRAY_ITEM_CLEANUP_FUNCTION_NAME(array->data[i]);
        }
    #endif

    free(array->data);
}

void GROWABLE_ARRAY_ADD_FUNCTION_NAME(
    GROWABLE_ARRAY_STRUCT_POINTER array, 
    GROWABLE_ARRAY_DATA value
) {
    if (array->size + 1 > array->allocated) {
       size_t new_allocated;
       if (array->size == 0) {
           new_allocated = 2;
       }
       else {
           new_allocated = array->size * 2;
       }
       
       GROWABLE_ARRAY_DATA_POINTER new_data 
            = (GROWABLE_ARRAY_DATA_POINTER) calloc(sizeof(GROWABLE_ARRAY_DATA), new_allocated);
       GROWABLE_ARRAY_DATA_POINTER old_data = array->data;
       
       // Copy all the old elements to it
       if (old_data != NULL) {
           for (size_t i = 0; i < array->size; i++) {
               new_data[i] = old_data[i];
           }
       }
       
       free(old_data);

       array->data = new_data;
       array->allocated = new_allocated;
    }
    
    array->data[array->size] = value;
    array->size++;
}

All of which still needs to be handled in each specialization.

#ifndef GROWABLE_INT_ARRAY_H
#define GROWABLE_INT_ARRAY_H

#define GROWABLE_ARRAY_STRUCT struct GrowableIntArray
#define GROWABLE_ARRAY_STRUCT_POINTER struct GrowableIntArray*
#define GROWABLE_ARRAY_DATA int
#define GROWABLE_ARRAY_DATA_POINTER int*
#define GROWABLE_ARRAY_EMPTY_FUNCTION_NAME growable_int_array_empty
#define GROWABLE_ARRAY_ADD_FUNCTION_NAME growable_int_array_add
#define GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME growable_int_array_cleanup

#include "growable_array.h"

#undef GROWABLE_ARRAY_STRUCT
#undef GROWABLE_ARRAY_STRUCT_POINTER
#undef GROWABLE_ARRAY_DATA
#undef GROWABLE_ARRAY_DATA_POINTER
#undef GROWABLE_ARRAY_EMPTY_FUNCTION_NAME
#undef GROWABLE_ARRAY_ADD_FUNCTION_NAME
#undef GROWABLE_ARRAY_CLEANUP_FUNCTION_NAME

#endif

While this is all technically less work than making all the logic for a growable array from scratch ten times, it's certainly not pretty.

If you've ever had the life lesson of working with C++ templates, this is the sort of thing that language feature is intended to replace.

template <typename T>
struct GrowableArray {
   T* data;
   size_t allocated;
   size_t size;  
};

If you haven't, don't get too excited. There lie demons also.

Conclusion

And that is basically it.

To grow an array you allocate a new array and copy data into it. To be efficient you allocate more memory than you need each time you grow.

If you want to make that data structure for more than one specific data type you either need to rely on runtime indirection and pointers or you need to dive into the C preprocessor and make template headers.

If you are a student who has a question you can ask below. You can find complete examples of all these approaches in this GitHub repo.

Corrections welcome.

Corrections

realloc

Instead of calloc for everything it is more efficient to use realloc. When malloc and co. give you a chunk of memory, that memory might secretly be larger than you requested. If it is, realloc can grow the allocation in place and you avoid having to do much of the work of the allocator.
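
The grow step from earlier could then look something like this. A sketch for the int version, still ignoring allocation failure:

if (array->size + 1 > array->allocated) {
    size_t new_allocated;
    if (array->size == 0) {
        new_allocated = 2;
    }
    else {
        new_allocated = array->size * 2;
    }

    // realloc copies the old elements over for us and, if the chunk
    // was secretly big enough, can grow it in place without copying.
    // realloc(NULL, ...) acts like malloc, so the empty case works too.
    array->data = realloc(array->data, sizeof(int) * new_allocated);
    array->allocated = new_allocated;
}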

Efficient Runtime Generic Growable Arrays

One thing that was pointed out to me is that using a void** for the runtime generic version is a naive strategy. We can avoid the memory indirection implied by having an array of pointers by storing the byte size of each element in the struct.

struct GrowableArray {
    void* data;
    size_t allocated;
    size_t size;
    void (*cleanup)(void*);
    size_t element_size;
};

Then when we allocate data we get a void* instead of a void** for our storage. Functions like growable_array_get will still have to return a void* as a result, but those can be cast and dereferenced. What is important is that the data behind the void* will have the ideal memory layout.
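
Indexing then becomes byte arithmetic. A sketch of what growable_array_get might look like with this layout:

void* growable_array_get(
    struct GrowableArray* array,
    size_t i
) {
    // Step over i elements, each element_size bytes wide
    return (char*) array->data + (i * array->element_size);
}

// Callers still cast and dereference:
// int value = *((int*) growable_array_get(&numbers, i));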

calloc behavior

A small but, in some contexts, important point: calloc doesn't give you "the zero" for every type. It does fill the memory with 0 bytes, but I have been informed that for floats and similar, all zero bytes might not be a zero value.


<- Index

Just use Postgres

by: Ethan McCue

This is one part actionable advice, one part question for the audience.

Advice: When you are making a new application that requires persistent storage of data, like is the case for most web applications, your default choice should be Postgres.

Why not sqlite?

sqlite is a pretty good database, but its data is stored in a single file.

This implies that whatever your application is, it is running on one machine and one machine only. Or at least one shared filesystem.

If you are making a desktop or mobile app, that's perfect. If you are making a website it might not be.

There are many success stories of using sqlite for a website, but they mostly involve people who set up their own servers and infrastructure. Platform-as-a-service offerings like Heroku, Railway, Render, etc. generally expect you to use a database accessed over a network boundary. It's not wrong to give up some of the benefits of those platforms, but do consider whether the benefits of sqlite are worth giving up platform-provided automatic database backups and the ability to provision more than one application server.

The official documentation has a good guide with some more specifics.

Why not DynamoDB, Cassandra, or MongoDB?

Wherever Rick Houlihan is, I hope he is having a good day.

I watch a lot of conference talks, but his 2018 DynamoDB Deep Dive might be the one I've watched the most. I know very few of you are going to watch an hour-long talk, but you really should. It's a good one.

The thrust of it is that databases that are in the same genre as DynamoDB - which includes Cassandra and MongoDB - are fantastic if - and this is a load-bearing if:

  • You know exactly what your app needs to do, up-front
  • You know exactly what your access patterns will be, up-front
  • You have a known need to scale to really large sizes of data
  • You are okay giving up some level of consistency

This is because this sort of database is basically a giant distributed hash map. The only operations that work without needing to scan the entire database are lookups by partition key and scans that make use of a sort key.

Whatever queries you need to make, you need to encode that knowledge in one of those indexes before you store it. You want to store users and look them up by either first name or last name? Well you best have a sort key that looks like <FIRST NAME>$<LAST NAME>. Your access patterns should be baked into how you store your data. If your access patterns change significantly, you might need to reprocess all of your data.

It's annoying because, especially with MongoDB, people come into it having been sold on it being a more "flexible" database. Yes, you don't need to give it a schema. Yes, you can just dump untyped JSON into collections. No, this is not a flexible kind of database. It is an efficient one.

With a relational database you can go from getting all the pets of a person to getting all the owners of a pet by slapping an index or two on your tables. With this genre of NoSQL, that can be a tall order.

It's also not amazing if you need to run analytics queries. Arbitrary questions like "How many users signed up in the last month?" can be trivially answered by writing a SQL query, perhaps on a read-replica if you are worried about running an expensive query on the same machine that is dealing with customer traffic. It's just outside the scope of this kind of database. You need to be ETL-ing your data out to handle it.

If you see a college student or fresh grad using MongoDB stop them. They need help. They have been led astray.

Why not Valkey?

The artist formerly known as Redis is best known for being an efficient out-of-process cache. You compute something expensive once and slap it in Valkey so all 5 or so HTTP servers you have don't need to recompute it.

However, you can use it as your primary database. It stores all its data in RAM, so it's pretty fast if you do that.

Obvious problems:

  • You can only have so much RAM. You can have a lot more than you'd think, but it's still pretty limited compared to hard drives.
  • Same as the DynamoDB-likes, you need to make concessions on how you model your data.

Why not Datomic?

If you already knew about this one, you get a gold star.

Datomic is a NoSQL database, but it is a relational one. The "up-front design" problems aren't there, and it does have some neat properties.

You don't store data in tables. It's all "entity-attribute-value-time" (EAVT) pairs. Instead of a person row with id, name, and age you store 1 :person/name "Beth" and 1 :person/age 30. Then your queries work off of "universal" indexes.

You don't need to coordinate with writers when making queries. You query the database "as-of" a given time. New data, even deletions (or as they call them, "retractions"), doesn't actually delete old data.

But there are some significant problems:

  • It only works with JVM languages.
  • Outside of Clojure, a relatively niche language, its API sucks.
  • If you structure a query badly the error messages you get are terrible.
  • The whole universe of tools that exist for SQL just aren't there.

Why not XTDB?

Clojure people make a lot of databases.

XTDB is spiritually similar to Datomic but:

  • There is an HTTP api, so you aren't locked to the JVM.
  • It has two axes of time you can query against. "System Time" - when records were inserted - and "Valid Time."
  • It has a SQL API.

The biggest points against it are:

  • It's new. Its SQL API is something that popped up in the last year. It recently changed its whole storage model. Will the company behind it survive the next 10 years? Who knows!

Okay, that's just one point. I'm sure I could think of more, but treat this as a stand-in for any recently developed database. The best predictor that something will continue to exist into the future is how long it has already existed. COBOL has been around for decades; it will likely continue to exist for decades.

If you have persistent storage, you want as long a support term as you can get. You can certainly choose to pick a newer or experimental database for your app but, regardless of technical properties, that's a risky choice. It shouldn't be your default.

Why not Kafka?

Kafka is an append only log. It can handle TBs of data. It is a very good append only log. It works amazingly well if you want to do event sourcing type stuff with data flowing in from multiple services maintained by multiple teams of humans.

But:

  • Up to a certain scale, a table in Postgres works perfectly fine as an append only log.
  • You likely do not have hundreds of people working on your product nor TBs of events flowing in.
  • Making a Kafka consumer is a bit more error-prone than you'd expect. You need to keep track of your place in the log after all.
  • Even when maintained by a cloud provider (and there are good managed Kafka services) it's another piece of infrastructure you need to monitor.

Why not ElasticSearch?

Is searching over data the primary function of your product?

If yes, ElasticSearch is going to give you some real pros. You will need to ETL your data into it and manage that whole process, but ElasticSearch is built for searching. It does searching good.

If no, Postgres will be fine. A sprinkling of ilike and the built-in full text search is more than enough for most applications. You can always bolt on a dedicated search thing later.

Why not MSSQL or Oracle DB?

Genuine question you should ask yourself: Are these worth the price tag?

I don't just mean the straight-up cost to license, but also the cost of lock-in. Once your data is in Oracle DB you are going to be paying Oracle forever. You are going to have to train your coders on its idiosyncrasies, forever. You are going to have to decide between enterprise features and your wallet, forever.

I know it's super unlikely that you will contribute a patch to Postgres, so I won't pretend that there is some magic "power of open source" going on, but I think you should have a very specific need in mind to choose a proprietary DB. If you don't have some killer MSSQL feature that you simply cannot live without, don't use it.

Why not MySQL?

This is the one that I need some audience help with.

MySQL is owned by Oracle. There are features locked behind their enterprise editions. To an extent you will have lock-in issues the same as any other DB.

But the free edition of MySQL has also been used in an extremely wide range of things. It's been around for a long time. There are people who know how to work with it.

My problem is that I've only spent ~6 months of my professional career working with it. I genuinely don't know enough to compare it intelligently to Postgres.

I'm convinced it isn't secretly so much better that I am doing folks a disservice when telling them to use Postgres, and I do remember reading about how Postgres generally has better support for enforcing invariants in the DB itself, but I wouldn't mind being schooled a bit here.

Why not some AI vector DB?

  • Most are new. Remember the risks of using something new.
  • AI is a bubble. A load-bearing bubble, but a bubble. Don't build a house on it if you can avoid it.
  • Even if your business is another AI grift, you probably only need to import openai.

Why not Google Sheets?

You're right. I can't think of any downsides. Go for it.


<- Index

I Can't Run My Rust Game Either

by: Ethan McCue

Yesterday I talked about an issue I had updating a Rust project. The time between when that project was working for me and when it was not was only a few months.

But I have one other Rust project I haven't touched in a while. This little game.

In it, you play as a Pong paddle catching or dodging bullets from an alien. It was a fun project at the time and a good exercise with Rust.

It also does not compile today.

   Compiling winit v0.19.5
error[E0308]: mismatched types
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/view.rs:209:9
    |
205 | extern fn has_marked_text(this: &Object, _sel: Sel) -> BOOL {
    |                                                        ---- expected `bool` because of return type
...
209 |         (marked_text.length() > 0) as i8
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `bool`, found `i8`

error[E0308]: mismatched types
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:103:26
    |
103 |             is_zoomed != 0
    |             ---------    ^ expected `bool`, found integer
    |             |
    |             expected because this is `bool`

error[E0308]: mismatched types
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:175:57
    |
175 |                 self.window.setFrame_display_(new_rect, 0);
    |                             -----------------           ^ expected `bool`, found integer
    |                             |
    |                             arguments to this method are incorrect
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
    |
932 |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
    |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1301:48
     |
1301 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1308:48
     |
1308 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1325:48
     |
1325 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1332:48
     |
1332 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `winit` (lib) due to 7 previous errors

A lot of people suggested locking to an older version of the Rust toolchain. So I tried to install whatever version of the Rust compiler was current the last time I made a commit on that project.

  alien_game git:(master)  rustup toolchain install stable-2020-03-25
info: syncing channel updates for 'stable-2020-03-25-aarch64-apple-darwin'
error: no release found for 'stable-2020-03-25'

Oh, yeah. ARM Macs weren't a thing in 2020.

I no longer have the laptop I used to write this code.
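
In hindsight, committing a rust-toolchain.toml next to Cargo.toml back then would have at least recorded which compiler the project wanted. A minimal sketch - the channel here is a guess at what was current at the time:

rust-toolchain.toml

[toolchain]
channel = "1.42.0"

rustup picks that file up automatically, though on a machine the toolchain was never built for it would only fail faster.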

I can't find a list of available toolchains online and trying every date is tiresome. Maybe I'll script the search if all else fails - a rough brute-force sketch, relying on rustup exiting non-zero when a date has no release (the dates below are guesses at 2020 stable releases):
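
#!/bin/sh
# Probe candidate stable dates until rustup accepts one.
for d in 2020-01-30 2020-03-12 2020-04-23 2020-06-04; do
    if rustup toolchain install "stable-$d"; then
        echo "found: stable-$d"
        break
    fi
done

But there are only two dependencies.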

[dependencies]
ggez = "0.5"
rand = "0.7.3"

What if I just lock newer versions of winit and cocoa?

[dependencies]
ggez = "0.5"
rand = "0.7.3"
winit = "0.30.4"
cocoa = "0.25.0"

   Compiling winit v0.19.5
error[E0308]: mismatched types
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/view.rs:209:9
    |
205 | extern fn has_marked_text(this: &Object, _sel: Sel) -> BOOL {
    |                                                        ---- expected `bool` because of return type
...
209 |         (marked_text.length() > 0) as i8
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `bool`, found `i8`

error[E0308]: mismatched types
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:103:26
    |
103 |             is_zoomed != 0
    |             ---------    ^ expected `bool`, found integer
    |             |
    |             expected because this is `bool`

error[E0308]: mismatched types
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:175:57
    |
175 |                 self.window.setFrame_display_(new_rect, 0);
    |                             -----------------           ^ expected `bool`, found integer
    |                             |
    |                             arguments to this method are incorrect
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
    |
932 |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
    |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1301:48
     |
1301 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1308:48
     |
1308 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1325:48
     |
1325 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/winit-0.19.5/src/platform/macos/window.rs:1332:48
     |
1332 |         window.setFrame_display_(current_rect, 0)
     |                -----------------               ^ expected `bool`, found integer
     |                |
     |                arguments to this method are incorrect
     |
note: method defined here
    --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cocoa-0.18.5/src/appkit.rs:932:15
     |
932  |     unsafe fn setFrame_display_(self, windowFrame: NSRect, display: BOOL);
     |               ^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `winit` (lib) due to 7 previous errors

I guess hell or high water it is bringing in that version of winit. Which, in hindsight, makes sense: Cargo treats semver-incompatible versions as separate packages, so my top-level winit = "0.30.4" just adds a second winit to the graph - it can't override the 0.19.x that ggez 0.5 asks for.
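
If I want the graph to point the finger itself, cargo tree can run inverted to show who requires the old version (a sketch - the name@version spec needs a reasonably recent Cargo):

# list the dependents of the old winit
cargo tree -i winit@0.19.5

That should trace straight back to ggez.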

Well, what if I upgraded ggez to the latest?

[dependencies]
ggez = "0.9.3"
rand = "0.7.3"

   Compiling rustisbetter v0.1.0 (/Users/emccue/Development/alien_game)
error[E0432]: unresolved import `ggez::event::quit`
 --> src/main.rs:2:25
  |
2 | use ggez::event::{self, quit, EventHandler, KeyCode, KeyMods};
  |                         ^^^^ no `quit` in `event`

error[E0432]: unresolved import `ggez::graphics::Font`
 --> src/main.rs:4:5
  |
4 | use ggez::graphics::Font;
  |     ^^^^^^^^^^^^^^^^^^^^ no `Font` in `graphics`

error[E0432]: unresolved import `ggez::nalgebra`
 --> src/main.rs:6:11
  |
6 | use ggez::nalgebra::Point2;
  |           ^^^^^^^^ could not find `nalgebra` in `ggez`

error[E0432]: unresolved import `ggez::nalgebra`
 --> src/alien.rs:3:11
  |
3 | use ggez::nalgebra::Point2;
  |           ^^^^^^^^ could not find `nalgebra` in `ggez`

error[E0432]: unresolved import `ggez::nalgebra`
 --> src/bullet.rs:3:11
  |
3 | use ggez::nalgebra::Point2;
  |           ^^^^^^^^ could not find `nalgebra` in `ggez`

error[E0425]: cannot find function `screen_coordinates` in module `graphics`
   --> src/main.rs:177:44
    |
177 |         let screen_coordinates = graphics::screen_coordinates(&ctx);
    |                                            ^^^^^^^^^^^^^^^^^^ not found in `graphics`

error[E0425]: cannot find function `clear` in module `graphics`
   --> src/main.rs:318:19
    |
318 |         graphics::clear(ctx, graphics::WHITE);
    |                   ^^^^^ not found in `graphics`

error[E0425]: cannot find value `WHITE` in module `graphics`
   --> src/main.rs:318:40
    |
318 |         graphics::clear(ctx, graphics::WHITE);
    |                                        ^^^^^ not found in `graphics`

error[E0425]: cannot find function `present` in module `graphics`
   --> src/main.rs:324:19
    |
324 |         graphics::present(ctx)?;
    |                   ^^^^^^^ not found in `graphics`

error[E0603]: enum `KeyCode` is private
  --> src/main.rs:2:45
   |
2  | use ggez::event::{self, quit, EventHandler, KeyCode, KeyMods};
   |                                             ^^^^^^^ private enum
   |
note: the enum `KeyCode` is defined here
  --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/event.rs:34:30
   |
34 | use crate::input::keyboard::{KeyCode, KeyInput, KeyMods};
   |                              ^^^^^^^
help: import `KeyCode` directly
   |
2  | use ggez::event::{self, quit, EventHandler, winit::event::VirtualKeyCode, KeyMods};
   |                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0603]: struct `KeyMods` is private
  --> src/main.rs:2:54
   |
2  | use ggez::event::{self, quit, EventHandler, KeyCode, KeyMods};
   |                                                      ^^^^^^^ private struct
   |
note: the struct `KeyMods` is defined here
  --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/event.rs:34:49
   |
34 | use crate::input::keyboard::{KeyCode, KeyInput, KeyMods};
   |                                                 ^^^^^^^
help: import `KeyMods` directly
   |
2  | use ggez::event::{self, quit, EventHandler, KeyCode, ggez::input::keyboard::KeyMods};
   |                                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

warning: unused imports: `BlendMode`, `Rect`
 --> src/main.rs:5:22
  |
5 | use ggez::graphics::{BlendMode, DrawParam, Drawable, Rect, Text};
  |                      ^^^^^^^^^                       ^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused import: `GameError`
 --> src/main.rs:7:19
  |
7 | use ggez::{audio, GameError};
  |                   ^^^^^^^^^

warning: unused import: `rand::seq::SliceRandom`
 --> src/main.rs:9:5
  |
9 | use rand::seq::SliceRandom;
  |     ^^^^^^^^^^^^^^^^^^^^^^

warning: unused import: `std::error::Error`
  --> src/main.rs:12:5
   |
12 | use std::error::Error;
   |     ^^^^^^^^^^^^^^^^^

warning: unused import: `std::iter::Peekable`
  --> src/main.rs:16:5
   |
16 | use std::iter::Peekable;
   |     ^^^^^^^^^^^^^^^^^^^

warning: unused import: `std::iter::Peekable`
 --> src/alien.rs:9:5
  |
9 | use std::iter::Peekable;
  |     ^^^^^^^^^^^^^^^^^^^

warning: unused import: `std::ops::Add`
  --> src/alien.rs:10:5
   |
10 | use std::ops::Add;
   |     ^^^^^^^^^^^^^

warning: unnecessary parentheses around pattern
  --> src/alien.rs:80:13
   |
80 |         let ((min_x, max_x)) = self.x_movement_range;
   |             ^              ^
   |
   = note: `#[warn(unused_parens)]` on by default
help: remove these parentheses
   |
80 -         let ((min_x, max_x)) = self.x_movement_range;
80 +         let (min_x, max_x) = self.x_movement_range;
   |

warning: use of deprecated function `ggez::filesystem::open`: Use `ctx.fs.open` instead
   --> src/main.rs:142:65
    |
142 |         let data = audio::SoundData::from_read(&mut filesystem::open(ctx, "/Bloop.mp3")?)?;
    |                                                                 ^^^^
    |
    = note: `#[warn(deprecated)]` on by default

error[E0050]: method `key_down_event` has 5 parameters but the declaration in trait `key_down_event` has 4
   --> src/main.rs:329:9
    |
329 | /         &mut self,
330 | |         ctx: &mut Context,
331 | |         keycode: KeyCode,
332 | |         _keymods: KeyMods,
333 | |         _repeat: bool,
    | |_____________________^ expected 4 parameters, found 5
    |
    = note: `key_down_event` from trait: `fn(&mut Self, &mut ggez::Context, KeyInput, bool) -> Result<(), E>`

error[E0050]: method `key_up_event` has 4 parameters but the declaration in trait `key_up_event` has 3
   --> src/main.rs:348:21
    |
348 |     fn key_up_event(&mut self, _ctx: &mut Context, keycode: KeyCode, _keymods: KeyMods) {
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected 3 parameters, found 4
    |
    = note: `key_up_event` from trait: `fn(&mut Self, &mut ggez::Context, KeyInput) -> Result<(), E>`

error[E0053]: method `resize_event` has an incompatible type for trait
   --> src/main.rs:358:76
    |
358 |     fn resize_event(&mut self, _ctx: &mut Context, width: f32, height: f32) {
    |                                                                            ^ expected `Result<(), GameError>`, found `()`
    |
    = note: expected signature `fn(&mut Game, &mut ggez::Context, _, _) -> Result<(), GameError>`
               found signature `fn(&mut Game, &mut ggez::Context, _, _)`
help: change the output type to match the trait
    |
358 |     fn resize_event(&mut self, _ctx: &mut Context, width: f32, height: f32) -> Result<(), GameError> {
    |                                                                             ++++++++++++++++++++++++

error[E0308]: mismatched types
   --> src/alien.rs:141:13
    |
140 |         sprite.draw(
    |                ---- arguments to this method are incorrect
141 |             ctx,
    |             ^^^ expected `&mut Canvas`, found `&mut Context`
    |
    = note: expected mutable reference `&mut Canvas`
               found mutable reference `&mut ggez::Context`
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:293:8
    |
293 |     fn draw(&self, canvas: &mut Canvas, param: impl Into<DrawParam>);
    |        ^^^^

error[E0308]: mismatched types
   --> src/alien.rs:140:9
    |
134 |       pub fn draw(&self, ctx: &mut Context) -> GameResult<()> {
    |                                                -------------- expected `Result<(), GameError>` because of return type
...
140 | /         sprite.draw(
141 | |             ctx,
142 | |             DrawParam::default()
143 | |                 .offset(Point2::new(0.5, 0.5))
144 | |                 .dest(Point2::new(self.pos.0, self.pos.1)),
145 | |         )
    | |_________^ expected `Result<(), GameError>`, found `()`
    |
    = note:   expected enum `Result<(), GameError>`
            found unit type `()`
help: try adding an expression at the end of the block
    |
145 ~         );
146 +         Ok(())
    |

error[E0308]: mismatched types
   --> src/bullet.rs:54:13
    |
53  |         self.sprite.draw(
    |                     ---- arguments to this method are incorrect
54  |             ctx,
    |             ^^^ expected `&mut Canvas`, found `&mut Context`
    |
    = note: expected mutable reference `&mut Canvas`
               found mutable reference `&mut ggez::Context`
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:293:8
    |
293 |     fn draw(&self, canvas: &mut Canvas, param: impl Into<DrawParam>);
    |        ^^^^

error[E0308]: mismatched types
  --> src/bullet.rs:53:9
   |
52 |       pub fn draw(&self, ctx: &mut Context) -> GameResult<()> {
   |                                                -------------- expected `Result<(), GameError>` because of return type
53 | /         self.sprite.draw(
54 | |             ctx,
55 | |             DrawParam::default()
56 | |                 .offset(Point2::new(0.5, 0.5))
57 | |                 .dest(Point2::new(self.pos.0, self.pos.1))
58 | |                 .rotation(FRAC_PI_2),
59 | |         )
   | |_________^ expected `Result<(), GameError>`, found `()`
   |
   = note:   expected enum `Result<(), GameError>`
           found unit type `()`
help: try adding an expression at the end of the block
   |
59 ~         );
60 +         Ok(())
   |

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/bullet.rs:100:34
    |
100 |         self.pos.0 - self.sprite.dimensions().w as f32 / 2.0
    |                                  ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
100 |         self.pos.0 - self.sprite.dimensions(/* gfx */).w as f32 / 2.0
    |                                            ~~~~~~~~~~~

error[E0609]: no field `w` on type `Option<Rect>`
   --> src/bullet.rs:100:47
    |
100 |         self.pos.0 - self.sprite.dimensions().w as f32 / 2.0
    |                                               ^ unknown field
    |
help: one of the expressions' fields has a field of the same name
    |
100 |         self.pos.0 - self.sprite.dimensions().unwrap().w as f32 / 2.0
    |                                               +++++++++

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/bullet.rs:104:34
    |
104 |         self.pos.1 - self.sprite.dimensions().h as f32 / 2.0
    |                                  ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
104 |         self.pos.1 - self.sprite.dimensions(/* gfx */).h as f32 / 2.0
    |                                            ~~~~~~~~~~~

error[E0609]: no field `h` on type `Option<Rect>`
   --> src/bullet.rs:104:47
    |
104 |         self.pos.1 - self.sprite.dimensions().h as f32 / 2.0
    |                                               ^ unknown field
    |
help: one of the expressions' fields has a field of the same name
    |
104 |         self.pos.1 - self.sprite.dimensions().unwrap().h as f32 / 2.0
    |                                               +++++++++

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/bullet.rs:108:21
    |
108 |         self.sprite.dimensions().w
    |                     ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
108 |         self.sprite.dimensions(/* gfx */).w
    |                               ~~~~~~~~~~~

error[E0609]: no field `w` on type `Option<Rect>`
   --> src/bullet.rs:108:34
    |
108 |         self.sprite.dimensions().w
    |                                  ^ unknown field
    |
help: one of the expressions' fields has a field of the same name
    |
108 |         self.sprite.dimensions().unwrap().w
    |                                  +++++++++

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/bullet.rs:112:21
    |
112 |         self.sprite.dimensions().h
    |                     ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
112 |         self.sprite.dimensions(/* gfx */).h
    |                               ~~~~~~~~~~~

error[E0609]: no field `h` on type `Option<Rect>`
   --> src/bullet.rs:112:34
    |
112 |         self.sprite.dimensions().h
    |                                  ^ unknown field
    |
help: one of the expressions' fields has a field of the same name
    |
112 |         self.sprite.dimensions().unwrap().h
    |                                  +++++++++

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/main.rs:53:34
    |
53  |         self.pos.0 - self.sprite.dimensions().w as f32 / 2.0
    |                                  ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
53  |         self.pos.0 - self.sprite.dimensions(/* gfx */).w as f32 / 2.0
    |                                            ~~~~~~~~~~~

error[E0609]: no field `w` on type `Option<Rect>`
  --> src/main.rs:53:47
   |
53 |         self.pos.0 - self.sprite.dimensions().w as f32 / 2.0
   |                                               ^ unknown field
   |
help: one of the expressions' fields has a field of the same name
   |
53 |         self.pos.0 - self.sprite.dimensions().unwrap().w as f32 / 2.0
   |                                               +++++++++

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/main.rs:57:34
    |
57  |         self.pos.1 - self.sprite.dimensions().h as f32 / 2.0
    |                                  ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
57  |         self.pos.1 - self.sprite.dimensions(/* gfx */).h as f32 / 2.0
    |                                            ~~~~~~~~~~~

error[E0609]: no field `h` on type `Option<Rect>`
  --> src/main.rs:57:47
   |
57 |         self.pos.1 - self.sprite.dimensions().h as f32 / 2.0
   |                                               ^ unknown field
   |
help: one of the expressions' fields has a field of the same name
   |
57 |         self.pos.1 - self.sprite.dimensions().unwrap().h as f32 / 2.0
   |                                               +++++++++

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/main.rs:61:21
    |
61  |         self.sprite.dimensions().w
    |                     ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
61  |         self.sprite.dimensions(/* gfx */).w
    |                               ~~~~~~~~~~~

error[E0609]: no field `w` on type `Option<Rect>`
  --> src/main.rs:61:34
   |
61 |         self.sprite.dimensions().w
   |                                  ^ unknown field
   |
help: one of the expressions' fields has a field of the same name
   |
61 |         self.sprite.dimensions().unwrap().w
   |                                  +++++++++

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/main.rs:65:21
    |
65  |         self.sprite.dimensions().h
    |                     ^^^^^^^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:299:8
    |
299 |     fn dimensions(&self, gfx: &impl Has<GraphicsContext>) -> Option<Rect>;
    |        ^^^^^^^^^^
help: provide the argument
    |
65  |         self.sprite.dimensions(/* gfx */).h
    |                               ~~~~~~~~~~~

error[E0609]: no field `h` on type `Option<Rect>`
  --> src/main.rs:65:34
   |
65 |         self.sprite.dimensions().h
   |                                  ^ unknown field
   |
help: one of the expressions' fields has a field of the same name
   |
65 |         self.sprite.dimensions().unwrap().h
   |                                  +++++++++

error[E0624]: associated function `new` is private
   --> src/main.rs:120:50
    |
120 |               alien_idle: Rc::new(graphics::Image::new(ctx, "/ENEMY.png")?),
    |                                                    ^^^ private associated function
    |
   ::: /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:5
    |
161 | /     fn new(
162 | |         wgpu: &WgpuContext,
163 | |         format: ImageFormat,
164 | |         width: u32,
...   |
167 | |         usage: wgpu::TextureUsages,
168 | |     ) -> Self {
    | |_____________- private associated function defined here

error[E0061]: this function takes 6 arguments but 2 arguments were supplied
   --> src/main.rs:120:33
    |
120 |             alien_idle: Rc::new(graphics::Image::new(ctx, "/ENEMY.png")?),
    |                                 ^^^^^^^^^^^^^^^^^^^^-------------------
    |                                                     ||    |
    |                                                     ||    expected `TextureFormat`, found `&str`
    |                                                     |expected `&WgpuContext`, found `&mut Context`
    |                                                     multiple arguments are missing
    |
    = note:      expected reference `&WgpuContext`
            found mutable reference `&mut ggez::Context`
note: associated function defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:8
    |
161 |     fn new(
    |        ^^^
help: provide the arguments
    |
120 |             alien_idle: Rc::new(graphics::Image::new(/* &WgpuContext */, /* wgpu_types::TextureFormat */, /* u32 */, /* u32 */, /* u32 */, /* wgpu_types::TextureUsages */)?),
    |                                                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> src/main.rs:120:33
    |
120 |             alien_idle: Rc::new(graphics::Image::new(ctx, "/ENEMY.png")?),
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `Image`
    |
    = help: the trait `Try` is not implemented for `Image`

error[E0624]: associated function `new` is private
   --> src/main.rs:121:52
    |
121 |               alien_firing: Rc::new(graphics::Image::new(ctx, "/ENEMY_FIRING.png")?),
    |                                                      ^^^ private associated function
    |
   ::: /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:5
    |
161 | /     fn new(
162 | |         wgpu: &WgpuContext,
163 | |         format: ImageFormat,
164 | |         width: u32,
...   |
167 | |         usage: wgpu::TextureUsages,
168 | |     ) -> Self {
    | |_____________- private associated function defined here

error[E0061]: this function takes 6 arguments but 2 arguments were supplied
   --> src/main.rs:121:35
    |
121 |             alien_firing: Rc::new(graphics::Image::new(ctx, "/ENEMY_FIRING.png")?),
    |                                   ^^^^^^^^^^^^^^^^^^^^--------------------------
    |                                                       ||    |
    |                                                       ||    expected `TextureFormat`, found `&str`
    |                                                       |expected `&WgpuContext`, found `&mut Context`
    |                                                       multiple arguments are missing
    |
    = note:      expected reference `&WgpuContext`
            found mutable reference `&mut ggez::Context`
note: associated function defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:8
    |
161 |     fn new(
    |        ^^^
help: provide the arguments
    |
121 |             alien_firing: Rc::new(graphics::Image::new(/* &WgpuContext */, /* wgpu_types::TextureFormat */, /* u32 */, /* u32 */, /* u32 */, /* wgpu_types::TextureUsages */)?),
    |                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> src/main.rs:121:35
    |
121 |             alien_firing: Rc::new(graphics::Image::new(ctx, "/ENEMY_FIRING.png")?),
    |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `Image`
    |
    = help: the trait `Try` is not implemented for `Image`

error[E0624]: associated function `new` is private
   --> src/main.rs:122:46
    |
122 |               player: Rc::new(graphics::Image::new(ctx, "/PLAYER_OLD_2.png")?),
    |                                                ^^^ private associated function
    |
   ::: /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:5
    |
161 | /     fn new(
162 | |         wgpu: &WgpuContext,
163 | |         format: ImageFormat,
164 | |         width: u32,
...   |
167 | |         usage: wgpu::TextureUsages,
168 | |     ) -> Self {
    | |_____________- private associated function defined here

error[E0061]: this function takes 6 arguments but 2 arguments were supplied
   --> src/main.rs:122:29
    |
122 |             player: Rc::new(graphics::Image::new(ctx, "/PLAYER_OLD_2.png")?),
    |                             ^^^^^^^^^^^^^^^^^^^^--------------------------
    |                                                 ||    |
    |                                                 ||    expected `TextureFormat`, found `&str`
    |                                                 |expected `&WgpuContext`, found `&mut Context`
    |                                                 multiple arguments are missing
    |
    = note:      expected reference `&WgpuContext`
            found mutable reference `&mut ggez::Context`
note: associated function defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:8
    |
161 |     fn new(
    |        ^^^
help: provide the arguments
    |
122 |             player: Rc::new(graphics::Image::new(/* &WgpuContext */, /* wgpu_types::TextureFormat */, /* u32 */, /* u32 */, /* u32 */, /* wgpu_types::TextureUsages */)?),
    |                                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> src/main.rs:122:29
    |
122 |             player: Rc::new(graphics::Image::new(ctx, "/PLAYER_OLD_2.png")?),
    |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `Image`
    |
    = help: the trait `Try` is not implemented for `Image`

error[E0624]: associated function `new` is private
   --> src/main.rs:123:50
    |
123 |               red_bullet: Rc::new(graphics::Image::new(ctx, "/Red_Missile.png")?),
    |                                                    ^^^ private associated function
    |
   ::: /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:5
    |
161 | /     fn new(
162 | |         wgpu: &WgpuContext,
163 | |         format: ImageFormat,
164 | |         width: u32,
...   |
167 | |         usage: wgpu::TextureUsages,
168 | |     ) -> Self {
    | |_____________- private associated function defined here

error[E0061]: this function takes 6 arguments but 2 arguments were supplied
   --> src/main.rs:123:33
    |
123 |             red_bullet: Rc::new(graphics::Image::new(ctx, "/Red_Missile.png")?),
    |                                 ^^^^^^^^^^^^^^^^^^^^-------------------------
    |                                                     ||    |
    |                                                     ||    expected `TextureFormat`, found `&str`
    |                                                     |expected `&WgpuContext`, found `&mut Context`
    |                                                     multiple arguments are missing
    |
    = note:      expected reference `&WgpuContext`
            found mutable reference `&mut ggez::Context`
note: associated function defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:8
    |
161 |     fn new(
    |        ^^^
help: provide the arguments
    |
123 |             red_bullet: Rc::new(graphics::Image::new(/* &WgpuContext */, /* wgpu_types::TextureFormat */, /* u32 */, /* u32 */, /* u32 */, /* wgpu_types::TextureUsages */)?),
    |                                                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> src/main.rs:123:33
    |
123 |             red_bullet: Rc::new(graphics::Image::new(ctx, "/Red_Missile.png")?),
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `Image`
    |
    = help: the trait `Try` is not implemented for `Image`

error[E0624]: associated function `new` is private
   --> src/main.rs:124:52
    |
124 |               green_bullet: Rc::new(graphics::Image::new(ctx, "/MISSILE_FIRED.png")?),
    |                                                      ^^^ private associated function
    |
   ::: /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:5
    |
161 | /     fn new(
162 | |         wgpu: &WgpuContext,
163 | |         format: ImageFormat,
164 | |         width: u32,
...   |
167 | |         usage: wgpu::TextureUsages,
168 | |     ) -> Self {
    | |_____________- private associated function defined here

error[E0061]: this function takes 6 arguments but 2 arguments were supplied
   --> src/main.rs:124:35
    |
124 |             green_bullet: Rc::new(graphics::Image::new(ctx, "/MISSILE_FIRED.png")?),
    |                                   ^^^^^^^^^^^^^^^^^^^^---------------------------
    |                                                       ||    |
    |                                                       ||    expected `TextureFormat`, found `&str`
    |                                                       |expected `&WgpuContext`, found `&mut Context`
    |                                                       multiple arguments are missing
    |
    = note:      expected reference `&WgpuContext`
            found mutable reference `&mut ggez::Context`
note: associated function defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:8
    |
161 |     fn new(
    |        ^^^
help: provide the arguments
    |
124 |             green_bullet: Rc::new(graphics::Image::new(/* &WgpuContext */, /* wgpu_types::TextureFormat */, /* u32 */, /* u32 */, /* u32 */, /* wgpu_types::TextureUsages */)?),
    |                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> src/main.rs:124:35
    |
124 |             green_bullet: Rc::new(graphics::Image::new(ctx, "/MISSILE_FIRED.png")?),
    |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `Image`
    |
    = help: the trait `Try` is not implemented for `Image`

error[E0624]: associated function `new` is private
   --> src/main.rs:125:50
    |
125 |               background: Rc::new(graphics::Image::new(ctx, "/Space.png")?),
    |                                                    ^^^ private associated function
    |
   ::: /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:5
    |
161 | /     fn new(
162 | |         wgpu: &WgpuContext,
163 | |         format: ImageFormat,
164 | |         width: u32,
...   |
167 | |         usage: wgpu::TextureUsages,
168 | |     ) -> Self {
    | |_____________- private associated function defined here

error[E0061]: this function takes 6 arguments but 2 arguments were supplied
   --> src/main.rs:125:33
    |
125 |             background: Rc::new(graphics::Image::new(ctx, "/Space.png")?),
    |                                 ^^^^^^^^^^^^^^^^^^^^-------------------
    |                                                     ||    |
    |                                                     ||    expected `TextureFormat`, found `&str`
    |                                                     |expected `&WgpuContext`, found `&mut Context`
    |                                                     multiple arguments are missing
    |
    = note:      expected reference `&WgpuContext`
            found mutable reference `&mut ggez::Context`
note: associated function defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/image.rs:161:8
    |
161 |     fn new(
    |        ^^^
help: provide the arguments
    |
125 |             background: Rc::new(graphics::Image::new(/* &WgpuContext */, /* wgpu_types::TextureFormat */, /* u32 */, /* u32 */, /* u32 */, /* wgpu_types::TextureUsages */)?),
    |                                                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> src/main.rs:125:33
    |
125 |             background: Rc::new(graphics::Image::new(ctx, "/Space.png")?),
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `Image`
    |
    = help: the trait `Try` is not implemented for `Image`

error[E0061]: this method takes 1 argument but 0 arguments were supplied
   --> src/main.rs:257:30
    |
257 |             game.audio.bloop.play()?;
    |                              ^^^^-- an argument of type `&_` is missing
    |
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/audio.rs:131:8
    |
131 |     fn play(&mut self, audio: &impl Has<AudioContext>) -> GameResult {
    |        ^^^^
help: provide the argument
    |
257 |             game.audio.bloop.play(/* audio */)?;
    |                                  ~~~~~~~~~~~~~

error[E0308]: mismatched types
   --> src/main.rs:274:34
    |
274 |     game.sprites.background.draw(ctx, DrawParam::default())
    |                             ---- ^^^ expected `&mut Canvas`, found `&mut Context`
    |                             |
    |                             arguments to this method are incorrect
    |
    = note: expected mutable reference `&mut Canvas`
               found mutable reference `&mut ggez::Context`
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:293:8
    |
293 |     fn draw(&self, canvas: &mut Canvas, param: impl Into<DrawParam>);
    |        ^^^^

error[E0308]: mismatched types
   --> src/main.rs:274:5
    |
273 | fn draw_background(ctx: &mut Context, game: &Game) -> GameResult<()> {
    |                                                       -------------- expected `Result<(), GameError>` because of return type
274 |     game.sprites.background.draw(ctx, DrawParam::default())
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `Result<(), GameError>`, found `()`
    |
    = note:   expected enum `Result<(), GameError>`
            found unit type `()`
help: try adding an expression at the end of the block
    |
274 ~     game.sprites.background.draw(ctx, DrawParam::default());
275 +     Ok(())
    |

error[E0308]: mismatched types
   --> src/main.rs:290:9
    |
289 |     game.sprites.player.draw(
    |                         ---- arguments to this method are incorrect
290 |         ctx,
    |         ^^^ expected `&mut Canvas`, found `&mut Context`
    |
    = note: expected mutable reference `&mut Canvas`
               found mutable reference `&mut ggez::Context`
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:293:8
    |
293 |     fn draw(&self, canvas: &mut Canvas, param: impl Into<DrawParam>);
    |        ^^^^

error[E0308]: mismatched types
   --> src/main.rs:289:5
    |
288 |   fn draw_player(ctx: &mut Context, game: &Game) -> GameResult<()> {
    |                                                     -------------- expected `Result<(), GameError>` because of return type
289 | /     game.sprites.player.draw(
290 | |         ctx,
291 | |         DrawParam::default()
292 | |             .offset(Point2::new(0.5, 0.5))
293 | |             .dest(Point2::new(game.player.pos.0, game.player.pos.1))
294 | |             .rotation(FRAC_PI_2),
295 | |     )
    | |_____^ expected `Result<(), GameError>`, found `()`
    |
    = note:   expected enum `Result<(), GameError>`
            found unit type `()`
help: try adding an expression at the end of the block
    |
295 ~     );
296 +     Ok(())
    |

error[E0308]: mismatched types
   --> src/main.rs:301:9
    |
300 |     text.draw(
    |          ---- arguments to this method are incorrect
301 |         ctx,
    |         ^^^ expected `&mut Canvas`, found `&mut Context`
    |
    = note: expected mutable reference `&mut Canvas`
               found mutable reference `&mut ggez::Context`
note: method defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/graphics/draw.rs:293:8
    |
293 |     fn draw(&self, canvas: &mut Canvas, param: impl Into<DrawParam>);
    |        ^^^^

error[E0277]: the `?` operator can only be applied to values that implement `Try`
   --> src/main.rs:300:5
    |
300 | /     text.draw(
301 | |         ctx,
302 | |         DrawParam::default().dest(Point2::new(
303 | |             game.screen_size.0 as f32 / 2.0,
304 | |             game.screen_size.1 as f32 / 2.0,
305 | |         )),
306 | |     )?;
    | |______^ the `?` operator cannot be applied to type `()`
    |
    = help: the trait `Try` is not implemented for `()`

error[E0277]: the trait bound `&mut Game: EventHandler<_>` is not satisfied
   --> src/main.rs:375:43
    |
375 |     event::run(&mut ctx, &mut event_loop, &mut my_game)?;
    |     ----------                            ^^^^^^^^^^^^ the trait `EventHandler<_>` is not implemented for `&mut Game`
    |     |
    |     required by a bound introduced by this call
    |
note: required by a bound in `run`
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/event.rs:283:8
    |
281 | pub fn run<S: 'static, E>(mut ctx: Context, event_loop: EventLoop<()>, mut state: S) -> !
    |        --- required by a bound in this function
282 | where
283 |     S: EventHandler<E>,
    |        ^^^^^^^^^^^^^^^ required by this bound in `run`
help: consider removing the leading `&`-reference
    |
375 -     event::run(&mut ctx, &mut event_loop, &mut my_game)?;
375 +     event::run(&mut ctx, &mut event_loop, my_game)?;
    |

error[E0308]: arguments to this function are incorrect
   --> src/main.rs:375:5
    |
375 |     event::run(&mut ctx, &mut event_loop, &mut my_game)?;
    |     ^^^^^^^^^^ -------- expected `Context`, found `&mut Context`
    |
note: expected `EventLoop<()>`, found `&mut EventLoop<()>`
   --> src/main.rs:375:26
    |
375 |     event::run(&mut ctx, &mut event_loop, &mut my_game)?;
    |                          ^^^^^^^^^^^^^^^
    = note:         expected struct `EventLoop<_>`
            found mutable reference `&mut EventLoop<_>`
note: function defined here
   --> /Users/emccue/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ggez-0.9.3/src/event.rs:281:8
    |
281 | pub fn run<S: 'static, E>(mut ctx: Context, event_loop: EventLoop<()>, mut state: S) -> !
    |        ^^^
help: consider removing the borrow
    |
375 -     event::run(&mut ctx, &mut event_loop, &mut my_game)?;
375 +     event::run(ctx, &mut event_loop, &mut my_game)?;
    |
help: consider removing the borrow
    |
375 -     event::run(&mut ctx, &mut event_loop, &mut my_game)?;
375 +     event::run(&mut ctx, event_loop, &mut my_game)?;
    |

warning: unreachable call
   --> src/main.rs:375:5
    |
375 |     event::run(&mut ctx, &mut event_loop, &mut my_game)?;
    |     ---------------------------------------------------^
    |     |
    |     unreachable call
    |     any code following this expression is unreachable
    |
    = note: `#[warn(unreachable_code)]` on by default

warning: unused import: `BulletFactory`
  --> src/main.rs:26:29
   |
26 | use crate::bullet::{Bullet, BulletFactory, BulletFactoryImpl};
   |                             ^^^^^^^^^^^^^

Some errors have detailed explanations: E0050, E0053, E0061, E0277, E0308, E0425, E0432, E0603, E0609...
For more information about an error, try `rustc --explain E0050`.
warning: `rustisbetter` (bin "rustisbetter") generated 11 warnings
error: could not compile `rustisbetter` (bin "rustisbetter") due to 61 previous errors; 11 warnings emitted

Guess they really took ZeroVer to heart, huh?

So as it stands I have no clue how to run this Rust project on my laptop.

  1. If you have a clue, let me know.
  2. Why did this Rust project bit-rot? Actually curious.
  3. Is this representative of what will happen to any Rust project I make?

<- Index

Rust Just Failed an Important Test

by: Ethan McCue

I have two Rust projects I maintain.

The first is a parser for the EDN Data Format. I haven't had to touch that one in a while. Best I can tell it's all still working.

The second is a fork of the Rust Playground for running Java code. I also haven't had to touch that one in a while, but I did today to update the versions of Java available and include updated early access builds.

When I did that, despite not having changed any dependencies, I got a build error in CI/CD. Build log is here if anyone wants to see.

   Compiling io-lifetimes v1.0.11
   Compiling doc-comment v0.3.3
   Compiling smallvec v1.10.0
   Compiling pin-project v1.1.0
   Compiling miniz_oxide v0.6.2
   Compiling time v0.3.22
error[E0282]: type annotations needed for `Box<_>`
  --> /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/time-0.3.22/src/format_description/parse/mod.rs:83:9
   |
83 |     let items = format_items
   |         ^^^^^
...
86 |     Ok(items.into())
   |              ---- type must be known at this point
   |
help: consider giving `items` an explicit type, where the placeholders `_` are specified
   |
83 |     let items: Box<_> = format_items
   |              ++++++++

For more information about this error, try `rustc --explain E0282`.

And just like that, I've lost trust in Rust's resiliency to bit-rot.

It's not that deep an error and I resolved it by pinning a higher version of the time library, but it still sucks.
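
Concretely, the pin looked something like this (the exact version floor is from memory - check the time changelog):

# bump just that one crate in the lockfile, direct or transitive
cargo update -p time --precise 0.3.36

If time were a direct dependency, raising the requirement in Cargo.toml to time = "0.3.36" would accomplish the same thing.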

To me, whether code will "just work" into the future is an important property of a language and ecosystem. Maybe I had inflated expectations because of Rust editions, and maybe I'm being too harsh, but I now have this feeling of unease that I didn't before.


<- Index

You can run Java like Python now

by: Ethan McCue

This is meant to be a brief PSA for the general programming public. All of this is known to the people following Java closely, but I figure most are not.

As of Java 22, you can run Java code like you would an interpreted language such as Python, Ruby, or JavaScript. This means a separate, ahead-of-time compilation step with javac is no longer strictly required.

Say you have the following files.

src/Main.java

class Main {
    public static void main(String[] args) {
        System.out.println(Example.text());    
    }
}

src/Example.java

class Example {
    static String text() {
        return "example";
    }
}

You can directly run this project with java src/Main.java.
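
With the two files above that looks like this - Example.java gets compiled on the fly because Main.java references it:

java src/Main.java
example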

This is very new. The Java ecosystem doesn't yet have an accepted equivalent of pip or npm that isn't also tied to a build tool. Now that a build tool isn't required, I figure that will come around soon enough.1

As a sidenote, public static void main(String[] args) and System.out.println are also no longer going to be needed. Stay tuned.

1: There are two tools that most closely fit the mould today. The first is Coursier, a tool that has been around in the Scala community for a while. The second is jresolve, a tool I produced that has a few bugs and missing features, but that I think could be a better fit with more time and polish.


<- Index

After CrowdStrike, Programmers Deserve Consequences.

by: Ethan McCue

An Anesthesiologist can expect a salary of over $300k. This is because putting you to sleep for surgery is actually kinda risky. If they do their job wrong you die. Their salary reflects the fact that they take on much of the liability for that.

When a Structural Engineer finishes a design, they sign off on it. If something goes wrong with that structure due to their negligence, and it kills someone, that engineer might be on the hook for manslaughter.

Yesterday a friend of mine was stuck in the Hospital all day. Their computer system went down and that led to a delay of care. Delays in care kill people.

All over the world Hospitals, Airlines, Banks, etc. - critical infrastructure - were taken down by a bad patch in some random bit of software. This time it was CrowdStrike, but let's be a hundred percent fucking real with ourselves: it could have been anything.

It's an open secret that the entire software development field is a bit of a clusterfuck. Attempts to impose standards and restrictions largely fail. It is diminishingly rare to finish a project on budget, on time, and without defects. The education software developers receive is often woefully inadequate. The space is flooded with grifters, conpersons, imbeciles, and fanatics. We idolize and pray to emulate success stories like Facebook (a grand machine which reminds me of birthdays and drives teenagers to suicide.) It's just bad, man.

Software "Engineers" are never held personally accountable for the effects their actions have on the world. That poor bastard or bastard(s) at CrowdStrike weren't paid anesthesiologist rates and yet their mistake is going to kill a lot of people. I doubt they would have signed off on anything they'd done in the last decade as being "defect-free" and yet that is the standard we rightfully hold other fields to.

Something needs to change and I doubt anything other than real, uniformly applied, consequences will make a difference.

For a more intelligently spoken, less emotionally driven, take on this watch the David Sankel talk I embedded below.

EDIT

To clarify, I am not saying that an individual at the bottom of the chain of decision-making is materially responsible for this outage.

Based on the degree to which what was in my head was received as almost the opposite message by so many, I am pretty sure I wrote this poorly.

I think this reddit comment did a good job distilling something I wish I got across.

The reason why anesthesiologists and structural engineers can take responsibility for their work is because they are legally responsible for the consequences of their actions, specifically of things within their individual control. They are members of regulated, professional credentialing organisations (i.e., only a licensed 'professional engineer' can sign off certain things; only a board-certified anesthesiologist can perform on patients.) It has nothing to do with 'respect'.

Software developers as individuals should not be scapegoated in this Crowdstrike situation specifically because they are not licensed, there are no legal standards to be met for the title or the role, and therefore they are the 'peasants' (as the author calls them) who must do as they are told by the business.

And also this post I wrote and this one a reply down are at least a little clearer on where I think the blame lies for this particular outage.

I am not saying, and wish I had never come so close to implying, that in this exact instance we should blame a coder for what was clearly a process issue.

It's just that even though we all know that not unit testing or not performing QA is negligent behavior, our field doesn't actually have any codes that are enforced by law.

The reason I implied that programmers should see consequences isn't that I misunderstand how development works, or that I missed that the CrowdStrike outage was largely caused by chains of terrible management. It's that without codes like the ones those fields have, we will never be taken seriously. My thought process was "if it matters, we will make codes. If we make codes then maybe we edge closer to being an actual engineering discipline."

And seriously watch the video I linked. It did a way less shitty job than I did.


Yeah I'm a demonstrably bad communicator.

I agree with everything you are saying and I think we agree on what the shape of things should be.

But I think that without actual codes you can hold someone to, there is no basis upon which to punish a company for not following them.

Skipping past how we get from here to there, in a world where development of critical systems is regulated and folks are licensed as engineers there should be consequences if one of those licensed engineers is negligent.

But I fucked up hard by just saying programmers deserve consequences. People assumed I meant "yeah let's get the guy who did this!" I really mean "programmers deserve to live in the world where their actions are given weight and recognized as an engineering discipline with consequences for negligence all the way up the chain."


<- Index

A Dramatic Reading: I Will Fucking Piledrive You If You Mention AI Again

by: Ethan McCue

I thoroughly enjoyed reading this blog post entitled "I Will Fucking Piledrive You If You Mention AI Again" on the Lucidity blog.

I choose to hope that it is a sign that professional developers, as a group, have developed at least a few anti-BS antibodies following the crypto bubble popping.

I also think that the approach it takes is probably one of the most socially effective ones. More effective than the "well, maybe there is some application of the technology" tepidness that allowed for crypto scams to flourish unfettered and lure in impressionable new coders.

So in the spirit of keeping the "you're not welcome here" message alive in the news cycle, I commissioned a professional voice actor to give a full dramatic reading of that blog post.1

Enjoy.

If you have a project in need of a voice actor, you can find their portfolio here.

1: I obviously do not have any rights to the original blog post so I can't say "you can use the recording for anything," but I need to make clear that you are not allowed to use any aspect of the recording to train a generative AI. Anything else I am able to grant permission for, I do.


<- Index

Extension methods make code harder to read, actually

by: Ethan McCue

I apologize in advance for whatever comment sections form around this.

What are instance methods?

In many languages you can associate functions with a type.

class Dog {
    void bark() {
        System.out.println("Bark!");
    }
}

The name these are given differs depending on the language you are talking about and who you are talking to, but we'll go forward calling these "instance methods."

Instance methods are defined at the same time as the type is declared.

class Dog { // Type declared here
    void bark() { // Method declared within it
        System.out.println("Bark!");
    }
}

Instance methods can have access to fields or properties of the type they are associated with that might not be accessible to other code.

class Dog {
    private final String name;
    
    Dog(String name) {
        this.name = name;
    }
    
    void bark() {
        // name is accessible to this method, but not to outsiders
        if (name.equals("Scooby")) {
            System.out.println("Scooby-Dooby-Doo!");
        }
        else {
            System.out.println("Bark!");
        }
    }
}

And, in languages with the ability to "extend" types, instance methods might be overridden by a subtype.

class Pomeranian extends Dog {
    @Override
    void bark() {
        System.out.println("bork.");
    }
}

Importantly, instance methods are also "convenient" to call.

Most code editors can catch you after you've written the . after dog and offer an autocomplete list of "methods you might want to call."

void main() {
    var dog = new Dog("Scooby");
    // After "dog.b" you should be able to hit enter and
    // have "dog.bark()" filled in for you.
    dog.bark();
}

In addition to discovery, this is convenient for a practice known as "chaining." If one method returns an object which can itself have methods called on it you can "chain" another method call on the end.

void main() {
    String name = "  Scrappy   ";
    
    name = name
            .toLowerCase()
            .strip()
            .concat(" dappy doo");
    
    System.out.println(name);
}

This is widely considered to be aesthetically pleasing and will be the surprise villain of today's story.

What are extension methods?

If you are not the author of a type, but want to write functionality that builds upon the exposed methods and fields of one, you can write code of your own.

class DogUtils {
    private DogUtils() {}
    
    static void playFetch(Dog dog) {
        System.out.println("Throwing stick...");
        dog.bark();
        System.out.println("Stick retrieved.");
    }
}

Calling such a method will generally look different from calling an instance method.

void main() {
    var dog = new Dog("Scooby");
    DogUtils.playFetch(dog);
}

Importantly, you need to know where to look for it (in this case that there is playFetch in DogUtils) and won't get that helpful autocomplete from writing dog.

Externally defined methods also don't play nicely with method chaining. Whenever you need to call them you probably need to "break the chain."

void main() {
    String name = "  SCRAPPY   ";
    
    name = name.toLowerCase();
    
    name = StringUtils.capitalizeFirstLetter(name);
    
    name = name
            .strip()
            .concat(" Dappy doo");
    
    System.out.println(name);
}

This is considered aesthetically displeasing.

Extension methods are a language feature that allow someone to make calling these externally defined methods look like calling an instance method.

// This is the "manifold" Java superset
// http://manifold.systems/docs.html
@Extension
class DogUtils {
    private DogUtils() {}
    
    static void playFetch(Dog dog) {
        System.out.println("Throwing stick...");
        dog.bark();
        System.out.println("Stick retrieved.");
    }
}
void main() {
    var dog = new Dog("Scooby");
    dog.playFetch(); // This turns into a call to DogUtils.playFetch
}

Upsides of extension methods

Because calling an extension method looks the same as calling an instance method, downstream users of a library can make a suboptimal API more tolerable by adding their own methods.

As an example, the Kotlin language uses its extension mechanism to "add methods" to java.lang.String that the Kotlin team would prefer existed.

This can make code more aesthetically pleasing and enables method chains to go unbroken, which in turn can make code easier to write.

void main() {
    String name = "  SCRAPPY   ";

    name = name
            .toLowerCase()
            .capitalizeFirstLetter()
            .strip()
            .concat(" Dappy doo");

    System.out.println(name);
}

This is often confused with making code easier to read.

Downsides of extension methods

1. They make life harder for library maintainers

Java added the .strip() method to String in Java 11. .trim() already existed but it isn't "unicode aware" and won't trim off everything we would consider to be whitespace.

As such, it would have been an ideal target for an extension method.

@Extension
final class StringUtils {
    private StringUtils() {}
    
    static String strip(String s) {
        // ...
    }
}

So if Java had extension methods there would have certainly been code that looks like this out in the world.

void main() {
    String catchphrase = "  zoinks  ";
    
    catchphrase = catchphrase.strip();
    
    System.out.println(catchphrase);
}

Where every call to .strip() was translated to a call to StringUtils.strip.

Now consider what happens when you go forward in time and the person writing String decides to add their own .strip() method.

If you recompile code that looks like the above, does it

  • A: Fail to compile. The compiler can't decide which one to use; you need to disambiguate somehow.
  • B: Continue to use the extension method.
  • C: Switch to using the instance method.

All of these options suck.

If it fails to compile, library authors now need to consider how likely it is that adding a brand-new method is going to break downstream code. In the absence of extension methods, adding a method is one of the few changes that is basically a free action.

If it continues to use the extension method, that can quickly become a code readability hazard. People form their own internal Rolodexes of what methods are available on certain types and what they do. If someone sees .strip() called on a String it's not unreasonable for them to expect exactly the behavior of String#strip. If the semantics of the strip extension method differ from the semantics of the instance method...shit. Library maintainers need to care about this because any method they add that is likely to conflict with an existing extension method can trigger exactly this hazard.

If it switches to using the instance method, both library authors and library consumers now need to be a lot more cautious when upgrading libraries. Code, as written, could change behavior from something as simple as adding a method. This is worse than failing to compile, since at least if the compiler yells at you there is a sign that something is wrong.
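
To make that hazard concrete, here is a hypothetical sketch - reusing the simplified manifold-style @Extension from earlier - where the extension's semantics differ from the instance method that shows up later:

@Extension
final class StringUtils {
    private StringUtils() {}

    // Hypothetical: written before String#strip existed.
    // Only strips ASCII spaces, unlike the unicode-aware String#strip.
    static String strip(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == ' ') start++;
        while (end > start && s.charAt(end - 1) == ' ') end--;
        return s.substring(start, end);
    }
}

void main() {
    // "\u2005" is a unicode whitespace character that String#strip removes.
    // Under option B this prints it untouched; under option C the same
    // line silently starts stripping it after a recompile.
    System.out.println("\u2005zoinks".strip());
}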

2. They make code harder to read

Welcome to the part that was click-bait.

If the invocation of an instance method looks identical to invoking an extension method it is impossible to tell at a glance which is happening.

void main() {
    // Is this an extension method call or an instance method one?
    String name = "  Velma".stripLeading();
}

If the language automatically brings all extension methods "into scope" this problem is global to the entire codebase. If someone in some corner of the world adds an extension method, it can alter the behavior of code or affect whether a particular line compiles.

If the language doesn't, that means you need some sort of import to make the extension methods available.

// If I hadn't been using this example the whole time, would
// you catch that "capitalizeFirstLetter" was the extension method?
@Extension(StringUtils.class)
class Main {
    void main() {
        String name = "  SCRAPPY   ";

        name = name
                .toLowerCase()
                .capitalizeFirstLetter()
                .strip()
                .concat(" Dappy doo");

        System.out.println(name);
    }
}

This is similar to, but worse than, the situation with * imports. One line of code at the top of the file is needed for many other lines to be valid code, but there is no way to visually tie the two together.

import java.util.*;

void main() {
    var l = new ArrayList<String>();
}

The problem is that readability is about the ease of extracting information from text. Both * imports and any hypothetical design of extension methods make it harder to read code because they take information that could be written down and accessible and make it implicit.

That can be fine, sometimes. We're not in an anti-golf competition or anything. It is valid to trade readability for ease of writing. But we are lying to ourselves and/or others if we say that extension methods make code more readable.

What they do is make some code more aesthetically pleasing. Method chains are considered nice to look at. Beauty is simply a different thing from comprehensibility.

3. They aren't that powerful, actually

There are more ways than extension methods to magically attach methods to types.

One of the ways that is popular in Scala is to use "implicits." Whenever you use a type in a context that it wouldn't otherwise work, Scala can implicitly wrap your type in another one that will make it work.

What does that mean? Well, if you had a line of code like this.

val name = "fred".capitalizeFirstLetter

Then the Scala compiler will look for implicit conversions to a class that does have that method.

class EnrichedString(s: String) {
  def capitalizeFirstLetter: String = {
    Character.toUpperCase(s.charAt(0)) + s.substring(1, s.length())
  }
}

given Conversion[String, EnrichedString] with
  def apply(s: String): EnrichedString = EnrichedString(s)

val name = "fred".capitalizeFirstLetter

println(name)

This is more powerful since you aren't just able to magically add a method, you can magically implement an interface.

trait ThingDoer {
  def doThing: Unit
}

class EnrichedString(s: String) extends ThingDoer {
  def doThing: Unit = {
    println(s"Hello: ${s}")
  }
}

given Conversion[String, ThingDoer] with
  def apply(s: String): ThingDoer = EnrichedString(s)

val thingDoer: ThingDoer = "fred"

thingDoer.doThing

Are the rules for this confusing? Extremely.

Implicit conversions are applied in two situations:

  1. If an expression e is of type S, and S does not conform to the expression’s expected type T.
  2. In a selection e.m with e of type S, if the selector m does not denote a member of S (to support Scala-2-style extension methods).

In the first case, a conversion c is searched for, which is applicable to e and whose result type conforms to T.

Preach, sister.

Which is all to say that extension methods are the Weenie Hut Jr. version of implicits. You get all the downsides of context-dependent code and pain for library maintainers, but in place of the really cool features (like external code being able to implement an interface on a type it didn't define) we only get the most vapid benefit.

Method chaining.

Alternatives

1. Use a box

If you are working in a language which doesn't have extension methods, but you feel in your bones a strong desire to chain methods, try making a box.

import java.util.function.Function;

record Box<T>(T value) {
    <R> Box<R> map(Function<? super T, ? extends R> f) {
        return new Box<>(f.apply(value));
    }
}

If you box up the value you want to chain methods on then calling instance methods will actually look the same as externally defined ones.

void main() {
    String name = "  SCRAPPY   ";
    name = new Box<>(name)
            .map(String::toLowerCase)
            .map(StringUtils::capitalizeFirstLetter)
            .map(String::strip)
            .map(s -> s.concat(" Dappy doo"))
            .value();
    System.out.println(name);
}

Is this better than the code without chaining? Debatable. I lean towards no, but if "fluent chaining" is the goal, this achieves it. And, unlike a full-blown language feature, it doesn't affect the lives of those for whom method chaining is not an emotional priority.

2. Extend the type

If the author of a type is okay with you extending it and is ready to consider whatever extensions might exist in the wild when they make new versions of a library, they can make their class open to extension.

class Dog {
    void bark() {
        System.out.println("Bark!");
    }
}
class Dalmatian extends Dog {
    void playFetch() {
        System.out.println("Throwing stick...");
        bark();
        System.out.println("Stick retrieved.");
    }
}

Does this have downsides? Yes, most definitely. You cannot subclass String and that's maybe 50-60% of why people want extension methods as a feature.

But it's at least a mechanism that a library maintainer controls whether to opt into.

3. Add a uniform calling syntax

Some languages don't have a special syntax for calling methods defined alongside a type. Accordingly, such languages often do not have an equivalent to extension methods.

import String.Extra

name : String
name =
    "   shaggy  rodgers "
        |> String.trim -- Defined alongside String
        |> String.Extra.toSentenceCase -- Defined by third party

So one possible path for a language to take would be to appease the method chaining junkies and add a new way to invoke methods that chains with instance methods.

void main() {
    String name = "  SCRAPPY   ";

    name = name
            .toLowerCase()
            |[StringUtils::capitalizeFirstLetter]
            .strip()
            .concat(" Dappy doo");

    System.out.println(name);
}

This is one of the proposed directions that JavaScript might take. It has its downsides as well, but they are different downsides.

4. Use default interface methods (or an equivalent)

While this doesn't help you add methods to arbitrary types you did not make, you can use interfaces to add methods to things in most languages that have them.

import java.util.Iterator;
import java.util.function.Consumer;

interface IterableExtended<T> extends Iterable<T> {
    default void forEachTwice(Consumer<? super T> consumer) {
        this.forEach(t -> {
            consumer.accept(t);
            consumer.accept(t);
        });
    }
}
class Eight implements IterableExtended<Integer>, Iterator<Integer> {
    private boolean gotEight = false;

    @Override
    public Iterator<Integer> iterator() {
        return this;
    }

    @Override
    public boolean hasNext() {
        return !gotEight;
    }

    @Override
    public Integer next() {
        gotEight = true;
        return 8;
    }
}
void main() {
    var eight = new Eight();
    eight.forEachTwice(System.out::println);
}

This is a sort of extension method; it's just a technique that only works at the declaration site, not one that arbitrary consumers can use.

5. Deal with it.

void main() {
    String name = "  SCRAPPY   ";

    name = name.toLowerCase();

    name = StringUtils.capitalizeFirstLetter(name);

    name = name
            .strip()
            .concat(" Dappy doo");

    System.out.println(name);
}

Conclusion

It is fine to like extension methods. It is also fine to think they are worth the tradeoffs.

What stinks is that people act like there aren't tradeoffs and that they are purely positive. The sort of vapid "why don't they just add extension methods? Idiots." attitude infects discourse and, while I have no illusions anything I write can stop it, I hope that at least some people now understand why a language might choose to not have them.


<- Index

Modules Make javac Easy: Part. 2, Dependencies and Tests

by: Ethan McCue

This is a follow-up to this post.

The biggest things I left out in the workflow I was describing are how to handle external dependencies and how to run tests.

On the one hand, I feel like I understand how those would work today with the tools that exist. On the other, I'm pretty sure it can be done a little better.

Try to focus on whether the "shape" of the process feels alright to you and less on the specifics of any particular command.

Dependencies

I wrote a post on this before, but the short version is that I made a tool called jresolve. It resolves transitive dependencies.1

If you want to get it so you can follow along, you can use this script.

bash < <(curl -s https://raw.githubusercontent.com/bowbahdoe/jresolve-cli/main/install)

Or download a .jar from GitHub Releases.

You can use jresolve to download libraries you want to have into a folder.

jresolve --output-directory libs \
    pkg:maven/org.springframework.boot/spring-boot-starter-web@3.3.0

This will include any transitive dependencies of those libraries.

jresolve --print-tree \
    pkg:maven/org.springframework.boot/spring-boot-starter-web@3.3.0
org.springframework.boot/spring-boot-starter-web 3.3.0
  . org.springframework.boot/spring-boot-starter 3.3.0
    . org.springframework.boot/spring-boot 3.3.0
      . org.springframework/spring-core 6.1.8
...

The pkg:maven string is available at the top of the page for any artifact on Maven Central's Search.

If the list of dependencies gets too long you can put the dependencies you want in a file, say libs.txt.

pkg:maven/com.google.guava/guava@33.2.0-jre
pkg:maven/commons-codec/commons-codec@1.17.0

Then include that file with an @ at the end of the command.

jresolve --output-directory libs @libs.txt

Which puts all your dependencies in one place, easily addable to the module path.

javac \
    -d build/javac \
    --module-path libs \
    --module-source-path "./*/src" \
    --module web.hello
java --module-path libs:build/jar --module web.hello

Running Tests

JUnit has a command line launcher. It's not perfect yet and it's not on anything like SDKMAN, but it is good enough for our purposes.

Add the dependencies you need for the command line launcher and for writing tests to your libs.txt.2

pkg:maven/org.junit.jupiter/junit-jupiter-api@5.10.2
pkg:maven/org.junit.platform/junit-platform-console@1.10.2
pkg:maven/org.junit.jupiter/junit-jupiter-engine@5.10.2

Make a module for your tests. And make it an open module so the test runner can do its magic.

open module web.hello.test {
    requires web.hello;
    requires org.junit.jupiter.api;
}
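
Following the layout from the previous post, the test module sits alongside the others:

web.hello/
  src/
    module-info.java
    ...
web.hello.test/
  src/
    module-info.java
    ...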

Write a test in this module.

import org.junit.jupiter.api.Test;
import web.hello.HelloController;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class HelloControllerTest {
    @Test
    public void getHello() {
        assertEquals(
                "Greetings from Spring Boot!",
                new HelloController().index()
        );
    }
}

Then you can launch the test runner like any other code.

java \
    --module-path libs:build/jar \
    --add-modules web.hello.test,web.util.test \
    --module org.junit.platform.console \
    execute \
    --select-module web.hello.test

Which is a little long - I have hopes in the future I can write something like the following.

junit \
    execute \
    --module-path libs:build/jar \
    --select-module web.hello.test

But the basics are that you launch junit, point it at your code, and run tests.

Wrap Up

While all this is more work than adding a dependency to a pom.xml and running mvn test, I'm not convinced it's more complicated or any less powerful.

If anything, the fact that doing things this way lets us interact more directly with tools like javac makes it feel more flexible.

I made a repo with this setup using Spring Boot that you can find here. All the commands you would run are in the Justfile. I included all the libraries needed in the repo in case you don't want to install my CLI tool for whatever reason.

1: It's gauche to pitch your own tool, especially one which is admittedly incomplete. One alternative is Coursier.

2: I know, I know - dependency scopes. This is a relatively large conversation to have, but with the module path things that aren't also "in the graph" aren't included. Having test dependencies in the same `libs` folder as other dependencies isn't as much of a problem as it is with the class path. Yes, making a docker image with just the dependencies needed for runtime needs scopes / a practice emulating it. I'm working my way there.


<- Index

Modules Make javac Easy

by: Ethan McCue

If you use Java modules, using javac to compile your code is easy.

I figure this wouldn't be widely known - it's not that popular to use javac directly these days - but it's interesting.

Without Modules

javac compiles any files you list in its invocation.

javac -d build src/Main.java src/Other.java

If the other source files are referenced from the ones you listed, you can use --source-path and javac will find the others.

# Will find src/Other.java so long as Main uses it
javac -d build \
    --source-path src \
    src/Main.java

But, if your source files might not directly reference each other, you need to list every file in your project. That turns into something like this.

javac -d build \
    $(find . -name "*.java" -type f)

Which, while functional, doesn't inspire joy.

With Modules

All of the above methods work, even if you have a module-info.java.

But, if you lay out your code like this

example.mod/
  module-info.java
  example/
    mod/
      A.java
      B.java
      C.java

I.e., with a directory that has the same name as the module within it - then javac can automatically find and compile your code.

javac \
    -d build \
    --module-source-path . \
    --module example.mod

So --module-source-path tells it where to find all the code for a module and --module tells it what module you want to compile.

If you want all your code in a src/ folder, you can do that as well. You just need to tweak the --module-source-path argument.

example.mod/
  src/
    module-info.java
    example/
      mod/
        A.java
        B.java
        C.java
javac \
    -d build \
    --module-source-path "./*/src" \
    --module example.mod

Where this becomes actually pretty cool is if you have more than one module.

Just put all your project's modules on the same level.

example.mod/
  src/
    module-info.java
    example/
      mod/
        A.java
 
other.mod/
  src/
    module-info.java
    other/
      mod/
        B.java

Now javac can compile more than one module at the same time.

javac \
    -d build \
    --module-source-path "./*/src" \
    --module example.mod,other.mod

If modules require each other - like if example.mod requires other.mod - then everything that's needed will be compiled automatically, even if you only name one module.
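
For reference, the module-info.java files expressing that requirement would look something like this:

// example.mod/src/module-info.java
module example.mod {
    requires other.mod;
}

// other.mod/src/module-info.java
module other.mod {
}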

Other Tools

Once you've laid out your code like this, other tools, like javadoc, will also be able to automatically discover the code for your modules.

javadoc \
    -d docs \
    --module-source-path "./*/src" \
    --module example.mod,other.mod

Isn't that neat?

Wrap Up

This leaves off some crucial bits - like how you would get dependencies or run unit tests - but compare it holistically to setting up a multi-module build in Maven. Or Gradle. Or bld. Or whatever.

At least to me this feels way less painful. Worthy of a closer look.

I made a repo with a basic version of this setup here. All the commands you would run are in the Justfile. I also threw in making jars + including resources.


<- Index

Getting Started with java.sql

by: Ethan McCue

I get a lot of questions based on a very common school assignment.

A student is asked to make a desktop GUI app and, as part of that, connect to and work with a locally hosted MySQL database.

In this setup, presumably due to the same set of circumstances that lead to someone showing MySQL as an option for a locally hosted database (W.T.H. right?), people are shown some downright dangerously wrong ways of working with SQL.

This bit of writing is for me to send as a first message next time this comes up.

What is java.sql

java.sql is the module that contains the classes needed to connect to SQL databases in Java. We also call this API "JDBC", which stands for Java Database Connectivity.

You don't need to do anything special to get access to this, but if you have a module-info.java file in your program you will need to add a requires java.sql; line to it.
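
That looks like this (the module name here is made up):

module my.gui.app {
    requires java.sql;
}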

Install your database drivers.

Though the mechanisms you use to work with databases come with Java, the code to connect to the specific database you are using does not. This means you need to include a dependency.

For MySQL, you need to have the mysql-connector-j library. You should have been shown how to do this by now, but if not reach out.

Other DBs: for Postgres that's the org.postgresql driver, and for SQLite it's org.xerial's sqlite-jdbc.

Get a DataSource

The first thing you want to do is get an object which implements the DataSource interface.

A DataSource is an object that can give you a connection to a database.

The exact way to do this varies from database to database, but for MySQL you need to create a new MysqlDataSource(). This is also the step where you should fill in any authentication info like username and password.

Also, only create one of these at the top of your program and pass it to everything else. Do not create a DataSource every time you want to run a query.

For MySQL this is going to be a MysqlDataSource. For Postgres start with PGSimpleDataSource. For SQLite, SQLiteDataSource.

The exact .set* methods you need to call will be different depending on your db and maybe your deployment situation.

import javax.sql.DataSource;

import com.mysql.cj.jdbc.MysqlDataSource;

class Main {
    public static void main(String[] args) {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");
    }
}

Get a Connection

Once you have a DataSource you can call the getConnection method to get an active connection to the database.

import com.mysql.cj.jdbc.MysqlDataSource;

import java.sql.Connection;
import javax.sql.DataSource;

class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");

        try (Connection conn = db.getConnection()) {
            
        }
    }
}

You will notice that I put the connection inside a try( ... ) {} thing. This is called a try-with-resources and all it does is make sure to call conn.close() after the block is exited, even if an exception happens. Since you generally want to close a connection when you are done with it, this is the way to go.

The alternative is this, which you might have seen on your teacher's slides and the example code you were given. This hasn't been needed since 2011.

Connection conn = null;
try {
    conn = db.getConnection();
    // Code that might crash
}
finally {
    if (conn != null) {
        conn.close();
    }
}

While you can re-use connections, I have to ask that you do not store any Connection objects in fields. Whenever you need a connection object, get a fresh one from the DataSource. This might sound inefficient, but trust me, it's better than the alternatives.

Create a PreparedStatement

There are other ways to run queries on your database, but this is the most consistent one.

On a connection object you can call a method named prepareStatement and give it a String containing a SQL query. This PreparedStatement object should also be set up to automatically close, like a Connection.

import com.mysql.cj.jdbc.MysqlDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;

class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");

        try (Connection conn = db.getConnection()) {
            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT 1 as number;"
            )) {
                
            }
        }
    }
}

Get a ResultSet

To execute a SQL query that will give you results, you call executeQuery on a PreparedStatement.

This gives you an object called a ResultSet. A ResultSet represents a "cursor" over all the rows that came back as results from your query.

It starts before any of the rows, and each time you call next it moves to the next row. Once you are at a particular row, you call various .get* methods to access the data in that row.

import com.mysql.cj.jdbc.MysqlDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");

        try (Connection conn = db.getConnection()) {
            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT 1 as number;"
            )) {
                ResultSet rs = stmt.executeQuery();
                rs.next();
                System.out.println(rs.getInt("number"));
            }
        }
    }
}

If you select more than one row, you can use the fact that rs.next() returns false when there are no more rows to loop through them all.

import com.mysql.cj.jdbc.MysqlDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");

        try (Connection conn = db.getConnection()) {
            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT name FROM person;"
            )) {
                ResultSet rs = stmt.executeQuery();
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}

And if you are unsure whether you will even get one row, you can use that fact in a similar way.

import com.mysql.cj.jdbc.MysqlDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");

        try (Connection conn = db.getConnection()) {
            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT name FROM person WHERE ssn='111111111';"
            )) {
                ResultSet rs = stmt.executeQuery();
                if (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
                else {
                    System.out.println("No matching person");
                }
            }
        }
    }
}

Set parameters

The queries you want to run will involve data that comes from a user typing stuff into a box. The way to deal with this is not, I repeat not, under any circumstances, the following.

"SELECT name FROM person WHERE birthday='" + birthday + "'";

This is the root cause of SQL Injection and is generally not something you want to ever do.

The way to include data in a query is to put a ? in the places that data should go, then call various .set* methods to set the data. You pass them the position of the ? you are replacing and then the data. These positions start counting from 1, which is a little unusual.

import com.mysql.cj.jdbc.MysqlDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");

        try (Connection conn = db.getConnection()) {
            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT name FROM person WHERE birthday=?"
            )) {
                stmt.setString(1, "9/9/1999");
                ResultSet rs = stmt.executeQuery();
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}

Use Multi-Line Strings

As your queries get bigger, they will probably span multiple lines. To write these, use three double quotes on either side.

"""
SELECT name FROM person
WHERE birthday=?
"""

It's important to know this because your maybe very old curriculum will still have examples like

"SELECT name FROM person \n" +
    "WHERE birthday=?"

Which can get tedious.

Execute non-queries

To do something that isn't a query, like inserting rows, you use the .execute() method instead of .executeQuery(). This will not give you a ResultSet object.

import com.mysql.cj.jdbc.MysqlDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource db = new MysqlDataSource();
        db.setPort(3306);
        db.setUser("username");
        db.setPassword("password");

        try (Connection conn = db.getConnection()) {
            try (PreparedStatement stmt = conn.prepareStatement(
                    """
                    INSERT INTO person(name, status)
                    VALUES (?, ?)
                    """
            )) {
                stmt.setString(1, "tiny tim");
                stmt.setString(2, "not dead");
                stmt.execute();
            }
        }
    }
}

Pool your connections

Getting a fresh connection to the database every time you want to make a query is ultimately inefficient.

Think of getting a connection like making a phone call. You need to dial, it needs to ring, and the other end needs to pick up. That all takes time.

To resolve this we use "Connection Pools." These are DataSource implementations which keep some number of connections always active and re-use them between calls to .getConnection.

The library to use for this is called HikariCP.

import com.mysql.cj.jdbc.MysqlDataSource;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class Main {
    public static void main(String[] args) throws Exception {
        MysqlDataSource mysql = new MysqlDataSource();
        mysql.setPort(3306);
        mysql.setUser("username");
        mysql.setPassword("password");
        
        HikariConfig config = new HikariConfig();
        config.setDataSource(mysql);

        HikariDataSource db = new HikariDataSource(config);
        
        try (var conn = db.getConnection()) {
            // ...
        }
    }
}

Note that you do not need to pool connections with a database like SQLite. There making a connection isn't like making a phone call, it's like shouting at your cousin in the other room. There's only one cousin and he can hear you.


<- Index

The Java Command Line Workflow

by: Ethan McCue

A while ago, I released a draft of the jresolve command line tool. Its function is to take a set of root dependency declarations and resolve the full set of transitive dependencies.

I'm happy with the API, but there are some things to fix up.

But I think jresolve's existence, and why I bothered to make it, only makes sense as a part of a larger story. This is an attempt to tell it.

The Problem

We all use Maven or Gradle. There are other up-and-coming build tools like bld and some Ant holdovers from the 2000s, but if you threw darts at Java codebases that is what you would hit.

This is a good state of affairs in many ways, but there are downsides.

The specific downside I want to focus on is how it affects the way people learn Java. What follows are my own opinions and perception.

Step 1.

When people learn how to code, typically they start with a "Hello, world" program.

In the past, this part involved hand-waving away public static void main(String[] args). In some future release of Java it will be simpler. That's great. I'll talk about how that can affect curriculums at some point.

But from a tooling perspective, this step is a choice between having them run java Main.java on the command line and having them click the "Big Green Run Button" in whatever text editor they installed or online platform they signed up for.

Step 2.

You can actually go pretty far into the language without leaving a single file, but at some point a student needs to have more than one file in their projects.

To do this, you again have a (non-exclusive) choice of how to approach it. Either the command line or the Big Green Run Button.

If you take the command line route, it will still be java Main.java, followed by java src/Main.java after you guide them to keep their code in a folder.

Green button is the green button.

Step 3.

Because it will be relevant to things to come, you at some point want to explain that Java code can be compiled ahead of time to .class files. You could point to the directory where the B.G.R.B. put the class files, or you could explain what javac is and how to use it. For that you would land on something like this.

javac --source-path ./src -d classes src/Main.java
java --class-path classes Main

So you would have been able to introduce javac, the concept of ahead-of-time compilation, .class files, and the --class-path.

Step 4.

Once they've made an app the next thing they'll want is to package it up into a jar.

In the 🟢 world, you show them a menu in their editor and what buttons to click.

In the CLI, it's a chance to show them how to use the jar tool.

jar --create --file app.jar --main-class Main -C classes .

Step 5.

This is where things get tricky, because once they know how to build an app it won't be long before they want to make something that requires a dependency.

And it is this step where things fall apart.

If you are lucky, the dependency they need has no transitive dependencies. You show them how to download a .jar file, how to add it to the IDE or where to put it on the --class-path, and warn them that they won't be able to get away with that forever.

If you aren't, you need Maven or Gradle. That is by far the easiest way to make sure they get their dependencies.

It is also easy to justify. Chances are any Java job would use one of those.

One problem is that because Maven and Gradle also take over compiling the code, you invalidate their investment in learning how to use javac and jar. They won't be using either of those from now on.

Another is that both are going to throw a lot in their face. Either an entirely new programming language with Gradle or a relatively beefy pom.xml with Maven.

This is what a "blank" Maven gives you in IntelliJ. It's not horrible, but it does have some public static void main(String[] args)-like properties.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>untitled92</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

</project>

You could also use something like jbang, which automatically downloads dependencies declared as comments in the code. But this stops being viable if you want those dependencies for things other than running the code as a script, unfortunately.
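
For reference, jbang reads dependencies from comments in the source file. Roughly like this, with the coordinates just being an example:

//DEPS com.google.guava:guava:33.2.0-jre

import com.google.common.base.Joiner;

public class hello {
    public static void main(String[] args) {
        // Joiner comes from the downloaded guava dependency
        System.out.println(Joiner.on(" ").join("Hello", "world"));
    }
}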

So the point of jresolve specifically is to let you stick with the command-line flow and avoid pivoting to build tools until later.

You show them how to use it to get dependencies from the internet as well as how to use those dependencies.

jresolve \
    --output-directory libraries \
    pkg:maven/de.gurkenlabs/litiengine@0.8.0

javac \
    --module-path libraries \
    --add-modules ALL-MODULE-PATH \
    --source-path ./src \
    -d classes \
    src/Main.java

java \
    --module-path libraries \
    --add-modules ALL-MODULE-PATH \
    --class-path classes \
    Main

As an aside, while it requires the unpleasant appearance of ALL-MODULE-PATH, I would argue that showing early that you should put your external dependencies on the --module-path is a good thing.

Step 6.

If you took the path enabled by a tool like jresolve, the commands you are asking folks to run are likely getting pretty hard to remember.

It is a good time to introduce some "command runner" mechanism. Some way so they only have to say compile instead of a long javac incantation.

For this purpose, I have a liking for just, but shell scripts, makefiles, etc. are all valid.

help:
    just --list
    
clean:
    rm -rf classes
    rm -rf libraries

install:
    rm -rf libraries
    jresolve \
        --output-directory libraries \
        pkg:maven/de.gurkenlabs/litiengine@0.8.0
   
compile:
    rm -rf classes
    javac \
        --module-path libraries \
        --add-modules ALL-MODULE-PATH \
        --source-path ./src \
        -d classes \
        src/Main.java

run:
    java \
        --module-path libraries \
        --add-modules ALL-MODULE-PATH \
        --class-path classes \
        Main  

Now that they can do something spiritually like just compile, commands no longer need to be produced from memory every time they want to do things with their code.

It also instills the notion that software projects are generally built by a set of named and repeatable processes.

And if you haven't already, or it's just time for a refresher, you can use this as an opportunity to go a little deeper into the command line and explain tools like cd and rm.

Step 7.

Now that they know how to use dependencies and can run somewhat involved processes in the CLI, you can show them how to package their code to share with someone who doesn't have Java.

If they've made games and other such GUI things, then you can show them jpackage. Have them make a jar with their classes and show them the flags to include their dependencies and make an installer.
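
A sketch of what that can look like, assuming app.jar and the dependency jars were all copied into a build folder:

jpackage \
    --name app \
    --input build \
    --main-jar app.jar \
    --main-class Main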

If you've kept more of a server-y focus, maybe you'd just show them how to copy files to a remote machine, or maybe you'd go through something like Docker and show them the commands to build images.

But at this point they have some idea of how to "ship" their code.

Step 8.

Now that they have all the mechanisms to deliver code from concept to product, they are going to start making big projects.

At least some students will have an idea for a game or a website or similar they're going to invest a lot of time in, but also the assignments you are giving will probably require more structure. A thing I've seen in a lot of curriculums is to have everyone write the M part of an MVC-style assignment, swap Ms with another group, and write the V and C using that.

As such, it is as good a time as any to introduce modules.

The path of least resistance would be to introduce the multi-module format that javac understands. I.e., you have a top-level directory for each module that has the module's name.

some.mod/
    src/
        module-info.java
        some/
            mod/
                ...
other.mod/
    src/
        module-info.java
        other/
            mod/
                ...
javac \
    -d compiled \
    --module-path libraries \
    --module-source-path  "./*/src" \
    --module some.mod,other.mod

That will require talking about visibility and packages, so it's a good point to also start talking about higher level concepts like encapsulation and library contracts.

Step 9.

Maybe this can be done a bit sooner, but you definitely need to show those goobers how to write unit tests now.

The best way to do this is to show them how to use junit.

java --module-path libraries:compiled \
     --add-modules ALL-MODULE-PATH \
     --module org.junit.platform.console \
     execute --scan-modules

Maybe there can be a junit executable ready via some mechanism, but either way all the mechanics of even this relatively verbose incantation have been shown.

And this is a great point to introduce the practice of having a separate test folder. Also potentially resources, since tests can use those as a source of test data.

Step 10.

At some point, now that they know how to write code, write tests, design modules, etc., it would be a good time to get into library writing. Not everyone will, but some will try their hand.

It's at this point that learning Maven or Gradle probably becomes needed, though I think with a smidgen more tooling that can be delayed. Maybe just something to generate a POM + jreleaser would be enough.

Step N.

Then, at some point, they have need for a real build tool. I won't opine on this, but I think some people wouldn't ever reach this step.

They will have already gotten a relatively deep understanding of the underlying tools, naturally come across the concept of a build task, know what a library is and what Maven coordinates are and do, and they will have enough context to know why Maven would choose src/main/java as the place to put code.

I think this is a healthier level to engage with build tools at. Understanding what tasks they automate because you've done those tasks manually.

It also gives a firmer foundation for the more exotic parts of tooling like agents, AppCDS, annotation processors, etc. Build tools aren't always the most intuitive with those.

Conclusion

This might not be appropriate for all curriculums. Sometimes you are in a boot-camp and you just gotta be employable with Spring in 6 months.

But when the goal of an education isn't optimizing time to employment, I think teaching with the command-line first has value. I just think that in order for it to be actually practical, a few more pieces need to be in place.

So that aspiration is what jresolve is for. It is what whatever CLI tool I make next will probably be for. That's the vision. Have the JDK be enough, by itself, to get bootstrapped into modern software development. Lower the barrier of entry to be around that of JavaScript and Python.


Tell me what I got wrong in the comments below.


<- Index

org.xerial.sqlitejdbc

What is it

org.xerial.sqlitejdbc lets you create and interact with a SQLite database from Java.

Why use it

To quote the blurb on the SQLite website

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers and comes bundled inside countless other applications that people use every day.

The SQLite file format is stable, cross-platform, and backwards compatible and the developers pledge to keep it that way through the year 2050. SQLite database files are commonly used as containers to transfer rich content between systems and as a long-term archival format for data. There are over 1 trillion (1e12) SQLite databases in active use.

SQLite source code is in the public-domain and is free to everyone to use for any purpose.

So if you want a data store, and it's okay if that data is in a file on the filesystem, SQLite is a very good choice.

Getting Started

import org.sqlite.SQLiteDataSource;

import java.util.List;

void main() throws Exception {
    var db = new SQLiteDataSource();
    db.setUrl("jdbc:sqlite:database.db");

    try (var conn = db.getConnection();
         var stmt = conn.prepareStatement("""
                 CREATE TABLE IF NOT EXISTS widget(
                    id integer not null primary key,
                    name text not null
                 )
                 """)) {
        stmt.execute();
    }

    try (var conn = db.getConnection()) {
        for (var name : List.of("Bob", "Susan", "Sob", "Busan")) {
            try (var stmt = conn.prepareStatement("""
                 INSERT INTO widget(name) VALUES (?)
                 """)) {
                stmt.setString(1, name);
                stmt.execute();
            }
        }
    }

    // id=1, name=Bob
    // id=2, name=Susan
    // id=3, name=Sob
    // id=4, name=Busan
    try (var conn = db.getConnection();
         var stmt = conn.prepareStatement("""
                 SELECT id, name
                 FROM widget
                 """)) {
        var rs = stmt.executeQuery();
        while (rs.next()) {
            System.out.println(
                    STR."id=\{rs.getInt("id")}, name=\{rs.getString("name")}"
            );
        }
    }
}

<- Index

de.poiu.apron

What is it

de.poiu.apron gives you the ability to read and write properties files while preserving comments, whitespace, and order of entries.

Why use it

java.util.Properties is one of the simpler ways to add configuration to a project. Properties files are just key-value pairs separated by an equals sign.

key=value
other=otherValue

But if you want to edit a properties file programmatically while keeping any formatting, ordering, or commenting that a human did manually, you will run into trouble.
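
A quick sketch of that trouble, using java.util.Properties directly - comments and ordering do not survive a round trip:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

void main() throws Exception {
    var path = Path.of("config.properties");

    var props = new Properties();
    try (var reader = Files.newBufferedReader(path)) {
        props.load(reader);
    }

    props.setProperty("port", "4031");

    try (var writer = Files.newBufferedWriter(path)) {
        // Any comments a human wrote are now gone, the entries
        // come out in an arbitrary order, and a timestamp
        // comment is added at the top.
        props.store(writer, null);
    }
}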

This is the niche that de.poiu.apron fills. You can have configuration files which are updated by a program and a human interchangeably.

Getting Started

import de.poiu.apron.PropertyFile;
import de.poiu.apron.entry.PropertyEntry;

import java.nio.file.Files;
import java.nio.file.Path;

void main() throws Exception {
    var path = Path.of("config.properties");
    var fileContents = """
            key=value
            # Context here
            otherKey=otherValue
            """;
    Files.writeString(path, fileContents);

    PropertyFile file = PropertyFile.from(path.toFile());

    // value
    System.out.println(file.get("key"));
    file.appendEntry(new PropertyEntry("port", "4031"));

    file.saveTo(path.toFile());

    // key=value
    // # Context here
    // otherKey=otherValue
    // port = 4031
    System.out.println(Files.readString(path));
}

<- Index

com.ethlo.time

What is it

com.ethlo.time provides utilities for parsing and producing the date and time formats that you are likely to run into on the internet. Namely, RFC-3339 timestamps and the W3C date and time formats.

Why use it

While the java.time packages provide generic date and time parsing and can support a wide variety of formats, you still need to know what formats to pick. You also need to pick the same ones everywhere in your program.

This streamlines that process for the common case of working with time information you got from, or you want to put out into, the internet.

It is also reportedly faster than the code you would produce using the generic APIs.

Getting Started

import java.time.OffsetDateTime;
import com.ethlo.time.DateTime;
import com.ethlo.time.ITU;

void main() {
    DateTime dateTime
            = ITU.parseLenient("2012-12-27T19:07Z");
    // 2012-12-27T19:07Z
    System.out.println(dateTime);

    OffsetDateTime offsetDateTime
            = ITU.parseDateTime("2012-12-27T19:07:22.123456789-03:00");

    // 2012-12-27T22:07:22Z
    System.out.println(ITU.formatUtc(offsetDateTime));

    // 2012-12-27T22:07:22.123Z
    System.out.println(ITU.formatUtcMilli(offsetDateTime));

    // 2012-12-27T22:07:22.123456Z
    System.out.println(ITU.formatUtcMicro(offsetDateTime));

    // 2012-12-27T22:07:22.123456789Z
    System.out.println(ITU.formatUtcNano(offsetDateTime));
}

<- Index

com.fasterxml.uuid

What is it

com.fasterxml.uuid has methods to generate, and customize the generation of, UUIDs.

Why use it

Most of the time, folks use UUID.randomUUID() to get their universally unique identifiers. That makes a UUIDv4.

But the world of UUIDs is more varied than that and there are different kinds of UUIDs that you might want to use.

This includes UUIDv6 and UUIDv7, which aren't referenced in the above link.

Fun fact though, this library predates the addition of UUID.randomUUID() to the standard library.

Getting Started

import com.fasterxml.uuid.Generators;

void main() {
    var uuidv7 = Generators
            .timeBasedEpochGenerator().generate(); // Version 7

    System.out.println(uuidv7);

    var uuidv5 = Generators
            .nameBasedGenerator()
            .generate("string to hash");

    System.out.println(uuidv5);
}

<- Index

dev.mccue.microhttp.session

What is it

dev.mccue.microhttp.session provides an interface for encoding session data in microhttp responses and decoding session data from microhttp requests.

Last one from me for this series, I promise. This just took a lot of build up.

Why use it

If you are making a classical web app, and maybe you should, then you will want to store persistent data about your users.

Most often logins, but other things like flash data are also fair game.

This provides a composable interface to that capability.

Getting Started

This example uses ScopedValues so will require preview features.

import dev.mccue.json.JsonDecoder;
import dev.mccue.microhttp.handler.DelegatingHandler;
import dev.mccue.microhttp.handler.RouteHandler;
import dev.mccue.microhttp.html.HtmlResponse;
import dev.mccue.microhttp.session.ScopedSession;
import dev.mccue.microhttp.session.SessionManager;
import dev.mccue.microhttp.session.SessionStore;
import org.microhttp.EventLoop;
import org.microhttp.Options;

import java.util.List;
import java.util.regex.Pattern;

import static dev.mccue.html.Html.HTML;


void main() throws Exception {
    var indexHandler = RouteHandler.of(
            "GET",
            Pattern.compile("/"),
            request -> {
                var name = ScopedSession.get()
                        .get("name", JsonDecoder::string)
                        .orElse("?");

                return new HtmlResponse(HTML."""
                        <h1> Your name is \{name} </h1>
                        """);
            }
    );

    var nameHandler = RouteHandler.of(
            "GET",
            Pattern.compile("/name/(?<name>.+)"),
            (matcher, request) -> {
                ScopedSession.update(data ->
                        data.with("name", matcher.group("name")));
                return new HtmlResponse(HTML."Go back to /");
            }
    );


    var notFound = new HtmlResponse(404, HTML."Not Found");
    var error = new HtmlResponse(500, HTML."Internal Server Error");

    // Can also store in encrypted cookies
    var store = SessionStore.inMemory();
    var manager = SessionManager.builder()
            .store(store)
            .build();

    var rootHandler = ScopedSession.wrap(manager,
            new DelegatingHandler(List.of(indexHandler, nameHandler), notFound)
    );
    
    var eventLoop = new EventLoop((request, callback) -> {
        try {
            callback.accept(rootHandler.handle(request).intoResponse());
        } catch (Exception e) {
            callback.accept(error.intoResponse());
        }
    });

    eventLoop.start();
    eventLoop.join();
}

<- Index

dev.mccue.async

What is it

dev.mccue.async provides one class - Atom. Atom wraps an AtomicReference and gives a simpler, if less powerful, API that is geared around atomic compare and swap operations.

Why use it

If you are from the Clojure world, this gives an API directly inspired by its atom construct. That can be appealing if you want to have managed immutable state and are used to that world.

The primary utility provided is having the atomic compare and swap logic already written out for you. It's only a handful of lines, but not something appealing to copy around a codebase.
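
That handful of lines is roughly this loop, sketched here against an AtomicReference<T> named ref and a function f (the names are just for illustration):

T oldValue;
T newValue;
do {
    oldValue = ref.get();
    newValue = f.apply(oldValue);
} while (!ref.compareAndSet(oldValue, newValue));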

Getting Started

import java.util.ArrayList;
import dev.mccue.async.Atom;

void main() throws Exception {
    var data = Atom.of(0);
    
    // 0
    System.out.println(data.get());
    
    data.swap(x -> x + 1);
    
    // 1
    System.out.println(data.get());
    
    // A bunch of concurrent swaps is sorta a worst-case
    // situation for an atomic reference, performance-wise,
    // but a good illustration of correctness.
    
    var threads = new ArrayList<Thread>();
    for (int i = 0; i < (10000 - 1); i++) {
        threads.add(
            Thread.startVirtualThread(() -> data.swap(x -> x + 1))
        );
    }
    
    for (var thread : threads) {
        thread.join();
    }
    
    // 10000
    System.out.println(data.get());
}

<- Index

dev.mccue.microhttp.json

What is it

dev.mccue.microhttp.json provides JsonResponse, a class which implements IntoResponse and thus can be used alongside microhttp and microhttp-handler to produce responses which contain JSON.

It automatically adds the appropriate Content-Type header, determines the HTTP reason phrase with reasonphrase, and accepts the Json type provided by dev.mccue.json.

Why use it

If you are using microhttp with microhttp-handler, it boxes up the logic needed in order to return JSON responses. This would otherwise be cumbersome to write at every needed location.

Getting Started

import dev.mccue.json.Json;
import dev.mccue.microhttp.handler.RouteHandler;
import dev.mccue.microhttp.json.JsonResponse;
import org.microhttp.Request;

import java.util.regex.Pattern;
import java.util.regex.Matcher;


class BasicHandler extends RouteHandler {
    BasicHandler() {
        super("GET", Pattern.compile("/"));
    }

    @Override
    public JsonResponse handleRoute(
            Matcher matcher,
            Request request
    ) {
        return new JsonResponse(
                Json.objectBuilder()
                    .put("name", "bob")
                    .build()
        );
    }
}

<- Index

dev.mccue.json

What is it

dev.mccue.json provides the ability to read and write JSON data as well as to encode data into JSON and decode data from JSON.

Why use it

Most popular JSON libraries use data-binding. You make a class, possibly annotate it a little bit, and then some automatic logic binds the data inside a JSON structure to the fields of your class.

This has clear upsides, but it's hard to explain to newcomers. The underlying mechanism of data-binding is either reflection or compile-time code generation. Both are processes that are hard to "touch." The escape hatches in libraries that assume data-binding is the default mode of operation are less than ergonomic.

This library takes the other approach: it makes you manually extract data from JSON, but does so in a way that is composable. You have to write more code, but the mechanics of that code are plainer to see.
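To make "composable" concrete, here is a small sketch that pulls a value out of a nested object by nesting the same field decoder used in the Getting Started example below. The shape of the JSON is made up for illustration, and it assumes JsonDecoder is, as the method-reference usage below suggests, a functional interface you can implement with a lambda.

import dev.mccue.json.Json;
import dev.mccue.json.JsonDecoder;

void main() {
    var json = Json.readString("""
            {"hero": {"name": "superman"}}
            """);

    // Decoders are just functions, so decoding a nested field
    // is a matter of nesting two field decoders.
    var name = JsonDecoder.field(
            json,
            "hero",
            hero -> JsonDecoder.field(hero, "name", JsonDecoder::string)
    );

    // superman
    System.out.println(name);
}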

I've written about this before. I'm listing it here because some libraries I wrote that I'll introduce later depend on it.

Getting Started

import dev.mccue.json.Json;
import dev.mccue.json.JsonDecoder;
import dev.mccue.json.JsonEncodable;

record Superhero(String name) implements JsonEncodable {
    @Override
    public Json toJson() {
        return Json.objectBuilder()
            .put("name", name)
            .build();
    }
    
    public static Superhero fromJson(Json json) {
        return new Superhero(
            JsonDecoder.field(json, "name", JsonDecoder::string)
        );
    }
}

void main() {
    var superhero = new Superhero("superman");
    
    var json = superhero.toJson();
    var jsonStr = Json.writeString(json);
    
    var roundTripped = Json.readString(jsonStr);
    var newSuperhero = Superhero.fromJson(roundTripped);
    
    System.out.println(superhero);
    System.out.println(jsonStr);
    System.out.println(newSuperhero);
}

<- Index

com.samskivert.jmustache

What is it

com.samskivert.jmustache is an implementation of the Mustache templating language.

Why use it

Mustache is one of many templating languages used for generating HTML. It has the unique advantage of being especially portable between languages and environments.

This comes as a result of a deliberate choice to not allow much "logic" in templates. To quote its man page:

We call it "logic-less" because there are no if statements, else clauses, or for loops. Instead there are only tags. Some tags are replaced with a value, some nothing, and others a series of values.

This comes at a cost - you need to arrange all the information for a template up-front - but does make it easier to consider the behavior of a template in isolation.

Of the implementations of mustache available for the JVM, com.samskivert.jmustache has the fewest moving pieces, is up-to-date with the specification, and is faster than other implementations that do their work at runtime.

Getting Started

import com.samskivert.mustache.Mustache;

import java.util.List;
import java.util.Map;

record Cartoon(String name, boolean hasMovie) {}

void main() {
    var cartoons = List.of(
            new Cartoon("Space Ghost Coast to Coast", false),
            new Cartoon("Harvey Birdman, Attorney at Law", false),
            new Cartoon("Sealab 2021", false),
            new Cartoon("The Venture Bros.", true)
    );

    var template = """
            <html>
              <body>
                <h1> Cartoons </h1>
                <ul>
                {{#cartoons}}
                  <li>
                    {{name}}
                    {{#hasMovie}}
                      (there is a movie)
                    {{/hasMovie}}
                  </li>
                {{/cartoons}}
                </ul>
              </body>
            </html>
            """;

    var compiledTemplate = Mustache.compiler()
            .compile(template);

    var renderedTemplate = compiledTemplate.execute(
            Map.of("cartoons", cartoons)
    );

    // <html>
    //  <body>
    //    <h1> Cartoons </h1>
    //    <ul>
    //      <li>
    //        Space Ghost Coast to Coast
    //      </li>
    //      <li>
    //        Harvey Birdman, Attorney at Law
    //      </li>
    //      <li>
    //        Sealab 2021
    //      </li>
    //      <li>
    //        The Venture Bros.
    //          (there is a movie)
    //      </li>
    //    </ul>
    //  </body>
    //</html>
    System.out.println(renderedTemplate);
}

<- Index

com.nulabinc.zxcvbn

What is it

com.nulabinc.zxcvbn, so named after one of the 100 most common passwords, is a password strength estimator.

Why use it

People aren't very good at picking passwords. While it is technically their fault if they make their password 123456 and get their bank account stolen, that can very quickly become your problem.

Some services try to mitigate this by asking that passwords have letters, numbers, and "special characters" in them. This doesn't stop things like P@ssw0rd!, which will be guessed by password crackers in under a millisecond.

com.nulabinc.zxcvbn will instead try to figure out how easy it will be for a password cracker to guess the password. This will lead to your users having generally stronger passwords.

Getting Started

import com.nulabinc.zxcvbn.WipeableString;
import com.nulabinc.zxcvbn.Zxcvbn;

void main() {
    var zxcvbn = new Zxcvbn();

    // Pro-tip, storing passwords in mutable structures lets
    // you lower the time they are floating around in program
    // memory. This decreases the window of opportunity for
    // attackers that might have found a way to poke around
    // in your process.
    //
    // If that sort of attack isn't in your threat model, you
    // can use regular Strings.
    var password = new WipeableString("P@ssw0rd!");

    var strength = zxcvbn.measure(password);

    var warning = strength.getFeedback()
            .getWarning();
    
    // This is similar to a commonly used password.
    System.out.println(warning);

    var suggestions = strength.getFeedback()
            .getSuggestions();
    
    // Add another word or two. Uncommon words are better.
    // Capitalization doesn't help very much.
    // Predictable substitutions like '@' instead of 'a' don't help very much.
    System.out.println(String.join("\n", suggestions));

    // fair
    switch (strength.getScore()) {
        case 0 -> System.out.println("weak");
        case 1 -> System.out.println("fair");
        case 2 -> System.out.println("good");
        case 3 -> System.out.println("strong");
        default -> System.out.println("very strong");
    }
}

<- Index

dev.mccue.microhttp.cookies

What is it

dev.mccue.microhttp.cookies provides a utility for parsing the cookie headers sent in requests, specifically in microhttp's Request objects.

Why use it

If you've asked a user to send you a cookie on subsequent requests, such as with dev.mccue.microhttp.setcookie, you will most likely want to interpret the data in that cookie when you get it.

This library provides the ability to do that.

Getting Started

import dev.mccue.microhttp.cookies.Cookies;
import dev.mccue.microhttp.setcookie.SetCookieHeader;
import org.microhttp.EventLoop;
import org.microhttp.Response;

import java.util.List;

void main() throws Exception {
    var eventLoop = new EventLoop((request, consumer) -> {
        var cookies = Cookies.parse(request);
        var counter = cookies.get("Counter")
                .orElse("0");
        
        var setCookieHeader = SetCookieHeader.of(
                "Counter",
                Integer.toString(Integer.parseInt(counter) + 1)
        );

        consumer.accept(
                new Response(
                        200,
                        "OK",
                        List.of(setCookieHeader),
                        counter.getBytes()
                )
        );
    });

    eventLoop.start();
    eventLoop.join();
}

<- Index

dev.mccue.microhttp.setcookie

What is it

dev.mccue.microhttp.setcookie provides a utility for generating a Set-Cookie header for use in a microhttp Response.

Why use it

Whenever a web browser receives a response from a website, depending on user settings, it will look for any Set-Cookie headers in that response. Data conveyed in those headers will be sent back to the server with every subsequent request.

This is one of the easiest ways to have persistent state, like user sessions, on a website.

Getting Started

import org.microhttp.Header;

import dev.mccue.microhttp.setcookie.SameSite;
import dev.mccue.microhttp.setcookie.SetCookieHeader;

void main() {
    Header header = SetCookieHeader.of("name", "value");
    
    // Header[name=Set-Cookie, value=name=value]
    System.out.println(header);

    Header otherHeader = SetCookieHeader.builder("name2", "value2")
            .sameSite(SameSite.STRICT)
            .secure(true)
            .build();
    
    // Header[name=Set-Cookie, value=name2=value2; SameSite=Strict; Secure]
    System.out.println(otherHeader);
}

<- Index

com.sanctionco.jmail

What is it

com.sanctionco.jmail parses and validates email addresses.

Why use it

Web applications often need to work with email addresses in some form.

If you find yourself needing to check if something is a valid email address, jmail is more correct than the alternatives and generally around twice as fast.

If you find yourself wanting to represent an email address in your domain, the Email type provided by this library will serve you well. You'd want to use that over a String for the same reason you'd want to store a path in a Path or an address in a URI.

Getting Started

import com.sanctionco.jmail.Email;
import com.sanctionco.jmail.JMail;

void main() {
    // false
    System.out.println(JMail.isValid("gibberish"));

    // true
    System.out.println(JMail.isValid("apple@example.com"));

    Email email = Email.of("apple@example.com")
            .orElseThrow();

    record User(Email email) {
    }

    User user = new User(email);

    // User[email=apple@example.com]
    System.out.println(user);

    // example.com
    System.out.println(user.email().domain());
}

<- Index

org.apiguardian.api

What is it

org.apiguardian.api provides an @API annotation. This gives a structured place to document API stability guarantees.

Why use it

If you are writing an application, generally no API is truly stable. You are free to change whatever you need to in order to make the software work.

Libraries are different. Libraries are used by people whose code you have no control over, but with whom you form an implicit social contract where they trust you to not break their code with new library releases.

Explicitly documenting which elements of an API you are committed to maintaining, which you are experimenting with, and which they really shouldn't be touching is therefore a useful thing to do.

Annotations show up prominently in generated documentation, which makes them a good mechanism for documenting these guarantees (or lack thereof).

Getting Started

import org.apiguardian.api.API;

public final class MathOps {
    private MathOps() {}

    @API(status = API.Status.STABLE)
    public static double pi() {
        return 3.14;
    }

    @API(status = API.Status.EXPERIMENTAL)
    public static double tau() {
        return pi() * 2;
    }
}

<- Index

dev.mccue.microhttp-html

What is it

microhttp-html provides HtmlResponse, a class which implements IntoResponse and thus can be used alongside microhttp and microhttp-handler to produce responses which contain html.

It automatically adds the appropriate Content-Type header, determines the HTTP reason phrase with reasonphrase, and accepts the Html type provided by html.

Why use it

If you are using microhttp with microhttp-handler, it boxes up the logic needed in order to return html responses. This would otherwise be cumbersome to write at every needed location.

Getting Started

At time of writing template processors are a preview-feature, so you will need to use the latest version of the library and the latest JDK.

import dev.mccue.microhttp.handler.RouteHandler;
import dev.mccue.microhttp.html.HtmlResponse;
import org.microhttp.Request;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

import static dev.mccue.html.Html.HTML;

class IndexHandler extends RouteHandler {
    IndexHandler() {
        super("GET", Pattern.compile("/"));
    }

    @Override
    public HtmlResponse handleRoute(
            Matcher matcher,
            Request request
    ) {
        var name = "bob";
        return new HtmlResponse(HTML."""
                <html>
                  <body>
                    <h1> Hello \{name} </h1>
                  </body>
                </html>
                """);
    }
}

<- Index

dev.mccue.html

What is it

html provides an Html type and a template processor which produces Html and auto-escapes any embedded values.

Why use it

Before template processors, your options for producing html were to

  • Keep the HTML in a template - usually, but not always, in a separate file
  • Generate HTML with a programmatic API.

The first option is the most widespread, but means that your logic for filling in the template won't be co-located with the contents of the template itself. It also gives a damp and dimly-lit surface for "template languages" to grow. These are often full programming languages in their own right and require special IDE support.

The second option isn't as popular because you effectively lose the ability to apply the expertise of designers, who are generally familiar with HTML and are used to seeing page layout expressed in HTML. It also puts a lot of pressure on the programmatic API, since you need to make sure every HTML idiom you need to express is expressible. A difficult feat with an evolving standard.

Template processors are a middle ground of sorts.

They are a templating language but, because they will be an official part of Java, you can count on IDE support and will be able to co-locate them with the code for filling in values.

They are a programmatic API but, because you write HTML directly, every idiom is expressible. It should be familiar to most designers as well.

You could reasonably draw a parallel between this approach and JSX from the JavaScript world.

Getting Started

At time of writing template processors are a preview-feature, so you will need to use the latest version of the library and the latest JDK.

import java.util.List;
import java.util.ArrayList;

import dev.mccue.html.Html;
import static dev.mccue.html.Html.HTML;

void main() {
    String name = "joe";
    
    var pets = List.of("snoopy", "Yellow Bird");
    
    var petHtml = new ArrayList<Html>();
    for (var pet : pets) {
        petHtml.add(HTML."<li> \{pet} </li>");
    }
    
    var page = HTML."""
        <html>
          <body>
            <h1> Hello \{name} </h1>
            <ul>
              \{petHtml}
            </ul>
          </body>
        </html>
        """;
}
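Since the auto-escaping is the main reason to reach for the Html type over raw strings, here is a second sketch showing what that buys you. The exact escaped output in the comment is an assumption on my part, as is the assumption that printing an Html value renders its markup.

import dev.mccue.html.Html;
import static dev.mccue.html.Html.HTML;

void main() {
    var userInput = "<script>alert(1)</script>";

    Html paragraph = HTML."<p> \{userInput} </p>";

    // The embedded value is escaped, so the printed markup should
    // contain something like &lt;script&gt; instead of a live tag.
    System.out.println(paragraph);
}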

<- Index

org.slf4j.simple

What is it

slf4j-simple is a logging implementation for slf4j-api.

It prints log messages emitted at a level of INFO or above to System.err.

Why use it

If any dependency you have uses slf4j-api you will get errors at startup about not having a logging implementation.

slf4j-simple is not flexible or "powerful" by any definition but, depending on how you deploy your application, it might be all you need.
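It does have a small set of knobs, configured through system properties. For example, raising the default log level to debug looks something like this; the property needs to be set before the first logger is created.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

void main() {
    // slf4j-simple reads this property when it creates its first logger.
    System.setProperty("org.slf4j.simpleLogger.defaultLogLevel", "debug");

    Logger logger = LoggerFactory.getLogger(getClass());
    logger.debug("Now this will actually print");
}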

Getting Started

You need to have both slf4j-api and slf4j-simple available to your program, then you should see output from logging statements.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

void main() {
    Logger logger = LoggerFactory.getLogger(getClass());
    logger.info("Hello World");
}

<- Index

org.slf4j

What is it

slf4j-api - "Simple Logging Facade for Java" - is a logging facade. For Java

Why use it

Logging facades let portions of a larger program emit text based logs without needing to know how those logs will be published.

This means external libraries will often emit logs via slf4j and expect the application they are included in to publish them with a logging implementation.

slf4j-api is the most ubiquitous of these and the winner of the 90s "logging wars."

Getting Started

In order to have the code below emit any output, you need to make sure a logging implementation is included in your project. I'll introduce one of those tomorrow.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

void main() {
    Logger logger = LoggerFactory.getLogger(getClass());
    logger.info("Hello World");
}

<- Index

dev.mccue.microhttp.handler

What is it

microhttp-handler provides interfaces for making composable handlers for microhttp.

There are two interfaces in the module, IntoResponse and Handler.

IntoResponse is something which can be converted into a Response. Handler is a function which takes a Request and returns something which implements IntoResponse.

There are also two implementations of Handler provided for convenience - RouteHandler and DelegatingHandler.

RouteHandler checks the request's method and uri and returns null if it doesn't match a chosen method and regex.

DelegatingHandler tries a list of handlers in order, returning the first non-null response or falling back to a default for when no match is found. Combined with RouteHandler, this can act as a very basic request router.

Why use it

A normal microhttp handler takes a Request and a Consumer<Response> that should be called later.

While this is fine, in the age of virtual threads there isn't much downside to modeling handlers as functions that take Requests and return Responses, and there are many upsides to doing so.

The programming model is simpler to compose, simpler to test, and it provides an opportunity to introduce concepts such as middleware and IntoResponse.

IntoResponse is useful because making the normal Response record requires dealing with reason phrases, content-type headers, and body encoding. IntoResponse provides a seam for custom types to box up much of that logic.

Getting Started

import org.microhttp.EventLoop;
import org.microhttp.Response;
import org.microhttp.Header;

import dev.mccue.microhttp.handler.Handler;
import dev.mccue.microhttp.handler.IntoResponse;
import dev.mccue.microhttp.handler.RouteHandler;
import dev.mccue.microhttp.handler.DelegatingHandler;

import dev.mccue.reasonphrase.ReasonPhrase;

import java.util.List;
import java.util.regex.Pattern;

record TextResponse(int status, String value) 
        implements IntoResponse {
    @Override
    public Response intoResponse() {
        return new Response(
                status,
                ReasonPhrase.forStatus(status),
                List.of(new Header("Content-Type", "text/plain")),
                value.getBytes()
        );
    }
}

void main() throws Exception {
    Handler index = RouteHandler.of(
            "GET", 
            Pattern.compile("/"), 
            request -> new TextResponse(200, "Hello, world")
    );

    Handler rootHandler = new DelegatingHandler(
            List.of(index),
            new TextResponse(404, "Not Found")
    );

    var error = new TextResponse(500, "Internal Error");

    var eventLoop = new EventLoop((request, callback) -> {
        Thread.startVirtualThread(() -> {
            try {
                callback.accept(
                        rootHandler.handle(request)
                                .intoResponse()
                );
            } catch (Exception e) {
                callback.accept(error.intoResponse());
            }
        });
    });
    eventLoop.start();
    eventLoop.join();
}

<- Index

org.pcollections

What is it

pcollections provides Persistent Immutable Collections.

These are collections which cannot be modified, but use structural sharing to make creating updated versions of themselves efficient.

Why use it

Persistent collections are useful when you want a data aggregate that is immutable but will require multiple updates over the runtime of the program.

pcollections is unique in the ecosystem of persistent collection libraries in that its types directly extend the Java Collections Framework. Its PVector is a subtype of java.util.List, its PMap is a subtype of java.util.Map, and so on.

There are definite cons to that - having a remove method that does nothing is unideal - but there are also pros. There is no conversion cost when interacting with the numerous APIs that expect java.util.* types.
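That lack of conversion cost is easy to demonstrate: a PVector can be handed directly to anything that wants a java.util.List. A minimal sketch:

import java.util.List;

import org.pcollections.TreePVector;

void main() {
    // Build an immutable vector, then use it as a plain java.util.List.
    List<String> names = TreePVector.<String>empty()
            .plus("Mumenstallu")
            .plus("Snufkin");

    // Mumenstallu, Snufkin
    System.out.println(String.join(", ", names));
}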

Getting Started

Basic Usage

import org.pcollections.PVector;
import org.pcollections.TreePVector;

void main() {
    PVector<String> names = TreePVector.empty();
    var names2 = names.plus("Mumenstallu");
    var names3 = names2.plus("Snufkin");

    // []
    System.out.println(names);
    // [Mumenstallu]
    System.out.println(names2);
    // [Mumenstallu, Snufkin]
    System.out.println(names3);
}

Many updated versions

import org.pcollections.PVector;
import org.pcollections.TreePVector;

void main() {
    PVector<PVector<Integer>> allVersions = TreePVector.empty();
    PVector<Integer> numbers = TreePVector.empty();
    
    for (int i = 0; i < 10000; i++) {
        numbers = numbers.plus(i);
        allVersions = allVersions.plus(numbers);
    }
    
    System.out.println(allVersions.size());
    
    // Every version is still valid
    System.out.println(allVersions.get(2));
    System.out.println(allVersions.get(4));
    System.out.println(allVersions.get(100));
    
    int total = 0;
    for (int n : numbers) {
        total += n;
    }
    System.out.println(total);
}

<- Index

dev.mccue.urlparameters

What is it

urlparameters provides the logic needed to read and write "URL parameters."

This covers both query parameters, often seen at the end of a URL (google.com?q=apples&track_id=123), and the bodies of html form submissions (name=bob&age=98).

It uses com.uwyn.urlencoder to properly encode parameters for both situations.

Why use it

Most websites eventually encode some information as query parameters inside a URL. urlparameters lets you extract that information as well as produce such URLs.

It is also common for websites to accept information from a user via a form submission - more so if you server-side render HTML. Processing those form submissions means parsing those request bodies.

Getting Started

Parse Query Params from a URL

import java.net.URI;

import dev.mccue.urlparameters.UrlParameters;

void main() {
   var url = URI.create("https://google.com?q=pear");
   var params = UrlParameters.parse(url);
   
   System.out.println(params.firstValue("q").orElseThrow());    
}


Parse Form Submission bodies

import dev.mccue.urlparameters.UrlParameters;

void main() {
   var body = "name=jack&title=squire";
   var params = UrlParameters.parse(body);
   
   // squire
   System.out.println(params.firstValue("title").orElseThrow());    
}

Generate a URL with Query Params

import java.util.List;
import java.net.URI;

import dev.mccue.urlparameters.UrlParameters;
import dev.mccue.urlparameters.UrlParameter;

void main() {
   var params = new UrlParameters(List.of(
       new UrlParameter("pokemon", "stantler"),
       new UrlParameter("caught_in", "Pokemon Colosseum")
   ));
   
   var url = URI.create("https://example.com?" + params);
   
   // https://example.com?pokemon=stantler&caught_in=Pokemon%20Colosseum
   System.out.println(url);    
}

<- Index

com.uwyn.urlencoder

What is it

urlencoder encodes URL components using rules determined by combining the unreserved character set from RFC 3986 with the percent-encode set from application/x-www-form-urlencoded.

Why use it

The built-in java.net.URLEncoder encodes strings into the "HTML form encoding."

This is very slightly different from the form of encoding that should be used for URLs. In the specifications for URIs (URLs are a subset of URIs), spaces are encoded as %20. In application/x-www-form-urlencoded, spaces are usually encoded as +, though %20 would also be valid.

Because java.net.URLEncoder uses + for spaces, other libraries can fail to decode data properly.

The UrlEncoder class in this library uses %20 for spaces and is also reportedly more efficient than java.net.URLEncoder.
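You can see the difference for yourself by running the same input through the built-in encoder and comparing against the Getting Started example below.

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

void main() {
    // java.net.URLEncoder targets application/x-www-form-urlencoded,
    // so the space becomes a "+" rather than "%20".
    // Hello+world
    System.out.println(URLEncoder.encode("Hello world", StandardCharsets.UTF_8));
}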

Getting Started

import com.uwyn.urlencoder.UrlEncoder;

void main() {
    var string = "Hello world";
    var encoded = UrlEncoder.encode(string);
    System.out.println(encoded); // Hello%20world
    var decoded = UrlEncoder.decode(encoded);
    System.out.println(decoded); // Hello world
}

<- Index

dev.mccue.reasonphrase

What is it

reasonphrase is a library that provides a lookup from an HTTP Status Code (like 200) to an HTTP Reason Phrase (like OK).

Reason phrases are a mostly unused part of the HTTP protocol but, if you need to pick one anyway, you might as well pick a standard one.

Why use it

In most situations, your web server will automatically pick a reason phrase based on the status code of a response.

A notable exception to this is Microhttp, which exposes the reason phrase directly in its Response record.

So if you are using Microhttp, or some other minimal server, then this library will be of use.

Getting Started

import org.microhttp.EventLoop;
import org.microhttp.Response;
import org.microhttp.Header;

import dev.mccue.reasonphrase.ReasonPhrase;

import java.util.List;

void main() throws Exception {
    var eventLoop = new EventLoop((request, callback) -> {
        callback.accept(new Response(
            200,
            ReasonPhrase.forStatus(200), 
            List.of(new Header("Content-Type", "text/plain")),
            "Hello, world".getBytes()
        ));
    });

    eventLoop.start();
    eventLoop.join();
}

<- Index

org.microhttp

What is it

Microhttp is an implementation of an HTTP/1.1 server.

This means it can handle things like GET and POST requests, but not securing a connection with SSL or upgrading to websocket connections.

Why use it

It is very fast and, as a result of eschewing support for most convenience features and other protocols, it has a codebase that can be reasonably read and understood fully in a day or two at most.

It discretizes requests and responses, which is a problem if you were expecting to handle file uploads or other such tasks directly, but a non-issue if you only intend to send and receive payloads of reasonable size.

In order to publish to the wider internet, you will need SSL. That Microhttp doesn't handle this natively isn't much of an issue, since most platforms-as-a-service like Heroku, Railway, and Render provide it by default. As will any load balancer or properly configured nginx.

Getting Started

import org.microhttp.EventLoop;
import org.microhttp.Response;
import org.microhttp.Header;

import java.util.List;

void main() throws Exception {
    var eventLoop = new EventLoop((request, callback) -> {
        callback.accept(new Response(
            200, 
            "OK", 
            List.of(new Header("Content-Type", "text/plain")),
            "Hello, world".getBytes()
        ));
    });

    eventLoop.start();
    eventLoop.join();
}

<- Index

Better Java Compiler Error Messages

by: Ethan McCue

This post represents almost a year of work from Andrew Arnold, Ataberk Cirikci, Noah Jamison, and Thalia La Pommeray1.

Every part of this that you agree with, they deserve all the credit for. Every part that you do not can be blamed on me.

Also, this is about Java. We'll take a bit of a winding road to get there, but we will get there.

Background

A compiler's job is to take code - usually from text files - and produce some usable artifact from it. For Java that means taking *.java files and producing *.class files that can be fed into a JVM.2

The first and most important priority of a compiler is to be correct. If the class files produced by the Java compiler do not function in the way specified by the Java Language Specification then it would not be a Java compiler.

The second priority of a compiler has historically been to be fast and resource-efficient. In the 90s, CPU and RAM were far scarcer resources. If a language couldn't be compiled efficiently then it would be impractical to use.3

What has historically not been a focus, and has seen a renaissance in modern times, is error messages.

Elm

Elm is a very small and very focused language. It is built specifically for making frontend web apps and has a very restricted set of features.4

import Html

main = 
    Html.h1 "Hello, world"

It's pretty cool. Check it out if you have some time.

Due to its relatively small surface area5, the language designer was able to dedicate time to the user experience (UX) of its compiler errors.

And by most accounts6, that work has paid off.

When a programmer makes a mistake with their Elm code, the Elm compiler will

  • always have a friendly tone
  • do its best to show the relevant areas of the code
  • provide a hint as to how to resolve the error

Say we took the example from above and tried to use a fictional h7 tag.

import Html

-- There is no h7 tag
main = 
    Html.h7 "Hello, world"

The error you would get is the following.

I cannot find a `Html.h7` variable:

5|     Html.h7 "Hello, world"
       ^^^^^^^
The `Html` module does not expose a `h7` variable. These names seem close
though:

    Html.h1
    Html.h2
    Html.h3
    Html.h4

Hint: Read <https://elm-lang.org/0.19.1/imports> to see how `import`
declarations work in Elm.

First to note is the personification of the compiler as an entity. When it says "I cannot find a variable" it subtly, but importantly, primes the user to think of the compiler as an entity unto itself. It's small stuff like that that gives our monkey brains the hooks they need to anthropomorphize.

And that is useful, because the only time a compiler talks to you is when something is wrong. Which would you prefer?

  • "You have cancer"
  • "I have your test results, and unfortunately you have cancer."

The first one is a game over screen in a FromSoft game, the second is a human being with some bedside manner.7

Another cool aspect is that it points to the exact place in the code that is at issue. Not just the line, but specifically the Html.h7 expression.

5|     Html.h7 "Hello, world"
       ^^^^^^^

Before you can resolve an error, you need to find the code causing it. Seems pretty obvious.

With many compilers you get a location like program.x:43:22 that you have to decipher. Where is that file? Which one is the line? Which is the column? Okay, let me scan through my code. You also often get a pretty-printed version of the problematic code, but it looks nothing like the code you wrote. You again need to do a mental transformation to find it. So a lot of time is lost:

  • converting row and column numbers into an actual file position
  • mapping pretty-printed code back onto actual code to verify that position

And don't forget that hint! It's a pretty basic analysis8, but being able to suggest functions that the user might have meant is big. Even if we disregard the exact contents of the hint, that there is a dedicated place to give hints and for users to look for hints is great.
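Elm's actual implementation is more sophisticated, but the gist of such an analysis is small enough to sketch: rank the known names by edit distance to the one that wasn't found. A rough illustration in Java, where the names and the cutoff are made up for the example:

import java.util.Comparator;
import java.util.List;

void main() {
    var known = List.of("h1", "h2", "h3", "h4", "h5", "h6");

    var suggestions = known.stream()
            .sorted(Comparator.comparingInt(name -> distance(name, "h7")))
            .limit(4)
            .toList();

    // [h1, h2, h3, h4]
    System.out.println(suggestions);
}

// Classic dynamic-programming Levenshtein distance.
int distance(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;
    for (int i = 1; i <= a.length(); i++) {
        for (int j = 1; j <= b.length(); j++) {
            int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
            d[i][j] = Math.min(
                    Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                    d[i - 1][j - 1] + cost
            );
        }
    }
    return d[a.length()][b.length()];
}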

It is hard to show in this format9, but in addition to the layout of the message, things like the ^^^^^^^ are colored red to draw our attention.

5|     Html.h7 "Hello, world"
       ^^^^^^^

Say we constructed a similar situation in Java.

class Html {
    static Object h1() {
       return null; 
    }
    static Object h2() {
        return null;
    }
    static Object h3() {
        return null;
    }
    static Object h4() {
        return null;
    }
    static Object h5() {
        return null;
    }
    static Object h6() {
        return null;
    }
}

public class Main {
    public static void main(String[] args) {
        System.out.println(Html.h7());
    }
}

The error message we get is streets behind.

/Main.java:24: error: cannot find symbol
        System.out.println(Html.h7());
                               ^
  symbol:   method h7()
  location: class Html
1 error

It is kind of shocking how much better things get when you focus on the user. Though, on some level, it is not shocking at all. Most terminal tools came into existence well before our industry really started focusing on making apps and websites feel great for their users. We all collectively realized that a hard-to-use app or website is bad for business, but the same lessons have not yet percolated down to tools like compilers and build tools. Hopefully I have demonstrated that we can do better!

Rust

Rust is a systems programming language. That means that it targets the same use-cases as C and C++ where speed and predictable latency are hard requirements.

struct Position {
    x: u32,
    y: u32
}

fn main() {
    let position = Position {
        x: 0,
        y: 1
    };

    println!("x: {}, y: {}", position.x, position.y);
}

Rust's most famous feature is its borrow checker. This is what lets it compete in ergonomics with languages like Python and Java without automatic garbage collection at runtime.10

It does this by tracking the "lifetime" of individual variables and fields, putting some rules in place for when those lifetimes end, and what to do when they end and the variable is "dropped".11

struct Position {
    x: u32,
    y: u32
}

fn main() {
    // Lifetime of the position starts here
    let position = Position {
        x: 0,
        y: 1
    };

    println!("x: {}, y: {}", position.x, position.y);
    
    // At this point the position variable is no longer "alive"
    // and all the memory allocated for it will be freed.
}

The tradeoff here is that the complexity of tracking lifetimes is pushed into the type system. This and other advanced features make Rust one of the most complicated languages out there. "Fighting the Borrow Checker" is a very common occurrence.

Despite this, it is overwhelmingly loved by those who have used it.

My hypothesis for why this is the case12 is that, very early on in its development, dedicated focus was given to the error messages its compiler produced.13 Even though people tend to produce malformed programs far more often14, the experience of the compiler being "helpful" offsets that.15

With the importance of addressing Rust's learning curve a key theme in the Rust survey, we're as motivated as ever to find any confusing or distracting part of the Rust experience and give it a healthy amount of polish. Errors are one area where applying that polish helps us improve the learning curve bit by bit, and we're looking forward to seeing how far we can go.

All error messages in Rust have a specific structure. There is a place for saying where an error occurred, why it occurred, what the error was, and potentially a hint as to how to resolve it.

For example, this code is malformed because enum variants need to be prefixed with the name of the enum.

enum Ex {
    A,
    B
}

pub fn main() {
    let ex = A;
}

The error message that the compiler produces reflects that.

error[E0425]: cannot find value `A` in this scope
 --> src/main.rs:7:14
  |
7 |     let ex = A;
  |              ^ not found in this scope
  |
help: consider importing this unit variant
  |
1 | use crate::Ex::A;
  |

For more information about this error, try `rustc --explain E0425`.

It says that the problem is that the value A could not be found in scope, shows exactly where in the code the problem is, and offers a hint as to how to resolve it.

Just like the Elm errors there is a dedicated section for giving hints, the exact place in the code where a problem happens is shown, and the message is written in a friendly tone.

error[E0425]: WHAT
 --> WHERE
  |
7 |     let ex = A;
  |              ^ WHY + (arrow gives implicit WHERE)
  |
help: HINT (can have many)
  |
1 | HINT
  |

Compare and contrast that to the error you get with similarly malformed Java code.

import java.util.List;

enum Ex {
    A,
    B
}

public class MyClass {
    public static void main(String args[]) {
        var ex = B;
    }
}

/MyClass.java:10: error: cannot find symbol
        var ex = B;
                 ^
  symbol:   variable B
  location: class MyClass

It still shows where the problem happened, but it offers no assistance for fixing it and uses a fittingly robotic tone.16

Scala

Scala is another language for the JVM, like Java. I won't go that deep into an explanation because I am not qualified to do so, but as part of the work on Scala 3 they worked to improve their error messages in similar ways to Elm and Rust.

We’ve looked at how other modern languages like Elm and Rust handle compiler warnings and error messages, and come to realize that Dotty is actually in great shape to provide comprehensive and easy to understand error messages in the same spirit.

That work is still ongoing, but the focus was there.

That doesn't mean anything by itself, but I choose to take it as social proof that I'm not crazy.

IDEs

It is tempting to say that it doesn't really matter what errors the compiler spits out because IDEs are in a better position to give feedback anyway.

To an extent, this makes sense. IDEs like IntelliJ are able to provide feedback in ways that a compiler cannot.

  • If something is wrong, they can highlight it in red.
  • If something is questionable, they can highlight it in yellow.
  • If the IDE has a suggestion on how to fix something, it can show that through other visual cues.17

That's all great, but unfortunately I don't think it is enough.

It's easy to forget when my M1 Mac is running Baldur's Gate 3 at 60 fps, but hardware powerful enough to run an IDE smoothly is a privilege. IntelliJ cannot run on a Chromebook or whatever commodity hardware a chronically underfunded school system can afford.18

This is partly why many curriculums use online platforms like repl.it that are hosted remotely, or have a student's workflow run through the command line and a basic text editor.

In those cases especially, compiler errors are front and center in a student's learning.

This is likely going to become more true when the JEPs for Unnamed Classes, Instance Main Methods, and Multi-File Source-Code Programs are integrated and the ergonomics of teaching from the command line become more in line with that of other languages.

And while I can't demonstrate it in as strong a way, I maintain that the error messages matter when you are using an IDE as well. Not everyone sees the red squiggles, some students actively ignore them, and they will see the original compiler message when they try to run their code regardless.

If what they see is vague and unhelpful, that matters.19

Research

There was, to my knowledge, exactly one overview study done on compiler error messages. "Compiler Error Messages Considered Unhelpful: The Landscape of Text-Based Programming Error Message Research".

For those unfamiliar, an overview study is research that looks at existing research within a field and draws conclusions from the body of work in totality. You don't get to make claims like "studies consistently show X, Y, and Z" without looking at all the studies.

There are a few things from that study I think are worthy of note.

First is that there is very little actual research on error messages.

One of our most striking observations was that there was relatively little literature on the effect of programming error messages on students and their learning.

Which at the very least makes me feel better about "going with my gut." Everyone has to.

Second is that the research that does exist does not produce any strong conclusions.

While there have been many guidelines for programming error message design proposed and implemented, no coherent picture has emerged to answer even simple questions such as: What does a good programming error message look like?

But in the summation of the literature there are a few general guidelines that emerged.

  • Increase Readability
  • Reduce Cognitive Load
  • Provide Context
  • Use a Positive Tone
  • Show Examples
  • Show Solutions or Hints
  • Allow Dynamic Interaction
  • Provide Scaffolding
  • Use logical argumentation
  • Report errors at the right time

So while I would love to say "I'm right and science agrees with me"21, the best I can say is that all the properties of Elm and Rust compiler messages that I have noted are at least represented in the list of things that research suggests "might help."

That is

  • Provide Context
  • Use a Positive Tone
  • Show Solutions or Hints

There are individual studies like this one that more directly support my claims, but I've heard of enough horrible Dr. Oz segments like "Chocolate - the new superfood?!" to know that it's disingenuous to use single studies like that.

So yeah, best I can say is "I am not obviously wrong."

The Structure of javac

The reference compiler for Java is javac. It comes with every OpenJDK build, and it's what most people use to compile their Java.25

I will briefly explain some of its internal workings so that you have some context on what we changed and why.

compiler.properties

All the error message text for javac lives inside a set of compiler.properties files. There is one file for each language that has translations. German text is in compiler_de.properties, Japanese in compiler_ja.properties, and so on.

compiler.err.abstract.meth.cant.have.body=\
    abstract methods cannot have a body

Each message is keyed in a way that indicates its purpose. compiler.err.* are for error messages, compiler.warn.* for warnings, and compiler.misc.* for potpourri.

There are comments above messages with placeholders to indicate the type of data that needs to be filled in.

# 0: name
compiler.err.call.must.be.first.stmt.in.ctor=\
    call to {0} must be first statement in constructor

# 0: symbol kind, 1: name, 2: symbol kind, 3: type, 4: message segment
compiler.err.cant.apply.symbol.noargs=\
    {0} {1} in {2} {3} cannot be applied to given types;\n\
    reason: {4}

These files get processed into Java classes by this tooling.

The classes that get generated are subclasses of JCDiagnostic.DiagnosticInfo, where compiler.err.* properties are turned into instances of JCDiagnostic.Error, compiler.warn.* into JCDiagnostic.Warning, and so on.

The class CompilerProperties holds constants for each of these messages as well as static methods for the messages that had those special comments indicating that they need placeholders.

/**
 * compiler.err.anonymous.diamond.method.does.not.override.superclass=\
 *    method does not override or implement a method from a supertype\n\
 *    {0}
 */
public static Error AnonymousDiamondMethodDoesNotOverrideSuperclass(
        Fragment arg0
) {
    return new Error(
        "compiler", 
        "anonymous.diamond.method.does.not.override.superclass", 
        arg0
    );
}

/**
 * compiler.err.array.and.receiver =\
 *    legacy array notation not allowed on receiver parameter
 */
public static final Error ArrayAndReceiver 
        = new Error("compiler", "array.and.receiver");

// And so on

JCDiagnostic

I've been mostly talking about "error messages," but they are ontologically just one kind of "diagnostic".

JCDiagnostic - short for java compiler diagnostic I'm pretty sure - is the representation javac has for diagnostic messages.

It stores references to all the information needed to construct a message shown to the user. This includes the source which the diagnostic references, the position in that source being referenced, as well as other miscellaneous metadata.

A pointer to some text from the compiler.properties as well as the arguments needed for any placeholders in said text are stored in a sub-object under diagnosticInfo. These DiagnosticInfo objects come from what was generated in CompilerProperties.

The whole structure implements the Diagnostic interface, which is part of Java's public API.26

Context

One of the more interesting concepts in javac is its Context mechanism.

In "regular" code that wants a singleton, you generally hide the constructor for your class and expose a single instance in some way.

final class Apple {
    private Apple() {}
    
    public static final Apple INSTANCE = new Apple();
}

The other option is to make your class "normal" but rely on some dependency injection framework to automatically create, manage, and provide singular instances of that class.

final class Apple {
    // ...
}

class UsageSite {
    private Apple apple;
    
    @Inject
    UsageSite(Apple apple) {
        this.apple = apple;
    } 
}

javac wants only single instances of many of its classes, but it also wants to allow for multiple instances of the compiler to run in parallel on the same JVM.

The solution they use is to have one class, Context, which holds a map of Context.Key to Objects.

Classes like JCDiagnostic.Factory have factory methods that do a get-or-create with their own constant Context.Keys.

public static class Factory {
    /** The context key for the diagnostic factory. */
    protected static final Context.Key<JCDiagnostic.Factory> diagnosticFactoryKey 
            = new Context.Key<>();

    /** Get the Factory instance for this context. */
    public static Factory instance(Context context) {
        Factory instance = context.get(diagnosticFactoryKey);
        if (instance == null)
            instance = new Factory(context);
        return instance;
    }
    // ...
}

And then this Context is threaded to every class in the compiler that wants to get instances of those "contextual singletons" or themselves participate in the mechanism.

protected Flow(Context context) {
    context.put(flowKey, this);
    names = Names.instance(context);
    log = Log.instance(context);
    syms = Symtab.instance(context);
    types = Types.instance(context);
    chk = Check.instance(context);
    lint = Lint.instance(context);
    rs = Resolve.instance(context);
    diags = JCDiagnostic.Factory.instance(context);
    Source source = Source.instance(context);
}

Log

During compilation, if a problem is encountered anywhere, the compiler constructs and "emits" a diagnostic.

It does this by getting an instance of the Log contextual singleton and using the generated constants from CompilerProperties.

public class Operators {
    // ...
    
    protected Operators(Context context) {
        context.put(operatorsKey, this);
        syms = Symtab.instance(context);
        names = Names.instance(context);
        log = Log.instance(context);
        types = Types.instance(context);
        noOpSymbol = new OperatorSymbol(
                names.empty, Type.noType, -1, syms.noSymbol
        );
        initOperatorNames();
        initUnaryOperators();
        initBinaryOperators();
    }
    
    // ...


    private OperatorSymbol reportErrorIfNeeded(
            DiagnosticPosition pos, 
            Tag tag, 
            Type... args
    ) {
        if (Stream.of(args).noneMatch(t -> 
                t.isErroneous() || t.hasTag(TypeTag.NONE)
        )) {
            Name opName = operatorName(tag);
            JCDiagnostic.Error opError = (args.length) == 1 ?
                    Errors.OperatorCantBeApplied(
                            opName, args[0]
                    ) :
                    Errors.OperatorCantBeApplied1(
                            opName, args[0], args[1]
                    );
            log.error(pos, opError);
        }
        return noOpSymbol;
    }
    
    // ...
}

Log internally holds onto the JCDiagnostic.Factory contextual singleton in order to construct JCDiagnostics from DiagnosticInfos like JCDiagnostic.Error.27

public class Log extends AbstractLog {
    // ...
    
    private Log(
            Context context, 
            Map<WriterKind, PrintWriter> writers
    ) {
        super(JCDiagnostic.Factory.instance(context));
        context.put(logKey, this);
        this.writers = writers;

        @SuppressWarnings("unchecked") // FIXME
        DiagnosticListener<? super JavaFileObject> dl =
                context.get(DiagnosticListener.class);
        this.diagListener = dl;

        diagnosticHandler = new DefaultDiagnosticHandler();

        messages = JavacMessages.instance(context);
        messages.add(Main.javacBundleName);

        final Options options = Options.instance(context);
        initOptions(options);
        options.addListener(() -> initOptions(options));
    }
    
    // ...
}

All the diagnostics are then reported to a DiagnosticHandler.

public class Log extends AbstractLog {
    // ...

    @Override
    public void report(JCDiagnostic diagnostic) {
        diagnosticHandler.report(diagnostic);
    }

    // ...
}

DiagnosticFormatter

There are some steps I am skipping, but eventually diagnostics flow from DiagnosticHandlers to a DiagnosticFormatter which is responsible for formatting the diagnostic for display.

There are a few implementations of DiagnosticFormatter, but the most relevant is BasicDiagnosticFormatter.

BasicDiagnosticFormatter has three formats that it recognizes. Diagnostics with a position, diagnostics without a position, and diagnostics originating in a class file for which the source is not available. It uses custom format strings that describe how it should display each of those diagnostics.

private void initFormat() {
    initFormats("%f:%l:%_%p%L%m", "%p%L%m", "%f:%_%p%L%m");
}

For backwards compatibility reasons, javac also maintains an "old style" diagnostics format and a "normal" format. The old style diagnostics format can be enabled with compiler flags.

public static class BasicConfiguration 
        extends SimpleConfiguration {
    public BasicConfiguration(Options options) {
        // ...
        
        initIndentation();
        if (options.isSet("diags.legacy"))
            initOldFormat();
        String fmt = options.get("diags.layout");
        if (fmt != null) {
            if (fmt.equals("OLD"))
                initOldFormat();
            else
                initFormats(fmt);
        }
        
        // ...
    }
}

The format strings consist of "meta characters" that represent different components of the diagnostic.
The meta characters and other components are formatted independently and then concatenated.

For normal format diagnostics with a position, each component of the string has the following meaning:

Component   Meaning
%f          Source file name
:           A literal ":" character (U+003A)
%l          Line number for the diagnostic
:           A literal ":" character (U+003A)
%_          A space character (U+0020)
%p          The prefix for the diagnostic type: one of "Note: ", "warning: ", or "error: "
%L          The lint category for this diagnostic, if it is a lint
%m          The localized message for the diagnostic

After these components, the source code at the position of the diagnostic is inserted.

The source code is inserted at the end of the diagnostic if the message is a single line, or after the first line of the message if it is multiline.28

Structural Problems

If we want errors closer to what Rust has, the most important structural deficiency to tackle is javac's message oriented-ness.

By that I am referring to the fact that every kind of diagnostic is "just a message." There is no clearly delineated place to put hints or other context.29

This is something that has already had to be worked around. Take the following program.

public class Main {
    public static void main(String[] args) {
        Object o = 123;
        switch (o) {
            case Integer i -> System.out.println(i);
            default -> {}
        };
    }
}

This program uses pattern switches, which are a preview feature. This is the error you would get if you tried to compile it.

Main.java:5: error: patterns in switch statements are a preview feature and are disabled by default.
          case Integer i -> System.out.println(i);
               ^
  (use --enable-preview to enable patterns in switch statements)
1 error

While you might look at this and think that there is already a dedicated place for hints, you would be wrong.

This message comes from this entry in the compiler.properties file.

# 0: message segment (feature)
compiler.err.preview.feature.disabled.plural=\
   {0} are a preview feature and are disabled by default.\n\
   (use --enable-preview to enable {0})

You will notice that the only thing separating the hint to use the --enable-preview flag from the initial message is a newline.

BasicDiagnosticFormatter just has a heuristic where it assumes that any newline in a message means that the lines following it should be displayed below the code that is the source of the issue.

public class BasicDiagnosticFormatter 
        extends AbstractDiagnosticFormatter {
    // ...
    public String formatMessage(JCDiagnostic d, Locale l) {
        // ...

        if (lines.length > 1
                && getConfiguration()
                .getVisible()
                .contains(DiagnosticPart.DETAILS)) {
            currentIndentation += getConfiguration()
                    .getIndentation(DiagnosticPart.DETAILS);
            for (int i = 1; i < lines.length; i++) {
                buf.append(
                        "\n" + indent(lines[i], currentIndentation)
                );
            }
        }
        
        // ...
        return buf.toString();
    }
    
    // ...
}

This causes a problem for diagnostics that have a newline that is not accounted for in that heuristic, like the following.30

# TODO 308: make a better error message
compiler.err.this.as.identifier=\
    as of release 8, ''this'' is allowed as the parameter name for the receiver type only\n\
    which has to be the first parameter, and cannot be a lambda parameter

If you construct a program that triggers this error like so:

public class Math {
    static int add(int a, int this) {
        return a + this;
    }
}

You will get the following.

Math.java:2: error: as of release 8, 'this' is allowed as the parameter name for the receiver type only
    static int add(int a, int this) {
                              ^
  which has to be the first parameter, and cannot be a lambda parameter
1 error

Which feels at the very least unintentional.

There are other places where the pressure to say more than one thing in a message leads to larger irreconcilable inconsistencies.31

Every year or so there is someone who complains about a specific error on the mailing list. This usually leads to a concrete improvement in whatever error they complain about, but the root problem here is structural.32

Structural Solutions

Structural problems always require structural solutions, so that's what we aimed to do.

The way Rust deals with this is with structured diagnostics.33 We translated that approach into the existing JCDiagnostic world by introducing two new concepts: Help and Info.

In our prototype all JCDiagnostics now carry an optional Help and an optional Info.

public class JCDiagnostic 
        implements Diagnostic<JavaFileObject> {
    // ...

    private final DiagnosticSource source;
    private final DiagnosticPosition position;
    private final DiagnosticInfo diagnosticInfo;
    private final Set<DiagnosticFlag> flags;
    private final LintCategory lintCategory;
  
    private final Info info;
    private final Help help;
    
    // ...
}

Help

"Help"s carry two pieces of information.

public record Help(
        HelpFragment message,
        List<SuggestedChange> suggestedChanges
) {
    // ...
}

First is a message. This is a fragment of text just like other DiagnosticInfos and it is where the actual text of "use --enable-preview to enable patterns in switch statements" would come from.

public static final class HelpFragment extends DiagnosticInfo {
    public HelpFragment(String prefix, String code, Object... args) {
        super(HELP, prefix, code, args);
    }
}

The second is a list of zero or more suggested changes.

public record SuggestedChange(
        DiagnosticSource source,
        RangeDiagnosticPosition position,
        String replacement,
        Applicability applicability
) {
    // ...
}

These SuggestedChanges know what part of the source they refer to, what replacements to make in order to apply the suggestion, and to what degree the suggestion is mechanically applicable.

public enum Applicability {
    MACHINE_APPLICABLE,
    HAS_PLACEHOLDERS,
    UNKNOWN
}
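
To make this concrete, here is a rough sketch of how the pieces could compose at the site where a diagnostic is built. This is my illustration rather than code from the prototype; diag, source, position, kind, and symbol are assumed to already be in scope, and the message key mirrors the similar.symbol key shown later in this post.

diag = diag.withHelp(new Help(
        new HelpFragment(
                "compiler", "similar.symbol", kind, symbol
        ),
        List.of(new SuggestedChange(
                source,           // file the suggestion applies to
                position,         // range of the misspelled name
                "TestingMethod",  // replacement text to splice in
                Applicability.MACHINE_APPLICABLE
        ))
));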

Help messages would include:

  • Suggesting importing a class or enum
  • Suggesting fixing a misspelled identifier to a similarly named identifier that actually exists
  • Explaining that an identifier doesn't exist, but it does exist in a class that is accessible
  • Suggesting changing arguments to a function to satisfy the signature

Info

Infos are similar in spirit to Helps, but they only provide helpful context. They do not suggest a user change their code in any concrete way.

public record Info(
        InfoFragment message,
        List<InfoPosition> positions
) {
}

An Info holds a text fragment and a list of all the places in the code that are relevant to that message.

Info messages would include:

  • Displaying related function signatures to show the programmer the expected signature
  • Displaying the declaration of a class, field, or local variable to show what type was expected
  • Providing more detailed information as to why something is not allowed, such as assert becoming a keyword in Java 1.4
  • Displaying supplementary information related to valid values

compiler.properties

In order to get the text for help and info messages in a localization friendly way, we piggybacked on the existing conventions in the compiler.properties files.

We updated the tooling so that properties keyed by compiler.help.* and compiler.info.* are translated into fragments inside CompilerProperties the same as was done for compiler.error.* and company.

compiler.info.function.declared.here=\
    function declared here

# ...

# 0: kind name, 1: symbol
compiler.help.similar.symbol=\
    a similarly named {0} exists: {1}

public class CompilerProperties {
  public static class Infos {
    // ...

    /**
     * compiler.info.function.declared.here=\
     *    function declared here
     */
    public static final InfoFragment FunctionDeclaredHere
            = new InfoFragment("compiler", "function.declared.here");

    // ...
  }
  public static class Helps {
    // ...

    /**
     * compiler.help.similar.symbol=\
     *    a similarly named {0} exists: {1}
     */
    public static HelpFragment SimilarSymbol(
            KindName arg0, Symbol arg1
    ) {
      return new HelpFragment(
              "compiler", "similar.symbol", arg0, arg1
      );
    }

    // ...
  }
}

At the sites where diagnostics are emitted, these Helps and Infos can then be attached to a JCDiagnostic by referencing the generated classes and using the withHelp or withInfo methods.

class SymbolNotFoundError extends ResolveError {
    // ...
    @Override
    JCDiagnostic getDiagnostic(
            JCDiagnostic.DiagnosticType dkind,
            DiagnosticPosition pos,
            Env<AttrContext> env,
            Type site,
            Name name,
            List<Type> argtypes,
            List<Type> typeargtypes
    ) {
      // ...
      if (hasLocation) {
        // ...
        if (suggestMember != null) {
            diag = diag.withHelp(
                    new Help(
                            Helps.SimilarSymbol(
                                    Kinds.kindName(suggestMember),
                                    suggestMember
                            )
                    )
            );
        }

        return diag;
      }
            
      // ...
    }
}

The logic in the code above produces errors like the following.

Test.java:5: error: cannot find symbol
        TetsingMetho();
        ^
  symbol:   method TetsingMetho()
  location: class Test
help: a similarly named method exists: TestingMethod()
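
Infos attach in the same way. As a sketch of my own, not prototype code: pointing the user at a relevant declaration could look like the following, where declPosition is an assumed, already-computed InfoPosition for the declaration site.

diag = diag.withInfo(new Info(
        Infos.FunctionDeclaredHere,  // "function declared here"
        List.of(declPosition)        // where that declaration lives
));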

What is important here isn't the analysis being performed or how it is displayed exactly. Those are all things that can be disputed by reasonable people. It's that there is now a place to give that sort of advice.

With just that little bit of structure, it suddenly becomes tractable to build a feature like "suggest methods with similar names."

DiagnosticFormatter

In order to retrofit BasicDiagnosticFormatter to display help and info messages, we needed to add some new metacharacters to its format strings.

public class BasicDiagnosticFormatter 
        extends AbstractDiagnosticFormatter {
    // ...
    public static class BasicConfiguration 
            extends SimpleConfiguration {
        // ...
        private void initFormatWithInfoAndHelp() {
            initFormats(
                    "%f:%l:%_%p%L%m%i%h",
                    "%p%L%m%i%h",
                    "%f:%_%p%L%m%i%h"
            );
        }
        // ...
    }
    // ...
}
Component    Meaning
%h           Help Message
%i           Info Message

This has the very convenient property that, at least for this design, we can put the new reporting format behind a flag.

public class BasicDiagnosticFormatter 
        extends AbstractDiagnosticFormatter {
    // ...
    public static class BasicConfiguration 
            extends SimpleConfiguration {
        // ...

        public BasicConfiguration(Options options) {
            // ...
          
            if (options.isSet(/* ... */)) {
                initFormatWithInfoAndHelp();
            } else {
                initFormat();
            }
            
            // ...
        }
        
        private void initFormat() {
            initFormats(
                    "%f:%l:%_%p%L%m",
                    "%p%L%m",
                    "%f:%_%p%L%m"
            );
        }
        
        private void initFormatWithInfoAndHelp() {
            initFormats(
                    "%f:%l:%_%p%L%m%i%h",
                    "%p%L%m%i%h",
                    "%f:%_%p%L%m%i%h"
            );
        }
        // ...
    }
    // ...
}

Remaining Work

There are many things left unfinished.

  • Some of our modifications show suboptimal positions in particularly complex cases. Fixing this involves passing more context down through the compiler and keeping more positional information.
  • The wording of our additional info can likely be improved to flow better with the writing style of the existing errors.
  • The messages that make use of the newline heuristic, especially the --enable-preview ones, can be moved to having an attached Help. Doing so would mean that either the keys for those messages would need to be duplicated or the original messages would be degraded if the flag for this were turned off.
  • Almost every error and warning could use a tone audit.
  • In general, the compiler tends to drop information between phases. We didn't find a system other than going case by case and threading through the information we need as we need it.
  • compiler.err.cant.apply.symbol is several different errors in a trench-coat.
  • Ich spreche kein Deutsch.
  • 私は日本語を話しません
  • 我不会说中文
  • And much, much more

But I am satisfied with the progress we made. You can find a bestiary of the specific errors that were tackled here.

Call to Action

Submitting a JEP, while technically an open process, in practice seems to be helped by having more free time than is reasonable34, being paid to work on OpenJDK, or having social capital.

If you want this work to continue, you should voice your support on compiler-dev@openjdk.org or in whatever forum you think would have the most impact. If you do want to use the mailing lists, take note that you need to sign up before your emails will go through.

If you are interested in continuing this work yourself, the current state of this is on the hints branch here. There is quite a bit left to do and a lot of arcane knowledge we picked up along the way, so please reach out if you choose this path. I can at least help you get set up in IntelliJ to build the JDK.35

If you are in a position of authority at one of the large companies that have dedicated staff working on OpenJDK, hire some or all of us. Failing that, dedicate other person-power to the issue.3637

JEP

JEP Draft

Summary

Enhance the Java compiler with errors that are easier to read and understand.

Goals

The primary goal is to make the reference Java compiler competitive with the compilers for other languages in terms of the helpfulness of its error messages.

This is proposed to be accomplished by

  • Enhancing the compiler so that it can reliably display hints to the user for how to resolve issues.
  • Enhancing the compiler so that it can provide information to the user that indicates why an error occurred.
  • Auditing the tone of the existing set of messages.

Non-Goals

It is not a priority to alter the set of warnings and lints that the compiler reports on, though that could feasibly come as a future JEP.

It is also not a priority to give this same treatment to other tools, such as jar, javadoc, or jlink, though that could feasibly come as a future JEP.

It is not a goal to provide an equivalent to the --explain flag present in other compilers or to assign error conditions unique numeric codes, though that could feasibly come as a future JEP.

It is not a goal to provide a structured output for consumption by IDEs or tools, though that could feasibly come as a future JEP.

It is not an explicit goal to enhance the API provided by java.compiler to allow annotation processors and other user code to introspect on any new functionality, though that might fall out of the design process.

It is a non-goal to completely refactor the entire javac diagnostic process or to modify every single error message. Some are fine as they are.

It is not a goal to provide any specific kind of analysis, though it is assumed that some new analyses should be performed.

Description

(The following represents a preliminary design and is subject to change)

We propose modifying the structure of javac's diagnostic system by adding Help and Info structures to JCDiagnostic.

Helps would provide users with suggestions on how to change their code. The localized text for Helps would be provided by properties keyed under compiler.help.*. A help message should contain actionable information or, ideally, a functional code suggestion.

Each code suggestion in a help contains a range in the source code that it applies to, a string to replace the source code with, and an enum representing whether the suggestion can be applied automatically, needs some manual work, or can’t be automatically applied.

Infos would provide useful context to users on why they are receiving a given warning or error. The localized text for Infos would be provided by properties keyed under compiler.info.*.

The code to format diagnostics for display will also be updated to support embedding these two pieces of information and existing messages will be updated to make appropriate use of the new structure.

Alternatives

  • Do nothing. There is significant opportunity cost in any restructuring, and it might be the case that a restructuring of error messages does not provide enough practical benefit to be worth the effort.

  • Wait. It's certainly possible that the ideas currently available represent a local maximum of compiler design and that moving toward them would be a misstep.

  • Defer to the community. It might be possible to accomplish most of these goals with an alternative compiler implementation. javac's primary purpose is to be a reference compiler for the JLS. This would limit its exposure though, especially to beginners.

  • Leave it to IDEs. Visual feedback is, in some ways, preferable to textual feedback. Not all categories of users, particularly the users for whom feedback matters the most (students, beginners, etc.), are able or willing to use IDEs.

Testing

The compiler should still give errors in the same situations after these changes as before and still emit identical bytecode. The existing set of jtreg tests should be enough to validate that, though they will need to be updated to test for items related to the new structure.

Testing for whether errors are actually useful is a social problem.

Risks and Assumptions

Localization will likely pose a challenge. The current corpus of error messages is significant and would need to be updated.

There are also undoubtedly tools in the ecosystem that function off of parsing the exact structure emitted by javac. Ultimately, either those tools will break, the new functionality will have to be hidden behind a compiler flag, or the old format will have to be explicitly enabled.

It is possible that keeping track of the information needed to provide a good hint to the user could increase the memory footprint of the compiler during successful runs.

The speed of the compiler could also be affected. Ideally this could be minimized, but it is hard to know in what way until an MVP is in place.

Dependencies

What hints should be given in which scenarios should conceptually be driven by data on what error conditions are actually hit. This can be further stratified by which errors are hit by different groups such as "total beginners", "working developers", etc. If there is not already a corpus of this sort of data then it would be prudent to try to organize a way to gather it.

There are active projects like Valhalla and Amber that will likely result in significant updates to the compiler. It might be necessary to wait for those changes to "blow over" before there is enough stability to make structural changes. These projects also alter the semantics of the language, so they could feasibly affect what an ideal error message would be.

1: And well over a year of mental illness from me.

2: This definition isn't exactly accurate. JITs are compilers too but they do their work in memory. There is no artifact to speak of, but I think it's close enough to what most people think of when they think of a compiler. The Wikipedia entry for compilers has a more accurate definition.

3: Languages like C, C++, Rust, Zig, etc. also care quite a lot about optimizations. It might be a little disingenuous of me to gloss over that, but this post is about a language backed by a VM. The real optimizations happen at runtime and dwelling too long on that felt a bit much.

4: The Elm language is closest in spirit to Haskell. All code needs to be purely functional, the type system is strong, and the syntax is similar as well. What sets it apart is that it doesn't have classic Haskell features like type-classes, do-notation, or `IO` monads. A good case study in addition by subtraction.

5: I'm attributing causation here, but I don't actually have strong proof that the better error messages work benefited from Elm being small and focused. At least in my brain it tracks though. Evan Czaplicki is but one man. Doing what he did but with C++ would have been infeasible.

6: If twitter dies, just know that these links were to people saying nice things about Elm.

7: This isn't actually that good of an argument in and of itself. I know that. I just find it rhetorically compelling. I love Elden Ring, but I don't want to be playing it every day at my job and I wouldn't want to force someone to deal with all the flavors of "lol, get rekt" it entails.

8: "Basic analysis" is doing a lot of legwork in that sentence. Its basic in large part because of the simplicity of the Elm language. Libraries, modules, and source files work in precisely one way. Doing the same kind of analysis with classes on the class-path, modules on the module-path, and code yet to be compiled might be quite a bit harder for Java.

9: I'm doing it for the example directly below, but it's hard to get the pandoc renderer which makes this site to like inline spans with color information. I'll only do that work when I'm talking about coloring and, because I'm writing these footnotes as I go, I won't rule out just falling back to images.

10: This is nowhere near a complete explanation of the borrow checker, how it functions, or what it accomplishes. For that you should read the Rust book or any of the many good references online. I'm trusting that most readers will have a passing familiarity.

11: This explanation conflates borrow checking and lifetime checking a bit. There are also no "borrows" that happen in the following code, but I've seen "the borrow checker" used interchangeably with what could be called "the lifetime checker" and at the level of zoom this post is at I think the difference isn't that crucial. It doesn't help that I'm not exactly sure where the lines for these terms are regardless.

12: I think there is decent evidence for this, but it is really hard to prove.

13: The initial design for which seems to have been directly inspired by Elm.

14: During the aforementioned "fighting the borrow checker" phase.

15: You could probably make this same argument for any part of the Rust ecosystem like the polish in Cargo, Clippy, etc.

16: Compare how "cannot find symbol" sounds versus "cannot find value `A` in this scope". The difference is subtle, but it matters.

17: IntelliJ does this with a "light bulb" that appears when you hover over a bit of malformed code that contains a contextual menu with potential fixes.

18: I do not remember exactly what they are called, but my old High School had a bunch of machines that were basically just empty boxes that connected to a shared Windows server.

19: I hesitate to mention this, but the CS gender ratio is absolutely wack when famously it did not start out that way. There are lots of shitty reasons for this, and I'm not saying that it's Java's fault, but it is worthy of note that Java has been an extremely popular first language in education ever since its release and as a consequence has presided over this depressing chart. I can't prove it20 but I believe that small frictions like `public static void main` and obtuse error messages filter students down to the demographics that are willing to deal with that sort of hostility.

20: I haven't been able to find any meaningful research to back up or refute this claim. That must be because there is none, I'm bad at searching, or I've unconsciously ignored evidence that refutes my point.

21: "I'm right and science hasn't caught up to how right I am."

22: I will later be shown that I was at least somewhat wrong about this.

23: No relation.

24: These are the original PowerPoint and Design Document they produced for their class.

25: Other compilers like ECJ exist, but being the default means javac is always going to be what most people use. That's why the focus was there; it is where changes can do the most good.

26: Something that is merciful about wanting to change the internals of javac is that things like Diagnostic are all I need to worry about. If we're not changing the set of programs accepted by the Java compiler and aren't looking to change supported APIs, then we should be able to avoid sanction by Java's backwards compatibility policies.

27: The FIXME in this code is from 2011.

28: This explanation was taken almost verbatim from the students' paper.

29: Giving the information a dedicated and delineated position.

30: This FIXME is from 2013.

31: I'll take "irreconcilable inconsistencies" for $500.

32: I don't think structural issues like this are because of incompetence. The goal of `javac` has always been to be a reference compiler for Java. Everyone wants "good" errors, but the status quo is a result of the natural trend when a diagnostic ~= a line of output.

33: Structure, structure, structure.

34: If you get enough commits into OpenJDK to become a committer, you can submit a JEP. That requires fixing a lot of JDK issues and going through a nomination process. I have Hogan's Heroes to binge.

35: That took a while for us to figure out even with the tutorials that exist.

36: Even if you aren't convinced that the approach we've taken is the one to continue with, hopefully you recognize that there is a real problem here.

37: I, at least, am not particularly special. I do however have a lot of context on this area of the code by now.


<- Index

Make your own Optionals

by: Ethan McCue

This is java.util.Optional.

I took out all the comments and did a little reformatting, but this is the entire class. Just around 150 lines managing one nullable field.

Take a minute to read or skim it before moving on.

public final class Optional<T> {
    private static final Optional<?> EMPTY =
            new Optional<>(null);
    
    private final T value;
    
    public static<T> Optional<T> empty() {
        @SuppressWarnings("unchecked")
        Optional<T> t = (Optional<T>) EMPTY;
        return t;
    }
    
    private Optional(T value) {
        this.value = value;
    }
    
    public static <T> Optional<T> of(T value) {
        return new Optional<>(Objects.requireNonNull(value));
    }

    @SuppressWarnings("unchecked")
    public static <T> Optional<T> ofNullable(T value) {
        return value == null ? (Optional<T>) EMPTY
                             : new Optional<>(value);
    }
    
    public T get() {
        if (value == null) {
            throw new NoSuchElementException("No value present");
        }
        return value;
    }
    
    public boolean isPresent() {
        return value != null;
    }
    
    public boolean isEmpty() {
        return value == null;
    }
    
    public void ifPresent(Consumer<? super T> action) {
        if (value != null) {
            action.accept(value);
        }
    }
    
    public void ifPresentOrElse(
            Consumer<? super T> action, 
            Runnable emptyAction
    ) {
        if (value != null) {
            action.accept(value);
        } else {
            emptyAction.run();
        }
    }
    
    public Optional<T> filter(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate);
        if (!isPresent()) {
            return this;
        } else {
            return predicate.test(value) ? this : empty();
        }
    }
    
    public <U> Optional<U> map(
        Function<? super T, ? extends U> mapper
    ) {
        Objects.requireNonNull(mapper);
        if (!isPresent()) {
            return empty();
        } else {
            return Optional.ofNullable(mapper.apply(value));
        }
    }
    
    public <U> Optional<U> flatMap(
         Function<? super T, ? extends Optional<? extends U>> mapper
    ) {
        Objects.requireNonNull(mapper);
        if (!isPresent()) {
            return empty();
        } else {
            @SuppressWarnings("unchecked")
            Optional<U> r = (Optional<U>) mapper.apply(value);
            return Objects.requireNonNull(r);
        }
    }
    
    public Optional<T> or(
            Supplier<? extends Optional<? extends T>> supplier
    ) {
        Objects.requireNonNull(supplier);
        if (isPresent()) {
            return this;
        } else {
            @SuppressWarnings("unchecked")
            Optional<T> r = (Optional<T>) supplier.get();
            return Objects.requireNonNull(r);
        }
    }
    
    public Stream<T> stream() {
        if (!isPresent()) {
            return Stream.empty();
        } else {
            return Stream.of(value);
        }
    }
    
    public T orElse(T other) {
        return value != null ? value : other;
    }
    
    public T orElseGet(Supplier<? extends T> supplier) {
        return value != null ? value : supplier.get();
    }
    
    public T orElseThrow() {
        if (value == null) {
            throw new NoSuchElementException("No value present");
        }
        return value;
    }
    
    public <X extends Throwable> T orElseThrow(
            Supplier<? extends X> exceptionSupplier
    ) throws X {
        if (value != null) {
            return value;
        } else {
            throw exceptionSupplier.get();
        }
    }
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }

        return obj instanceof Optional<?> other
                && Objects.equals(value, other.value);
    }
    
    @Override
    public int hashCode() {
        return Objects.hashCode(value);
    }
    
    @Override
    public String toString() {
        return value != null
            ? ("Optional[" + value + "]")
            : "Optional.empty";
    }
}

Why does Optional exist?

java.util.Optional was introduced in Java 8 alongside the Stream API. Its raison d'être is to make coders explicitly consider what to do when using methods like findFirst on a potentially empty Stream.

// Explicitly throws 
int valueOne = list
    .stream()
    .map(x -> x + 1)
    .filter(x -> x % 2 == 0)
    .findFirst()
    .orElseThrow();
    
// Explicitly uses a default value
int valueTwo = list
    .stream()
    .map(x -> x + 1)
    .filter(x -> x % 2 == 0)
    .findFirst()
    .orElse(0);

The deficiency it targets is in the interaction between null and "method chaining style". When there are so many method calls stacked up, it is hard for people to remember to handle cases like null return values.

So with streams poised to encourage method chaining, Optional was needed to make that API not lead to hidden bugs.

What's wrong with Optional?

Nothing really.

The core tension that leads to so much discourse is that there is no way to represent null in Java's type system. Whether from lived experience or religious fervor, folks tend to be afraid of an unaccounted-for null.

Because Optional is in the standard library and explicitly represents "absence or presence", it is extremely tempting to just replace every nullable thing with an Optional<T>.

Doing this can lead to code that sucks, especially if you try to avoid null for local variables.

// Some might try to use isPresent()/get() to avoid null
Optional<String> nameOpt = f();
Optional<Integer> ageOpt = g();

if (nameOpt.isPresent() && ageOpt.isPresent()) {
    var name = nameOpt.get();
    var age = ageOpt.get();

    System.out.println(
        name + " is " + age + " years old"
    );
}

// Others might try to map/flatMap.
Optional<String> nameOpt = f();
Optional<Integer> ageOpt = g();

nameOpt
    .flatMap(name ->
        ageOpt.map(age ->
            name + " is " + age + " years old"
        ))
    .ifPresent(System.out::println);

// But it's questionable what's gained over null.
String name = f().orElse(null);
Integer age = g().orElse(null);

if (name != null && age != null) {
    System.out.println(
        name + " is " + age + " years old"
    );
}

But this is honestly fine.

Yes, the Optional will use up more memory and perform a bit worse than the equivalent code with null. Yes, code written with isPresent/get/orElseThrow or map/flatMap can be a bit crusty. Yes, it wasn't intended to be a field or a method parameter. There are a lot of bike sheds to build and "best practices" to get into internet fights over.

But jspecify is poised to give standard nullability annotations and tooling to augment the type system with them. Project Valhalla is considering giving a way to express null-restricted storage. In the fullness of time, the core tension that leads to this "overuse" seems like it will be resolved.

The problem with both Optional and null is that they only convey that some data might be absent and not what being absent implies.

The Meaning of Absence

Say you were writing a program which had to record people's first and last names for legal reasons. Users can still sign up, but they will need to give that information before continuing on to other parts of the app.

Today you might see Optional being used to represent that.

import java.util.Optional;

record Person(
        int id,
        Optional<String> firstName,
        Optional<String> lastName
) {}

In the near future, maybe a @Nullable annotation.

import org.jspecify.annotations.Nullable;

record Person(
        int id,
        @Nullable String firstName,
        @Nullable String lastName
) {}

In both cases - null and an empty Optional - an absent value implies that you have not been given that information yet.

You can use this to know when to stop a user and ask them for their name.

import java.util.Optional;

record Person(
        int id,
        Optional<String> firstName,
        Optional<String> lastName
) {
    boolean shouldAskForInfo() {
        return firstName.isEmpty() || lastName.isEmpty();
    }
}

import org.jspecify.annotations.Nullable;

record Person(
        int id,
        @Nullable String firstName,
        @Nullable String lastName
) {
    boolean shouldAskForInfo() {
        return firstName == null || lastName == null;
    }
}

Now, consider Madonna. Madonna does not have a last name. If a null or empty value in the lastName field means "not provided", you have no way to directly represent "known to not exist."

// Need to ask Bob for his last name still
var bob = new Person(1, "Bob", null);

// Shouldn't ask Madonna for anything
var madonna = new Person(2, "Madonna", null);

Using an empty string is tempting, but if you do that you will have the same problem that null currently has. By having a "special" value not expressed in the type system, you are liable to forget to check for that special value.

// Empty string can be a sentinel
var madonna = new Person(2, "Madonna", "");

// But if you forget that it is special
// you might give Madonna a subpar user experience
var welcome = "Hello " 
        + person.firstName()
        + " "
        + person.lastName()
        + "!";

// "Hello Madonna !"
// She'll notice. She'll hate you.

The reality of our fictional data model is that we have three distinct cases.

  1. We have not been given a last name.
  2. We have been told there is no last name.
  3. We have been given a last name.

The most convenient tool we have for representing this sort of situation is a sealed interface.

sealed interface LastName {
    record NotGiven() implements LastName {}
    record DoesNotExist() implements LastName {}
    record Given(String value) implements LastName {}
}

Now when a LastName has an absent value, we can know whether that is because it doesn't exist or we just haven't been told.

import org.jspecify.annotations.Nullable;

record Person(
        int id,
        @Nullable String firstName,
        LastName lastName
) {
    boolean shouldAskForInfo() {
        return firstName == null 
                || lastName instanceof LastName.NotGiven;
    }
}

And we can properly represent Madonna.

// Need to ask Bob for his last name still
var bob = new Person(1, "Bob", new LastName.NotGiven());

// Shouldn't ask Madonna for anything
var madonna = new Person(2, "Madonna", new LastName.DoesNotExist());

// Joe is all set
var joe = new Person(3, "Joe", new LastName.Given("Shmoe"));

Optional and null let you represent exactly 2 possibilities; a sealed hierarchy lets you represent 2 or more. The reason I'm using the Madonna example is that it is a straw-man where you want to represent 3 distinct possibilities.

My bold claim is that even when there are only 2 possibilities you should still consider making your own class instead of using Optional or null.

Neither @Nullable String firstName nor Optional<String> firstName directly conveys what it means if the data is missing. It's just "absent." The fact that it means you haven't been told is context external to your domain model.

It's a similar problem to primitive obsession. Because null and Optional are there and fit the "shape" we want, we gravitate to them.

What if, instead, we were to make our own "optional" class?

sealed interface FirstName {
    record NotGiven() implements FirstName {}
    record Given(String value) implements FirstName {}
}

So here FirstName is identical in spirit to an Optional<String>, but with the benefit that we can give a name to the situation where there is no value. It's not "empty" or "present"; we were either given a first name or we weren't.

With pattern matching you will be able to switch over these two situations.

switch (person.firstName()) {
   case FirstName.NotGiven _ ->     
        System.out.println("No first name");
   case FirstName.Given(String name) -> 
        System.out.println("First name is " + name);
}

And part of the reason I put all the code for Optional at the top was to impress upon you how trivial it would be to add any of those helper methods to a class you made yourself.

sealed interface FirstName {
    record NotGiven() implements FirstName {
        @Override
        public String orElse(String defaultValue) {
            return defaultValue;
        }
    }
    record Given(String value) implements FirstName {
        @Override
        public String orElse(String defaultValue) {
            return this.value;
        }
    }
    
    String orElse(String defaultValue);
}

var name = person.firstName().orElse("?");
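
Adding one of the fancier combinators is just as mechanical. Here is a sketch of a map in the same style; this is my addition for illustration, not part of the interface above.

import java.util.function.UnaryOperator;

sealed interface FirstName {
    record NotGiven() implements FirstName {
        @Override
        public FirstName map(UnaryOperator<String> f) {
            // Nothing to transform
            return this;
        }
    }
    record Given(String value) implements FirstName {
        @Override
        public FirstName map(UnaryOperator<String> f) {
            return new Given(f.apply(value));
        }
    }

    FirstName map(UnaryOperator<String> f);
}

var shouted = person.firstName().map(String::toUpperCase);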

That's it. That's the thesis.

If you are spending time modeling your domain objects, consider making your own versions of an Optional class. You can choose names that align more closely with your domain and adapt to more varied situations, and the boilerplate for doing so is at a historic low.

I will admit that if you have a huge number of fields with potentially missing data this can be more trouble than it's worth. I still think it's worth considering.


<- Index

Please try my JSON library

by: Ethan McCue

bowbahdoe/json - GitHub

For the past four months I've been working on a JSON library for Java.

It's not original. Most of the implementation of the parser I stole from Clojure's data.json and the user-facing API is a total ripoff of Elm's JSON library. The only novel engineering I've done has been in translation.

It's also not amazingly fast. Last I benchmarked it, it was around 5x as slow as Jackson, the current king of Java's JSON castle. There are paths to improve that but, whether for lack of time or ability, I haven't explored any of them.

Despite all that, I think you should try it. The rest of this post is going to be an effort to convince you to do so.

First, I am going to go through a basic tutorial to get you up to speed. Then I am going to go through some pitches that I hope convince you.

Tutorial

The Data Model

JSON is a data format. It looks like the following sample.

{
    "name": "kermit",
    "wife": null,
    "girlfriend": "Ms. Piggy",
    "age": 22,
    "children": [
        {
            "species": "frog",
            "gender": "male"
        },
        {
            "species": "pig",
            "gender": "female"
        }
    ],
    "commitmentIssues": true
}

In JSON you represent data using a combination of objects (maps from strings to JSON), arrays (ordered sequences of JSON), strings, numbers, true, false, and null.

Therefore, one "natural" way to think about the data stored in a JSON document is as the union of those possibilities.

JSON is one of
- a map of string to JSON
- a list of JSON
- a string
- a number
- true
- false
- null

The way to represent this in Java is using a sealed interface, which provides an explicit list of types which are allowed to implement it.

public sealed interface Json
        permits 
            JsonObject,
            JsonArray,
            JsonString,
            JsonNumber,
            JsonBoolean,
            JsonNull {
}

This means that if you have a field or variable which has the type Json, you know that it is either a JsonObject, JsonArray, JsonString, JsonNumber, JsonBoolean, or JsonNull.

That is the first thing provided by my library. There is a Json type and subtypes representing those different cases.

import dev.mccue.json.*;

public class Main {
    static Json greeting() {
        return JsonString.of("hello");
    }
    
    public static void main(String[] args) {
        Json json = greeting();
        switch (json) {
            case JsonObject object ->
                    System.out.println("An object");
            case JsonArray array ->
                    System.out.println("An array");
            case JsonString str ->
                    System.out.println("A string");
            case JsonNumber number ->
                    System.out.println("A number");
            case JsonBoolean bool ->
                    System.out.println("A boolean");
            case JsonNull __ ->
                    System.out.println("A json null");
        }
    }
}

You can create instances of these subtypes using factory methods on the types themselves.

import dev.mccue.json.*;

import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        JsonObject kermit = JsonObject.of(Map.of(
                "name", JsonString.of("kermit"),
                "age", JsonNumber.of(22),
                "commitmentIssues", JsonBoolean.of(true),
                "wife", JsonNull.instance(),
                "children", JsonArray.of(List.of(
                        JsonString.of("Tiny Tim")
                ))
        ));

        System.out.println(kermit);
    }
}

Or by using factory methods on Json, which aren't guaranteed to give you any specific subtype but in exchange will handle converting any stray nulls to JsonNull.

import dev.mccue.json.*;

import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Json kermit = Json.of(Map.of(
                "name", Json.of("kermit"),
                "age", Json.of(22),
                "commitmentIssues", Json.of(true),
                "wife", Json.ofNull(),
                "children", Json.of(List.of(
                        JsonString.of("Tiny Tim")
                ))
        ));

        System.out.println(kermit);
    }
}

For JsonObject and JsonArray, there are also builders available, which make it so that you don't need to write Json.of on every value.

import dev.mccue.json.Json;

public class Main {
    public static void main(String[] args) {
        Json kermit = Json.objectBuilder()
                .put("name", "kermit")
                .put("age", 22)
                .putTrue("commitmentIssues")
                .putNull("wife")
                .put("children", Json.arrayBuilder()
                        .add("Tiny Tim"))
                .build();

        System.out.println(kermit);
    }
}

Writing

Once you have some Json you can write it out to a String using Json.writeString.

import dev.mccue.json.Json;

public class Main {
    public static void main(String[] args) {
        Json songJson = Json.objectBuilder()
                .put("title", "Rainbow Connection")
                .put("year", 1979)
                .build();

        String song = Json.writeString(songJson);
        System.out.println(song);
    }
}
{"title":"Rainbow Connection","year":1979}

If the output is meant to be consumed by humans, whitespace can be added using a customized instance of JsonWriteOptions.

import dev.mccue.json.Json;
import dev.mccue.json.JsonWriteOptions;

public class Main {
    public static void main(String[] args) {
        Json songJson = Json.objectBuilder()
                .put("title", "Rainbow Connection")
                .put("year", 1979)
                .build();

        String song = Json.writeString(
                songJson,
                new JsonWriteOptions()
                        .withIndentation(4)
        );
        
        System.out.println(song);
    }
}

{
    "title": "Rainbow Connection",
    "year": 1979
}

If you want to write JSON to something other than a String, you need to obtain a Writer and use Json.write.

import dev.mccue.json.Json;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws IOException {
        Json songJson = Json.objectBuilder()
                .put("title", "Rainbow Connection")
                .put("year", 1979)
                .build();


        try (var fileWriter = Files.newBufferedWriter(
                Path.of("song.json"))
        ) {
            Json.write(songJson, fileWriter);
        }
    }
}

Encoding

To turn a class you have defined into JSON, you just need to make a method which creates an instance of Json from the data stored in your class.

import dev.mccue.json.Json;

record Muppet(String name) {
    Json toJson() {
        return Json.objectBuilder()
                .put("name", name)
                .build();
    }
}

public class Main {
    public static void main(String[] args) {
        var beaker = new Muppet("beaker");
        Json beakerJson = beaker.toJson();

        System.out.println(Json.writeString(beakerJson));
    }
}

This process is "encoding." You "encode" your data into JSON and then "write" that JSON to some output.

For classes that you did not define, the logic for the conversion just needs to live somewhere. Dealer's choice where, but static methods are generally a good call.

import dev.mccue.json.Json;

import java.time.Month;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;

final class TimeEncoders {
    private TimeEncoders() {}

    static Json monthDayToJson(MonthDay monthDay) {
        return Json.of(
                DateTimeFormatter.ofPattern("MM-dd")
                        .format(monthDay)
        );
    }
}

record Muppet(String name, MonthDay birthday) {
    Json toJson() {
        return Json.objectBuilder()
                .put("name", name)
                .put(
                        "birthday", 
                        TimeEncoders.monthDayToJson(birthday)
                )
                .build();
    }
}

public class Main {
    public static void main(String[] args) {
        var elmo = new Muppet(
                "Elmo",
                MonthDay.of(Month.FEBRUARY, 3)
        );
        Json elmoJson = elmo.toJson();

        System.out.println(Json.writeString(elmoJson));
    }
}
{"name":"Elmo","birthday":"02-03"}

If a class you define has a JSON representation that could be considered "canonical", the interface JsonEncodable can be implemented. This will let you pass an instance of the class directly to Json.writeString or Json.write.

import dev.mccue.json.Json;
import dev.mccue.json.JsonEncodable;

record Muppet(String name, boolean great)
        implements JsonEncodable {
    @Override
    public Json toJson() {
        return Json.objectBuilder()
                .put("name", name)
                .put("great", great)
                .build();
    }
}

public class Main {
    public static void main(String[] args) {
        var gonzo = new Muppet("Gonzo", true);
        System.out.println(Json.writeString(gonzo));
    }
}

Reading

The inverse of writing JSON is reading it.

If you have some JSON stored in a String you can read it into Json using Json.readString.

import dev.mccue.json.Json;

public class Main {
    public static void main(String[] args) {
        Json movie = Json.readString("""
                {
                    "title": "Treasure Island",
                    "cast": [
                        {
                            "name": "Kermit",
                            "role": "The Captain",
                            "muppet": true
                        },
                        {
                            "name": "Tim Curry",
                            "role": "Long John Silver",
                            "muppet": false
                        }
                    ]
                
                }
                """);

        System.out.println(movie);
    }
}

If that JSON is coming from another source, you need to obtain a Reader and use Json.read.

import dev.mccue.json.Json;

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws IOException {
        // If you were following along, we created this earlier!
        Json song;
        try (Reader fileReader = Files.newBufferedReader(
                Path.of("song.json"))
        ) {
            song = Json.read(fileReader);
        }

        System.out.println(song);
    }
}

If the JSON you provide is malformed in some way, a JsonReadException will be thrown.

import dev.mccue.json.Json;

public class Main {
    public static void main(String[] args) {
        // Should be in quotes
        Json.readString("fozzie");
    }
}

Exception in thread "main" dev.mccue.json.JsonReadException: JSON error (unexpected character): f
    at dev.mccue.json.JsonReadException.unexpectedCharacter(JsonReadException.java:33)
    at dev.mccue.json.internal.JsonReaderMethods.readStream(JsonReaderMethods.java:525)
    at dev.mccue.json.internal.JsonReaderMethods.read(JsonReaderMethods.java:533)
    at dev.mccue.json.internal.JsonReaderMethods.readFullyConsume(JsonReaderMethods.java:543)
    at dev.mccue.json.Json.readString(Json.java:369)
    at dev.mccue.json.Json.readString(Json.java:364)
    at dev.mccue.example.Main.main(Main.java:9)

Decoding

Up to this point, everything has been more or less the same as it is for other "tree-based" JSON libraries like org.json or json-simple.

This is where that will start to change.

To take some Json and turn it into a user defined class, a basic approach would be to use instanceof checks to see if the Json is a particular subtype and navigate from there.

import dev.mccue.json.*;

record Muppet(String name, boolean canSpeak) {
    static Muppet fromJson(Json json) {
        if (json instanceof JsonObject object &&
            object.get("name") instanceof JsonString name &&
            object.get("canSpeak") instanceof JsonBoolean canSpeak) {
            return new Muppet(name.toString(), canSpeak.value());
        }
        else {
            throw new RuntimeException("Invalid Muppet");
        }
    }
}

public class Main {
    public static void main(String[] args) {
        var json = Json.readString("""
                {
                    "name": "animal",
                    "canSpeak": false
                }
                """);

        var animal = Muppet.fromJson(json);

        System.out.println(animal);
    }
}

This process is "decoding." You "read" your data into JSON and then "decode" it to some type you define.

The problem with the instanceof approach is that you will end up with bad error messages on unexpected data. In this case the error message would just be "Invalid Muppet". The code to get better errors is tedious to write and I haven't seen many folks in the wild do it.
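
For a sense of what that tedium looks like, here is a sketch of the manual version. This is my illustration of the pattern, not code from the library's documentation.

record Muppet(String name, boolean canSpeak) {
    static Muppet fromJson(Json json) {
        if (!(json instanceof JsonObject object)) {
            throw new RuntimeException(
                    "Expected an object, got: " + json
            );
        }
        if (!(object.get("name") instanceof JsonString name)) {
            throw new RuntimeException(
                    "Expected a string at \"name\""
            );
        }
        if (!(object.get("canSpeak") instanceof JsonBoolean canSpeak)) {
            throw new RuntimeException(
                    "Expected a boolean at \"canSpeak\""
            );
        }
        // ...and the same dance again for every other record
        return new Muppet(name.toString(), canSpeak.value());
    }
}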

To get good errors, you should use the static methods defined in JsonDecoder.

package dev.mccue.example;

import dev.mccue.json.*;

record Muppet(String name, boolean canSpeak) {
    static Muppet fromJson(Json json) {
        return new Muppet(
                JsonDecoder.field(
                        json,
                        "name", 
                        JsonDecoder::string
                ),
                JsonDecoder.field(
                        json, 
                        "canSpeak", 
                        JsonDecoder::boolean_
                )
        );
    }
}

public class Main {
    public static void main(String[] args) {
        var json = Json.readString("""
                {
                    "name": "animal",
                    "canSpeak": false
                }
                """);

        var animal = Muppet.fromJson(json);

        System.out.println(animal);
    }
}

These handle the fiddly process of checking whether the JSON matches the structure you expect and throwing an appropriate error.

You should read this declaration as "at the field name I expect a string."

JsonDecoder.field(json, "name", JsonDecoder::string)

If the JSON is not an object, or doesn't have a value for name, or that value is not a string, you will get a JsonDecodeException.

public class Main {
    public static void main(String[] args) {
        var json = Json.readString("""
                {
                    "canSpeak": false
                }
                """);

        var animal = JsonDecoder.field(
                json, 
                "name", 
                JsonDecoder::string
        );

        System.out.println(animal);
    }
}

Which will have a message indicating exactly what went wrong and where.

Problem with the value at json.name:

    {
        "canSpeak": false
    }

no value for field

The last argument to JsonDecoder.field is the JsonDecoder you want to use to interpret the value at that field. In this case it is a method reference to JsonDecoder.string, a method that asserts the JSON is a string and throws if it isn't.

For the methods which take more than one argument, there are overloads which can be used to get an instance of JsonDecoder.

// This will actually decode the json into a list of strings
List<String> items = JsonDecoder.array(json, JsonDecoder::string);

// This will just return a decoder
JsonDecoder<List<String>> decoder =
        JsonDecoder.array(JsonDecoder::string);

This, in conjunction with JsonDecoder.field, is how you are intended to explore nested paths.

public class Main {
    public static void main(String[] args) {
        var json = Json.readString("""
                {
                    "villains": ["constantine", "doc hopper"]
                }
                """);

        List<String> villains = JsonDecoder.field(
                json,
                "villains",
                JsonDecoder.array(JsonDecoder::string)
        );

        System.out.println(villains);
    }
}

To decode JSON into your custom classes, you should add either a constructor or a static factory method which takes in Json and use these decoders to make your objects.

import dev.mccue.json.*;

import java.util.List;

record Actor(String name, String role, boolean muppet) {
    static Actor fromJson(Json json) {
        return new Actor(
                JsonDecoder.field(json, "name", JsonDecoder::string),
                JsonDecoder.field(json, "role", JsonDecoder::string),
                JsonDecoder.optionalField(
                        json, 
                        "muppet",
                        JsonDecoder::boolean_,
                        true
                )
        );
    }
}


record Movie(String title, List<Actor> cast) {
    static Movie fromJson(Json json) {
        return new Movie(
                JsonDecoder.field(json, "title", JsonDecoder::string),
                JsonDecoder.field(
                        json, 
                        "cast", 
                        JsonDecoder.array(Actor::fromJson)
                )
        );
    }
}

public class Main {
    public static void main(String[] args) {
        var json = Json.readString("""
                 {
                     "title": "Treasure Island",
                     "cast": [
                         {
                             "name": "Kermit",
                             "role": "The Captain"
                         },
                         {
                             "name": "Tim Curry",
                             "role": "Long John Silver",
                             "muppet": false
                         }
                     ]
                 }
                 """);

        var movie = Movie.fromJson(json);

        System.out.println(movie);
    }
}

Full Round-Trip

With all of that out of the way, here is how you might define a model, write it to json, and read it back in.

import dev.mccue.json.*;

import java.util.List;

record Actor(String name, String role, boolean muppet)
    implements JsonEncodable {
    static Actor fromJson(Json json) {
        return new Actor(
                JsonDecoder.field(json, "name", JsonDecoder::string),
                JsonDecoder.field(json, "role", JsonDecoder::string),
                JsonDecoder.optionalField(
                        json,
                        "muppet",
                        JsonDecoder::boolean_,
                        true)
        );
    }

    @Override
    public Json toJson() {
        return Json.objectBuilder()
                .put("name", name)
                .put("role", role)
                .put("muppet", muppet)
                .build();
    }
}


record Movie(String title, List<Actor> cast)
    implements JsonEncodable {
    static Movie fromJson(Json json) {
        return new Movie(
                JsonDecoder.field(json, "title", JsonDecoder::string),
                JsonDecoder.field(
                        json, 
                        "cast", 
                        JsonDecoder.array(Actor::fromJson)
                )
        );
    }

    @Override
    public Json toJson() {
        return Json.objectBuilder()
                .put("title", title)
                .put("cast", cast)
                .build();
    }
}

public class Main {
    public static void main(String[] args) {
        var json = Json.readString("""
                 {
                     "title": "Treasure Island",
                     "cast": [
                         {
                             "name": "Kermit",
                             "role": "The Captain",
                             "muppet": true
                         },
                         {
                             "name": "Tim Curry",
                             "role": "Long John Silver",
                             "muppet": false
                         }
                     ]
                 }
                 """);

        var movie = Movie.fromJson(json);

        var roundTrippedJson = Json.readString(
                Json.writeString(movie.toJson())
        );
        var roundTrippedMovie = Movie.fromJson(roundTrippedJson);

        System.out.println(
                json.equals(roundTrippedJson)
        );

        System.out.println(
                movie.equals(roundTrippedMovie)
        );
    }
}

Pitches

My hope is that at this point you have a sense of how it might look to use this library for your projects.

The rest of the post will just be some pitches to try to pull you over to the dark side.

It is not magic

Some people are perfectly fine with jackson-databind, gson, and other frameworks which use a class as a schema to read in JSON.

Others seem not to be. Kvetching about annotations and frameworks that make use of them is a common pastime in the community.

The current options for decoding without databind kinda suck though. To highlight this: I was talking with someone who takes the "magic bad" position. They said that generally they just use gson and manually construct their objects.

I challenged them to interpret this JSON into classes using their usual method.

{
    "title": "Treasure Island",
    "cast": [
        {
            "name": "kermit"
        },
        {
            "name": "gonzo"
        },
        {
            "name": "rizzo"
        }
    ]
}

And the following is the code they came up with.

package example.gson;

import com.google.gson.JsonObject;

public record Muppet(String name) {

    public static Muppet createFrom(JsonObject muppetObject) {
        String name = muppetObject
            .get("name")
            .getAsString();
        return new Muppet(name);
    }

}

package example.gson;

import java.util.ArrayList;
import java.util.List;

import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public record Movie(String title, List<Muppet> cast) {

    public static Movie createFrom(JsonObject object) {
        String muppetTitle = object
            .get("title")
            .getAsString();
        
        List<Muppet> cast = new ArrayList<>();
        JsonArray castArray = object.getAsJsonArray("cast");
        for (int i = 0; i < castArray.size(); i++) {
            JsonObject muppetObject = castArray
                .get(i)
                .getAsJsonObject();
            cast.add(Muppet.createFrom(muppetObject));
        }
        
        return new Movie(muppetTitle, cast);
    }
}

package example.gson;

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class Example {

    public static void main(String[] args) {
        String content = "{\r\n"
                + " \"title\": \"Treasure Island\",\r\n"
                + " \"cast\": [\r\n"
                + "     {\r\n"
                + "         \"name\": \"kermit\"\r\n"
                + "     },\r\n"
                + "     {\r\n"
                + "         \"name\": \"gonzo\"\r\n"
                + "     },\r\n"
                + "     {\r\n"
                + "         \"name\": \"rizzo\"\r\n"
                + "     }\r\n"
                + " ]\r\n"
                + "}";
        
        JsonObject json = JsonParser
            .parseString(content)
            .getAsJsonObject();
        
        Movie movie = Movie.createFrom(json);
        
        System.out.println(movie);
    }
}

I think this code is pretty representative of the variety one would produce when working against this sort of API.

The follow-up challenge I gave them was to run this code against some malformed input.

{
  "title": "Treasure Island",
  "cast": [
    {
    },
    {
      "name": "gonzo"
    },
    {
      "name": "rizzo"
    }
  ]
}

The error message that their code produced was the following.

Cannot invoke "com.google.gson.JsonElement.getAsString()" because the return value of "com.google.gson.JsonObject.get(String)" is null

Which, while better than it would have been in years past (thanks JEP 358), still isn't amazing.

Compare that to the error message you get from the most natural way to express this with my library.

record Muppet(String name) {
    static Muppet fromJson(Json json) {
        return new Muppet(
                JsonDecoder.field(json, "name", JsonDecoder::string)
        );
    }
}

record Movie(String title, List<Muppet> cast) {
    static Movie fromJson(Json json) {
        return new Movie(
                JsonDecoder.field(
                        json, 
                        "title", 
                        JsonDecoder::string
                ),
                JsonDecoder.field(
                        json, 
                        "cast", 
                        JsonDecoder.array(Muppet::fromJson)
                )
        );
    }
}

Problem with the value at json.cast[0].name

    {}
    
no value for field

The code they produced is also pretty heavily "imperative." To make their list of Muppets they have a plain for loop and transform every element individually.

List<Muppet> cast = new ArrayList<>();
JsonArray castArray = object.getAsJsonArray("cast");
for (int i = 0; i < castArray.size(); i++) {
    JsonObject muppetObject = castArray
        .get(i)
        .getAsJsonObject();
    cast.add(Muppet.createFrom(muppetObject));
}

This is not intrinsically bad by any means - for loops are not evil - but all code lives somewhere on a spectrum from "declarative" to "imperative": describing what should be done versus how it should be done.

If you compare their code to what one would write when relying on gson's reflection mechanisms the difference is stark.

record Muppet(String name) {}

record Movie(String title, List<Muppet> cast) {}

Yes, you need to know the rules for how JSON is automatically mapped to these structures and what different annotations mean if they are present. But this is unquestionably a "declarative schema." If you know the rules it is easier to read.

The code you would write with my library occupies a middle ground.

record Muppet(String name) {
    static Muppet fromJson(Json json) {
        return new Muppet(
                JsonDecoder.field(json, "name", JsonDecoder::string)
        );
    }
}

record Movie(String title, List<Muppet> cast) {
    static Movie fromJson(Json json) {
        return new Movie(
                JsonDecoder.field(
                        json, 
                        "title", 
                        JsonDecoder::string
                ),
                JsonDecoder.field(
                        json, 
                        "cast", 
                        JsonDecoder.array(Muppet::fromJson)
                )
        );
    }
}

While there is more of it than when you rely on heuristics, it is still "reasonably declarative."

return new Movie(
        JsonDecoder.field(
                json, 
                "title", 
                JsonDecoder::string
        ),
        JsonDecoder.field(
                json, 
                "cast", 
                JsonDecoder.array(Muppet::fromJson)
        )
);

The logic for it is both extensible (there is nothing privileged about JsonDecoder; you can write your own helper methods) and defined in plain code you can jump to with go-to-definition.

If a field can be null you would see nullableField. If a field can be missing, you would see optionalField. If it could be both, you would see optionalNullableField.

This is simple to teach

I don't know about you, but I am absolutely sick of explaining Jackson to students who are still struggling with classes in general.

If a student has JSON like this

[ {"name": "kailee"}, {"name": "fran"} ]

Then to read it in as a List<Person> they need to either

  • target a Person[]
  • target a class which extends ArrayList<Person>
  • provide a TypeToken<List<Person>>

Plus maybe some other options I might be unaware of.

To actually understand what they are doing for just that, they need to have a sense for some combination of

  • Inheritance
  • Generic Erasure
  • Reflection

And that is really hard to impart at their stage, so often we online helpers just say "ah, make a class that looks like this" and send them on their way.

On the flip-side, when beginners get frustrated with a databind approach and fall back to something like org.json they seem to produce some absolute monstrosities before they come back with another question.

It's not their fault, they learned loops at most a semester ago, but it does present some practical problems.

The tension is between an option whose mechanics there is enough time to teach and an approach ergonomic enough for them to complete their assignments.

I've been testing early drafts of this library with real students and, while there are too many confounding variables to say I've done any real science, I've found it to be far easier.

When a student needs a quick monkey-see-monkey-do, the JsonDecoder.field pattern seems to work just fine. When a student wants, needs, or has time for a full explanation there is a far shorter distance between where they are and where I need to get them.

I just need to make sure they understand interfaces and lambdas; then they are ready for some version of the tutorial I gave in the first section.

Students in college aren't the only people who need to be taught how to work with JSON in Java either. If you work for a company that hires juniors or folks who come from different language backgrounds, then there has to be an education step.

It might be worth the boilerplate of writing out fromJson and toJson to have a codebase where onboarding doesn't need to touch the "advanced" side of Java to send JSON over the wire.

It could help Java get an official JSON library

There has been a JEP open for years which proposes adding JSON to the standard library.

If you use a build tool like maven for all of your programs, it might not seem important. You can pull in Jackson, gson, org.json, this, or any library with one declaration.

There are a few things which make me care about it though.

  1. Right now, the only data format built into Java is XML. That's not exactly the king of language-neutral formats it was in the 90s.
  2. Java now supports single file programs and will eventually support "terse" main methods. As the applicability for scripting goes up, the inability to use JSON hurts more, since scripting is generally a "no-dependency" situation.
  3. Whatever is in the standard library has the power to affect defaults. Databind as the default for the ecosystem feels like it has too much momentum to change otherwise.
  4. When integrating Graal there are going to be components, like the reflect-config.json file it uses for native image, that will read and write JSON. I fear that will be too tempting a target for --add-opens.

Regardless of whether you agree that support should be in the standard library, I think the previous section illustrates some of the problems that could come if one of the existing APIs were adopted.

List<Muppet> cast = new ArrayList<>();
JsonArray castArray = object.getAsJsonArray("cast");
for (int i = 0; i < castArray.size(); i++) {
    JsonObject muppetObject = castArray
        .get(i)
        .getAsJsonObject();
    cast.add(Muppet.createFrom(muppetObject));
}

Regardless of applicability, availability could end up making this the default. That is unideal.

An option to avoid this is for the standard library to add direct support for databind. Not only was that ruled out by the existing JEP, it would probably just be a bad idea. Mapping the JSON data model to the wide universe of Java objects has solutions that occupy a very large design space.

Considering the long term commitments the JDK makes whenever it adds a new feature as well as the mental, physical, and emotional damage dealt by its existing Serializable mechanism - I don't see that happening.

If the JDK gave up and just provided a low-level streaming parser akin to jackson-core, then it wouldn't affect the defaults in the ecosystem that much, but it would raise the question of "why not just use jackson-core." In addition, users would still have to add a library to accomplish most tasks. There wouldn't be much of a benefit.

So that's where this library comes in. It's nowhere near seaworthy for that ocean, but the JsonDecoder approach is relatively novel in the JVM ecosystem. The mechanisms needed to make it "work" have only been around since Java 8 and, as far as I know, haven't been tested on any large scale.

The more folks try it, write libraries that do the concept "but better", socialize it, etc., the more confidence there can be in whether a decoder-based API would be applicable to the needs of the JDK.

You can see my recent conversation on the mailing list about this here.

You can play with new features

Maybe I haven't convinced you to give it a try for your work or personal projects. That's fine.

Still, it is a small codebase. It makes (I think) good use of the features added to Java in the last decade. If you aren't caught up, it might be a good reference point for doing so.

If you want to play with upcoming features like value classes or string templates it could be a nice playground to see how that would affect performance, design, or just how the code feels.

In particular, JSON is often mentioned as a use-case in the JEPs and explanations for new features. It could be nice to have a JSON API to point to that actually works in the way being described.


<- Index

Development Perils: How to not create a mobile application

by: Matthias Ngeo

Ever since Forus Labs' first mobile application, TimeBloc, was acquired in September 2020, I've mused about writing a short postmortem on its less-than-stellar development. Perhaps as a conclusion to the first chapter in our software engineering careers. I hesitated each time, unsure of how to concisely fit everything into an article. It's almost 2023. Enough time has passed that memories of that period are becoming hazy. I can't hesitate any longer.

Rare photograph of TimeBloc's development circa 2019

Those seeking groundbreaking insights into software engineering should stop reading. This article just describes the aftermath of ignoring practices beaten to death by others.

So gather around the fireplace, as I tell a tale of poor software engineering decisions; of how to not create a mobile application.

Choose Wisely

Our tale begins in early 2019. Three lads, fresh out of polytechnic (high-school equivalent), had an overabundance of time before embarking on their compulsory service. They assumed creating a mobile application to be an entertaining and straightforward affair. However, none of them had any prior professional experience creating mobile applications. You can probably sense where this is heading.

During meetings at their local Starbucks, they pitched wild and fantastical features to include in their time-blocking application. One of those features was real-time synchronization of all the user's data, i.e. time-blocks and settings, across their devices. Debates on whether even that was too fantastical continued perpetually until a compromise was reached, deferring the feature to a subsequent release. Unknowingly, the three lads had steered the project away from certain doom.

Deferring the feature was one of the few mistakes avoided. It was only discovered to be fraught with difficulty after implementing a similar feature in a subsequent application.

In essence, it was a distributed computing problem. Users could concurrently modify and sync data across several of their devices. To further complicate matters, the application had to be offline-first. That is to say, the application must work even when unconnected to the internet. Modifications had to be reconciled and propagated as they arrived piecemeal. Think "Multi-Leader Replication on Steroids".

Had the three lads stubbornly insisted on real-time synchronization of data, TimeBloc would remain vaporware to this day. A project's features are quite literally make-or-break. The moral of the story is: choose wisely what to implement, err on the side of caution, and do not implement something if in doubt - KISS. Likewise, don't implement offline-first support and data synchronization together. It's difficult.

Ecological Survey

A few weeks passed in the blink of an eye. The three lads had finished bike-shedding the application's initial features. Said features remained tame, devoid of those deemed too outlandish. Before development commenced, one question still remained: which language and framework to use?

"Sea port at sunset" - Claude Lorrain, 1639

The three lads found themselves at a port seeking passage across a perilous, sprawling ocean. Once aboard, it was nigh impossible to switch ships mid-voyage. Moored close to shore were two colossal ships, native Android and iOS development, surrounded by flocks of passengers awaiting embarkment. Both ships were remarkably popular, their seaworthiness trialed-and-tested by time. Moored further down the pier was React Native. Despite having been built later, it had proven to be seaworthy and attracted a respectable crowd. Lastly, there was Flutter, a brand-new ship yet to sail its maiden voyage. It incorporated the latest advancements in shipbuilding and was surrounded by crowds on the dock. Nevertheless, few in those crowds were actual passengers.

Lacking the manpower and funding to develop two separate applications, native Android and iOS development were out of the picture. Each had a single, different destination, and our funds afforded us passage to only one. Yet we sought to visit both destinations. Thus, the only contenders were cross-platform frameworks like Flutter and React Native.

After brief experimentation and poring over documentation, Flutter was chosen. Unbeknown to us was the importance of conducting a thorough ecological survey. That is to say, smitten by the ship's advanced exterior, we forgot to check if the ship's interior was even furnished. Flutter in 2019 isn't Flutter in 2022. It was still in its infancy. Likewise, the community and open-source ecosystem surrounding the framework was still budding. It was only discovered partway through development that there was no support for Lottie animations.

Sample Lottie animation

Although Rive was supported, good luck convincing any freelance designer to create an animation in that format. Stuck between a rock and a hard place, the difficult decision was eventually made to scrap all animations.

Some other memorable issues included the notification scheduling library not accounting for Daylight saving time, and the SQLite Flutter library not supporting desktop environments. The latter meant unit tests depending on SQLite couldn't be run outside an Android/iOS emulator. It greatly influenced the decision to skip unit tests, covered in the next section.

Because of its recency, Flutter's community had yet to take root. This manifested as less publicly available information, owing to the lack of grey-haired Gandalf-types that thrive on other platforms. Consequently, that led to greater difficulty debugging and troubleshooting problems.

One particularly nasty incident occurred after integrating background notification scheduling. In production, reports that the application crashed during start-up began coming in. Further examination revealed that it only affected iPhone 8 devices running a certain iOS version. To complicate matters, the issue could not be replicated on an emulator nor did we own an iPhone 8 running that iOS version. An entire weekend was spent frantically debugging the issue, scouring the internet for any hints to no avail. Desperate, the decision was made to remove background notification scheduling altogether in an emergency patch.

Developing any non-trivial piece of software will inevitably require features beyond those offered by the language or framework. It is often the surrounding community that provides those missing pieces. Reusing the ship analogy, embarking on a ship guarantees passage but not comfort. The moral of the story is: always conduct an ecological survey of the surrounding open-source ecosystem and community when deciding on a language/framework.

Test Now

Yet in another blink of an eye, a few months had passed. Our three lads found themselves wading knee-deep in development work. Things had progressed slower than anticipated while the looming deadline drew close. The metaphorical ship had to pick up the pace. To lighten the ship, the three lads tossed the lifeboats overboard. They reasoned that the ship wasn't on fire, and the lifeboats could be retrieved if it ever caught fire, or at the end of the voyage. Long story short, they didn't. The lifeboats remain lost at sea to this day.

Skipping unit testing was controversial. Although we acknowledged it to be potentially disastrous, the motivations seemed rational. Unit tests benefited maintenance in the long term. However, there wasn't going to be a long term if the application missed the initial deadline. Tests could always be added once things had stabilized. In the interim, manual testing should suffice. It couldn't be that bad.

In short, test later gradually became test never. Things could be that bad. Manual testing was time-consuming and unreliable in a constant development flux. That meant manual tests gradually subsided too, while developer confidence plummeted. Eventually, manual testing was only conducted when gluing the UI and business logic together.

The application was built using a pseudo-BLoC architecture composed of several layers. Each developer tackled a single layer in isolation. Contrary to the adage of "integrating often and early", integration only commenced once all layers were individually completed. It was neither often nor early.

Skipping tests and delaying integration was a potent combination. It halted progress, and development descended into the nine circles of hell. It was only discovered during integration that each layer behaved contrary to the other developers' expectations. As with the Tower of Babel, further examination revealed contrasting interpretations of each layer's supposed behaviour. To remedy the issue, several bootleg modifications were applied over the span of a day, further damaging the application's structure.

Rare photo of developer debugging TimeBloc, circa 2019

To worsen matters, every imaginable bug surfaced in swarms during manual testing. The application would spontaneously crash and data would become corrupted seemingly at random. Since each individual layer wasn't tested, identifying and isolating the root causes became miniature D&D campaigns. A bug could be caused by the UI, the persistence layer, and everything in between. Speaking from personal experience, nothing is as soul-draining as reaching work at 10am and debugging until 4am the next morning.

In the end, although the application barely met the looming deadline, the decision to forego unit testing turned the application into a "Haunted Graveyard" during its lifetime. Future development stalled. Features couldn't be added and existing bugs couldn't be fully stamped-out. Because of that, rewriting the application was under consideration shortly before the application was acquired.

We failed to acknowledge the immediate maintenance benefits of unit tests. The time spent performing manual testing surpassed the predicted time to write equivalent unit tests severalfold - to say nothing of the time spent debugging or the toll on developers' morale. Similarly, integrating changes late increased the cost of debugging and modification substantially. This combination forced us to cut features and postpone our plans to implement monetization in the initial release. Shedding tests to quicken velocity is almost always counterproductive. Test now before it becomes test never. That goes both for writing unit tests and for integrating early. See Chapter 11 of Software Engineering at Google for a more in-depth treatise on the subject.

Perils

Following the previous sections, some leftover material still remains, none of it substantial enough to dedicate an entire section to. Listed below, in no particular order, are perils encountered during development.

  • Bundled SQLite versions may be ancient. Ensure that all SQLite features used are supported on all target platforms.
  • Foreign keys aren't enabled by default in SQLite. Always enable them via PRAGMA foreign_keys = ON.
  • Offline-first & data synchronization aren't a simple combination. Be prepared for distributed computing problems.
  • It is trivial to mangle time zones. Be careful when using Dart's lackluster DateTime class. See Falsehoods programmers believe about time zones.
  • Don't publish new versions before the weekends/holidays. You might wind up spending that time debugging issues in production.
  • Document everything. Trying to understand undocumented spaghetti code you wrote 1 year ago is difficult.

Final Thoughts

Our first foray into the world of professional software engineering wasn't glamorous. Nevertheless, it still represented a significant step forward. Although plenty of lessons were learnt through blood, toil, tears and sweat, I'm glad to be able to sit here and laugh at our own foolish mistakes in hindsight. Likewise, I hope you had a chuckle at the sheer madness even if you didn't take away anything else.

TL;DR

  • Choose wisely what to implement, err on the side of caution and do not implement something if in doubt.
  • Always conduct an ecological survey on the surrounding open source space & community when deciding on a language/framework.
  • Test now before it becomes test never.

Article was originally published on Medium.


<- Index

How to Structure a Clojure Web App 101

by: Ethan McCue

At work, we use integrant to manage stateful components in our Clojure apps.

It has been fine, but it's a constant struggle to explain it.

From a purely mechanical perspective there is a lot to teach. It uses multimethods to register lifecycle hooks, idiomatic use demands namespaced keywords, and in testing we've needed to incorporate special libraries.

None of that is fundamentally a problem though. All the libraries which do this sort of thing use some weirder part of Clojure's arsenal. For component it is records and protocols. For clip it is namespaced symbols and dynamic lookup. For donut it's a secret, more complex third thing.

What has been a challenge is explaining what exactly it is that these libraries do. Doing that - really doing that - requires a mountain of shared context that folks simply do not have.

What would you say you... do here?

This article is an attempt to convey some of that shared context. Apologies if it gets a bit ranty.

Ring

This is an HTTP Request.

GET /echo HTTP/1.1
Host: mccue.dev
Accept: text/html

This is an HTTP Response.

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 18

{"success":"true"}

Basically the entire Clojure world has agreed to a specification called "ring" which says how these requests and responses translate to data structures in Clojure.

Clojure web handlers are functions that take "ring requests" which look like the following

{:uri            "/echo"
 :request-method :get
 :headers        {}
 :body           ...
 :protocol       "HTTP/1.1"
 :remote-addr    "127.0.0.1"
 :server-port    80
 :content-length nil
 :query-string   nil
 :scheme         :http}

and produce "ring responses" which look like this.

{:status  200
 :headers {"Content-Type" "application/json"}
 :body    "{\"success\":\"true\"}"}

Everything else - routing, authentication, middleware - is built upon this foundation.

(ns example
   (:require [ring.adapter.jetty :as jetty]))

(defn handler 
  [request]
  (cond 
    (= (:uri request) "/hello")
    {:status 200 
     :body   "Hello, World"}
    
    :else
    {:status 404}))

(defn start-server 
  [] 
  (jetty/run-jetty handler {:port 1234}))

So this code, as written, will run a Jetty server which responds to all requests to /hello with Hello, World and all other requests with a 404.

The REPL

One issue that is already relevant with the preceding example, and will be a common theme going forward, is "REPL friendliness."

Clojure and other Lisps have the unique property that the "unit" of code isn't a file, but instead an individual "form."

As an example, with Python you cannot run the following code.

print("Start")
    
3di92d93209032

You will get a syntax error on the third line and nothing will run.

  File "/Users/emccue/Development/posts/example.py", line 3
    3di92d93209032
    ^
SyntaxError: invalid decimal literal

The equivalent Clojure looks like this.

(println "Start")

903f903jf939cn34f934fj9j39f4

Unlike with the Python example, the very first println will actually run before a crash.

Start
Syntax error reading source at (example.clj:4:0).
Invalid number: 903f903jf939cn34f934fj9j39f4

The reason for this is that Clojure reads and evaluates each "form" one at a time. There is no full pass over the file before running code.

This enables a workflow where a developer has a file open in one window with the full contents of their code and another window open at the same time with their "live" program - the "REPL".

Through editor magic, a developer can then load new code one form at a time into the live program. If in doing so a function is redefined, then the new definition of the function will start to be used.

There are many other explanations for this mechanism and the workflow it enables online.

So with that context, what is "not REPL friendly" about the example server code?

(defn handler
   [request]
   (cond
      (= (:uri request) "/hello")
      {:status 200
       :body   "Hello, World"}

      :else
      {:status 404}))

Assuming that first we load the handler function, we will next load the start-server function.

(defn start-server
   []
   (jetty/run-jetty handler {:port  1234
                             :join? false}))

And some code will eventually call it to start the server

(start-server)

At this point, a developer might want to modify the handler function to respond to requests on the /marco route.

(defn handler
   [request]
   (cond
      (= (:uri request) "/hello")
      {:status 200
       :body   "Hello, World"}

      (= (:uri request) "/marco")
      {:status 200
       :body   "POLO!"}

      :else
      {:status 404}))

If they did this and tried making a request to /marco, the server would still respond with a 404.

The reason for this is that whenever start-server is called it will be passed the current "value" backing the handler function. Future updates won't be picked up unless the server is stopped and restarted.

This is pretty trivial to side-step using an "indirection" mechanism.

(defn start-server
   []
   (jetty/run-jetty #'handler {:port  1234
                               :join? false}))

In this case, putting the #' in front of handler passes the var itself rather than the function value it currently holds. Each time the server invokes it, the var's current value is looked up, so if a developer re-loads a new definition of handler into the REPL it is immediately picked up and used.

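To see the difference concretely, here is a small sketch to evaluate form-by-form in a REPL (greet is a hypothetical function, not part of the server example):

(defn greet [] "hi")

(def by-value greet)   ;; captures the function value as of right now
(def by-var   #'greet) ;; captures the var itself

(defn greet [] "hello") ;; re-load a new definition

(by-value) ;; => "hi", still the old function
(by-var)   ;; => "hello", the var looks up its current value
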
This is what REPL friendly code looks like. It makes it easier for a developer to have changes picked up on the fly in a running program and rapidly experiment with new things.

There are other associated techniques like leaving a comment at the bottom of a file with code only intended to be used with the REPL.

(ns example
   (:require [ring.adapter.jetty :as jetty]))

(defn handler 
  [request]
  ...)

(defn start-server 
  [] 
  ...)

;; The Server will not start automatically, but a dev
;; can conveniently start it by putting their cursor in
;; the comment and loading the call into the repl
(comment
   (start-server))

Global Stateful Resources

Of course, most web apps are not written entirely in a single function. The most natural point at which to split out logic tends to be at handlers for different paths.

(ns example
   (:require [ring.adapter.jetty :as jetty]))

(defn hello-handler 
   [request]
   {:status 200
    :body   "Hello, World"})

(defn marco-handler
   [request]
   {:status 200
    :body   "POLO!"})

(defn handler
   [request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler request)

      (= (:uri request) "/marco")
      (marco-handler request)

      :else
      {:status 404}))

(defn start-server
   []
   (jetty/run-jetty #'handler {:port  1234
                               :join? false}))

(comment
   (start-server))

And of course the actual declarations of the routes can be separated from the code that starts the server, but that would get hard to follow here.

At this point most of the code is fairly easy to test. You just make fake requests, pass them to the handlers, and check that the responses are what you expect.

(ns example-test
   (:require [clojure.test :as t]
             [example]))
   
(t/deftest handler-test
  (t/testing "Request to /hello gets Hello, World"
     (let [response (example/handler {:uri "/hello"})]
        (t/is (= (:status response) 200))
        (t/is (= (:body response) "Hello, World"))))
        
  (t/testing "Request to /marco gets POLO!"
     (let [response (example/handler {:uri "/marco"})]
        (t/is (= (:status response) 200))
        (t/is (= (:body response) "POLO!"))))
        
  (t/testing "Request to unknown path gets 404"
     (let [response (example/handler {:uri "/jdkdawdoaddwadad"})]
        (t/is (= (:status response) 404)))))

This is a cool property of the overall ring model. You can directly test handlers without having to actually spin up a server.

No real program stays a set of easy-to-test pure functions forever. Handling a request often implies a dependence on some "stateful resources" such as external services and connection pools.

External Services

As an example, let's say when you make a request to /marco we still want to respond with POLO!, but if the user specifies that they are not in a pool with a query string (/marco?nopool) then we want to respond with the entire Wikipedia page for Marco Polo.

(defn marco-handler
   [request]
   (if (= (:query-string request) "nopool")
      {:status 200 
       :body   (slurp "https://en.wikipedia.org/wiki/Marco_Polo")}
      {:status 200
       :body   "POLO!"}))

While we can still test this conveniently, the test will have an implicit dependence on Wikipedia being online. It also makes our tests slower than they need to be since we are making an actual HTTP call.

(ns example-test
   (:require [clojure.string :as string]
             [clojure.test :as t]
             [example]))
             
(t/deftest marco-handler-test        
  (t/testing "Request to /marco gets POLO!"
     (let [response (example/marco-handler {:uri "/marco"})]
        (t/is (= (:status response) 200))
        (t/is (= (:body response) "POLO!"))))
        
  (t/testing "Request to /marco with no pool gets info"
     (let [response (example/marco-handler {:uri          "/marco"
                                            :query-string "nopool"})]
        (t/is (= (:status response) 200))
        (t/is (string/includes?
                (:body response) 
                "The Travels of Marco Polo")))))

This isn't ideal, but it could be worse. Imagine if you wanted to alert an admin every time the /hello route was called. A bit of a silly example, but calls to APIs like Sendgrid aren't unreasonable to do in response to some requests.

(defn hello-handler
   [request]
   (sendgrid/send-email "admin@website.com" "You got a user!")
   {:status 200
    :body   "Hello, World"})

As written, this is a doozy to test. Either you

  • Make sure you only have test credentials loaded when running your unit tests. Don't mess it up!
  • Make sure you have no credentials loaded when running your unit tests. You now need to be extra cautious that you are okay with calls being made that will always fail.
  • Stub out the functions that call out to external services with a mechanism like with-redefs.

The problem with the last solution, even though it does mechanically solve the issue, is that you need to know what external services a piece of code will use. Since our handlers are just taking a request, there is not enough information at call-sites or in the function header to say for sure.

(defn hello-handler
   [request]
   ;; Have to read every function this calls
   ;; to see what stateful stuff is going on...
   (some-other-code request))

So tests end up looking like the following, with pretty low confidence that everything has been stubbed out.

(with-redefs [sendgrid/send-email (constantly nil)]
   (t/testing ... ACTUAL TEST ...))

Connection Pools

Handlers also very often need to talk to a database. It is wasteful to make a new database connection on every request, so a really common technique is to keep a certain number of connections alive in a "pool" and re-use them over and over again.

What is common, and saddening, to find is a connection pool stored in a top-level constant and referenced by a large part of the codebase.

(ns example.db
   (:import (com.zaxxer.hikari
              HikariConfig
              HikariDataSource)))

(def pool (HikariDataSource. 
            (doto (HikariConfig.)
              (.setJdbcUrl "..."))))

(defn hello-handler
   [request]
   ;; Information like this can come from middleware.
   (let [user-id   (:user-id request)
         user-name (jdbc/execute-one! 
                      db/pool 
                      ["SELECT name FROM user 
                        WHERE user.user_id = ?"
                       user-id])]
     {:status 200
      :body   (str "Hello, " user-name)}))

Even assuming that, like DHH, you are fine with your tests hitting a real database, this still creates some practical problems.

For one, if you edit the file where the connection is defined you might accidentally reload the constant and leak a bunch of connections. This isn't that likely on a large project where you aren't touching this code often, but over the course of a long-lived REPL session it can be annoying.

It is also logistically annoying that the connection pool is established the moment the code is loaded. If you Ahead-of-Time compile your Clojure code you will pretty quickly want that not to be the case.

You can sidestep that last issue by putting the connection pool behind a "delay", which lazily starts the connection pool when it is needed.

(ns example.db
   (:import (com.zaxxer.hikari
              HikariConfig
              HikariDataSource)))

(def pool (delay 
             (HikariDataSource. 
               (doto (HikariConfig.)
                 (.setJdbcUrl "...")))))

But now this detail changes how users have to access the actual pool. Usage sites have to add an @ to make sure the pool has been started and to retrieve it.

(defn hello-handler
   [request]
   (let [user-id   (:user-id request)
         user-name (jdbc/execute-one! 
                      @db/pool 
                      ["SELECT name FROM user 
                        WHERE user.user_id = ?"
                       user-id])]
     {:status 200
      :body   (str "Hello, " user-name)}))

Annoying, but that's not all. If you want to sub out the pool in a test fixture and maybe run tests in parallel, then the whole pool needs to be dynamically re-bindable as well.

(def ^:dynamic 
   *pool* 
   (delay 
      (HikariDataSource. 
         (doto (HikariConfig.)
            (.setJdbcUrl "...")))))

(defn hello-handler
   [request]
   (let [user-id   (:user-id request)
         user-name (jdbc/execute-one! 
                      @db/*pool*
                      ["SELECT name FROM user 
                        WHERE user.user_id = ?"
                       user-id])]
     {:status 200
      :body   (str "Hello, " user-name)}))

(binding [db/*pool* (delay (make-test-pool))]
   (insert-user 123 "bob")
   (let [response (hello-handler {:user-id 123})]
      (t/is (= (:body response)
               "Hello, bob"))))

All of that is workable - you can use macros and helper functions to alleviate the syntax ugliness and generally speaking your app will just have one database.

But it also is not that uncommon for an app to have two databases. Usually one SQL and one Redis-like. And while it's not as hard as for arbitrary external services - you still don't really know from a call-site whether you need to establish a test database before calling it in a test.

Inversion of Control

The general shape of the solution to those problems is to not have "global" stateful resources.

For external services, this means making an actual object to pass as the first argument to calls.

If the service is like Sendgrid, this could be a convenient place to put information like your API key or make a persistent http client.

(defn make-sendgrid-client 
   [api-key]
   {:api-key api-key
    :client  (hato/build-http-client {:connect-timeout 10000
                                      :redirect-policy :always})})

(defn send-email
   [sendgrid-client]
   (hato/post "/send-email" {:http-client (:client sendgrid-client)}))

But even if the service is "stupid" and requires no authentication or special treatment like Wikipedia, there is still value.

(defn make-wikipedia-client
   []
   ;; Nothing really to put...
   {:name "Wikipedia Client"})

(defn get-marco-polo-info
  [wikipedia-client]
  (slurp "https://en.wikipedia.org/wiki/Marco_Polo"))

The value is that taking something as a first argument means that later on you can refactor calls to go through some dispatch mechanism like a protocol.

(defprotocol WikipediaClient
   (get-marco-polo-info [_]))

(defn make-wikipedia-client
   []
   (reify WikipediaClient
      (get-marco-polo-info [_]
         (slurp "https://en.wikipedia.org/wiki/Marco_Polo"))))

Which in turn can enable creating "fake" implementations for testing.

(def fake-wikipedia
   (reify WikipediaClient
      (get-marco-polo-info [_]
         "was a dude, i guess?")))

For connection pools, there is already an actual object to pass so that isn't an issue. The same "maybe make it a protocol later" strategy is applicable to that sort of resource as well.

Then in all the code that wants these dependencies, just expect them to be given as arguments.

(defn marco-handler
   [wikipedia-client request]
   (if (= (:query-string request) "nopool")
      {:status 200 
       :body   (wikipedia/get-marco-polo-info wikipedia-client)}
      {:status 200
       :body   "POLO!"}))

Which provides a clear path to sensible testing.

(ns example-test
   (:require [clojure.string :as string]
             [clojure.test :as t]
             [example]))
             
(t/deftest marco-handler-test 
   (let [mock-wikipedia (reify WikipediaClient
                           (get-marco-polo-info [_]
                              "INFO"))]       
     (t/testing "Request to /marco gets POLO!"
        (let [response (example/marco-handler 
                         mock-wikipedia
                         {:uri "/marco"})]
           (t/is (= (:status response) 200))
           (t/is (= (:body response) "POLO!"))))
           
     (t/testing "Request to /marco with no pool gets info"
        (let [response (example/marco-handler 
                        mock-wikipedia
                        {:uri          "/marco"
                         :query-string "nopool"})]
           (t/is (= (:status response) 200))
           (t/is (= (:body response) "INFO"))))))

This technique - where we get dependencies as arguments instead of making them locally or getting them from some global place - is commonly called "Inversion of Control."

Dependency Injection and "The System"

While this is a concrete improvement - we can directly see what the dependencies of a process are in the argument list - there are still some unresolved issues.

Let's say our hello-handler wants to use a sendgrid-service and the database pool and our marco-handler wants to use a wikipedia-service and the database pool.

(defn hello-handler
   [sendgrid-service pool request]
   ...)

(defn marco-handler
   [wikipedia-service pool request]
   ...)

This implies that the root handler function will have access to all of these things and pass them down as needed.

(defn handler
   [sendgrid-service wikipedia-service pool request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler sendgrid-service pool request)

      (= (:uri request) "/marco")
      (marco-handler wikipedia-service pool request)

      :else
      {:status 404}))

With just three stateful components and two handlers this is manageable, but beyond that, wiring everything positionally becomes overly burdensome and error-prone.

(defn handler
   [sendgrid-service 
    wikipedia-service 
    pool
    some-service
    other-thing
    oh-no
    request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler sendgrid-service pool request)

      (= (:uri request) "/marco")
      (marco-handler wikipedia-service pool request)

      (= (:uri request) "/thing")
      (some-handler some-service sendgrid-service request)

      (= (:uri request) "/thing2")
      (some-handler some-service 
                    oh-no 
                    sendgrid-service 
                    other-thing  
                    request)
      
      (= (:uri request) "/thing3")
      (some-handler some-service 
                    oh-no
                    other-thing  
                    request)
      
      ;; ... * 100
      
      :else
      {:status 404}))

The solution is to put all stateful components into a single map, popularly called the "system."

{:sendgrid-service  sendgrid-service
 :wikipedia-service wikipedia-service
 :pool              pool}

Then the handler just threads this one map down to all the entry-points.

(defn handler
   [system request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler system request)

      (= (:uri request) "/marco")
      (marco-handler system request)

      :else
      {:status 404}))

and individual handlers "declare" which of these components they are interested in by only pulling those keys out of the map.

(defn hello-handler
   [{:keys [sendgrid-service pool]} request]
   ...)

(defn marco-handler
   [{:keys [wikipedia-service pool]} request]
   ...)

This way it is still declared up front what stateful components some bit of code needs to do its work, but the "wiring" code for each entry-point can stay uniform.

This technique, where all a piece of code needs to do to get access to a resource is "declare" that it wants it, is usually called "Dependency Injection."

It is also important to note that past this "entry-point", code should generally pass things down explicitly. Passing the whole system around is a handgun pointed at a foot.

(defn marco-handler
   [{:keys [wikipedia-service pool]
     :as system} request]
   ...
   ;; Back to not knowing what this could be doing deep down...
   (some-code system)
   ...)

Starting and Stopping the System

There needs to be some code that actually starts up all the components of the system.

(defn start-system 
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)]
     {:config            config
      :sendgrid-service  sendgrid-service
      :wikipedia-service wikipedia-service
      :pool              pool}))

Some stateful bits might depend on other stateful bits to get started. In the above example the hypothetical Sendgrid service and database connection pool depend on some config object which is loaded earlier.

The clearest example of that is the server instance itself. If it is to be put into the system, then it will need all the things started before it.

(defn start-system 
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)
        system-so-far     {:config            config
                           :sendgrid-service  sendgrid-service
                           :wikipedia-service wikipedia-service
                           :pool              pool}
        server            (start-server system-so-far)]
     (assoc system-so-far :server server)))

(defn hello-handler
   [{:keys [sendgrid-service pool]} request]
   ...)

(defn marco-handler
   [{:keys [wikipedia-service pool]} request]
   ...)

(defn handler
   [system request]
   (cond
      (= (:uri request) "/hello")
      (hello-handler system request)

      (= (:uri request) "/marco")
      (marco-handler system request)

      :else
      {:status 404}))

(defn start-server
   [system]
   (jetty/run-jetty
      (partial #'handler system) 
      {:port  1234
       :join? false}))

The reason you would want the server to be part of the system ties back to the REPL workflow. If you change or add some stateful component you might want to stop an old running system and start up a new one. The running http server is likely to be one of these things you would want to restart.

To properly do this, every stateful resource which might have shutdown logic needs to provide a function which shuts it down.

(defn stop-server 
   [server]
   (.stop server))

And then some larger function needs to be able to stop each component of the system, ideally in the reverse order they were started.

(defn stop-system 
   [system]
   (stop-server (:server system))
   (stop-connection-pool (:pool system))
   ;; In this hypothetical the sendgrid service
   ;; has shutdown logic, but the wikipedia service does not.
   (stop-sendgrid-service (:sendgrid-service system)))

Then to facilitate working with the "current system" in the REPL it does need to be bound to some global value.

(ns example.repl
   (:require [example.system :as system]))

(def system nil)

(defn start-system!
   []
   (alter-var-root #'system (constantly (system/start-system))))

(defn stop-system!
   []
   (system/stop-system system)
   (alter-var-root #'system (constantly nil)))

(comment
   (start-system!)

   (stop-system!))

A developer can then reference example.repl/system in their REPL session to see the currently running system and pull out values to test calls to functions they are playing with.

(some-db-function 
   (:pool example.repl/system) 
   123 
   "abc")

And while this does give birth to a global stateful thing, the problems with that are fairly well mitigated.

For one, it can reasonably exist only in development. In the code above there is a distinct namespace just for providing a start-system! and a stop-system! to be used in development. On the tooling side you can even make sure this file isn't included in production builds with something like deps.edn aliases.

;; Assuming example/repl.clj is under dev-src
{:paths ["src"]
 :aliases {:dev {:extra-paths ["dev-src"]}}}
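
Starting a development REPL with that alias (e.g. clj -A:dev) then puts dev-src on the classpath, while production builds that don't activate the alias leave it out.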

So what is integrant for?

As I mentioned before, you need to start all of your stateful components in the right order and stop them all in the reverse of that order.

(defn start-system
   []
   (let [config            (load-config)
         sendgrid-service  (make-sendgrid-service config)
         wikipedia-service (make-wikipedia-service)
         pool              (make-pool config)
         system-so-far     {:config            config
                            :sendgrid-service  sendgrid-service
                            :wikipedia-service wikipedia-service
                            :pool              pool}
         server            (start-server system-so-far)]
      (assoc system-so-far :server server)))

(defn stop-system 
   [system]
   (stop-server (:server system))
   (stop-connection-pool (:pool system))
   (stop-sendgrid-service (:sendgrid-service system)))

A workable metaphor for this is that each component "depends on" the components that need to start before it and that these dependencies form a graph.

Integrant, and libraries like it, provide ways to explicitly model that graph of dependencies.

This reduces the boilerplate and potential error-proneness of the start-system and stop-system functions that logically need to exist.

In Integrant's case the dependency information is encoded into a map.

{:config            {}
 :sendgrid-service  {:config (ig/ref :config)}
 :wikipedia-service {}
 :pool              {:config (ig/ref :config)}
 :server            {:config            (ig/ref :config)
                     :sendgrid-service  (ig/ref :sendgrid-service)
                     :wikipedia-service (ig/ref :wikipedia-service)
                     :pool              (ig/ref :pool)}}

and the information about how each thing is started and stopped is registered with the ig/init-key and ig/halt-key! multimethods.

(defmethod ig/init-key
  :pool
  [_ {:keys [config]}]
  (HikariDataSource.
    (doto (HikariConfig.)
      (.setJdbcUrl (config/lookup config :JDBC_URL)))))

(defmethod ig/halt-key!
  :pool
  [_ pool]
  (.close pool))

Starting the system now means calling ig/init-key on everything in graph-traversal order; stopping it means calling ig/halt-key! in the reverse order.

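Concretely, a minimal sketch of driving this (assuming the dependency map above is bound to a var, as it is in the system-map example below):

(require '[integrant.core :as ig])

;; ig/init resolves the refs and calls ig/init-key on each
;; component in dependency order, returning the running system map.
(def system (ig/init system-map))

;; ig/halt! calls ig/halt-key! on each component in reverse order.
(ig/halt! system)
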
The pieces needed for a REPL workflow can then be brought in via a library.

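One such library is integrant.repl. A rough sketch, assuming it has been added as a development dependency and that example.system holds the system-map defined just below:

(ns example.repl
  (:require [integrant.repl :as ig-repl]
            [example.system :as system]))

;; Tell integrant.repl how to produce the config map before starting.
(ig-repl/set-prep! (constantly system/system-map))

(comment
  (ig-repl/go)    ;; start the system
  (ig-repl/reset) ;; reload changed namespaces and restart it
  (ig-repl/halt)) ;; stop the system
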
Partially because multimethod registration is global - and partially because it's good practice regardless - the keys for different integrant components are generally made namespaced.

(ns example.system
  (:require [integrant.core :as ig]))

(def system-map
    {::config            {}
     ::sendgrid-service  {::config (ig/ref ::config)}
     ::wikipedia-service {}
     ::pool              {::config (ig/ref ::config)}
     ::server            
     {::config            (ig/ref ::config)
      ::sendgrid-service  (ig/ref ::sendgrid-service)
      ::wikipedia-service (ig/ref ::wikipedia-service)
      ::pool              (ig/ref ::pool)}})

So in this context, the ::pool syntax will expand to :example.system/pool.

This helps avoid conflicts with multimethod registration, but also can be used in conjunction with features like as-alias to add some semantic and syntactic distinction to pulling components out of the system.

(ns example.handlers
  ;; Without as-alias it would be really easy
  ;; to get circular dependencies doing this.
  (:require [example.system :as-alias system]))

(defn some-handler
  [{::system/keys [pool server]} request]
  ...)

Again, I find it important to note that integrant is just one of many libraries that do this "automatic wiring."

Many have sprung up over the years, and it seems like there are more yet to come. There are tradeoffs and quirks to all of them.

The important idea is just to pass things down as arguments and to start from a system map at the entry-points.

Tying it all together

To properly structure a Clojure App

  • DO NOT have stateful components be implicit
(defn do-thing
  [name]
  (slurp (str "https://website.com/get-info/" name)))
  • DO NOT have stateful components be global constants
(def pool (make-db-pool))

(defn lookup-chair 
  [chair-id]
  (jdbc/execute! 
    pool 
    ["SELECT * FROM chair
      WHERE chair.chair_id = ?"]))
  • DO have code be safe to reload in the REPL and have changes be reflected immediately
(defn root-handler
  [request]
  ...)

(defn start-server 
  []
  (jetty/run-jetty #'root-handler {:port 1234}))
  • DO provide REPL workflow helpers in comments
(defn root-handler
  [request]
  ...)

(defn start-server 
  []
  ...)

(comment
  (start-server))
  • DO pass stateful components explicitly as arguments.
(defn lookup-chair 
  [pool chair-id]
  (jdbc/execute! 
    pool 
    ["SELECT * FROM chair
      WHERE chair.chair_id = ?"]))
  • DO have a "system map" which can be threaded to entry-points
(defn start-system
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)
        system-so-far     {:config            config
                           :sendgrid-service  sendgrid-service
                           :wikipedia-service wikipedia-service
                           :pool              pool}
        server            (start-server system-so-far)]
    (assoc system-so-far :server server)))
  • DO have REPL helpers for working with the system.
(def system nil)

(defn start-system! 
  []
  (alter-var-root #'system ...))

(defn stop-system!
  []
  (alter-var-root #'system ...))

(comment
  (start-system!)

  (stop-system!))
  • DO pull out dependencies from the system using destructuring
(defn hello-handler 
  [{:keys [pool]} request]
  ...)
  • DO use namespaced keys in the "system map"
(ns example.system)

(defn start-system
  []
  (let [config            (load-config)
        sendgrid-service  (make-sendgrid-service config)
        wikipedia-service (make-wikipedia-service)
        pool              (make-pool config)
        system-so-far     {::config            config
                           ::sendgrid-service  sendgrid-service
                           ::wikipedia-service wikipedia-service
                           ::pool              pool}
        server            (start-server system-so-far)]
    (assoc system-so-far ::server server)))

(ns example.handlers
  (:require [example.system :as-alias system]))

(defn hello-handler 
  [{::system/keys [pool]} request]
  ...)
  • MAYBE use a library like integrant to reduce boilerplate when starting and stopping the system.

Sidenotes

Different Places of Injection

A technique that was harder to show with how the code examples built up, but is equally valid, is attaching the system as "request context."

By this I mean, have some middleware which takes the system and a handler and injects the system into the request under some key.

(defn wrap-system
   [system handler]
   (fn [request]
      (handler (assoc request :system system))))

And then have entry-points pull what they need out from that nested key.

(defn some-handler
   [request]
   (let [{:keys [pool]} (:system request)]
      ...))

Or even attach all the values into the request at the top level.

(defn wrap-system
   [system handler]
   (fn [request]
      (handler (merge request system))))

(defn some-handler
   [{:keys [pool] :as request}]
   ...)

This has the potential benefit of avoiding the need to explicitly wire the system to the code that depends on it, since it is now all contained in one object.

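For completeness, a sketch of wiring that middleware up at server start, assuming the root handler in this style takes only the request:

(defn start-server
   [system]
   (jetty/run-jetty
      ;; Passing the var #'handler through the middleware keeps
      ;; re-loaded definitions visible to the running server.
      (wrap-system system #'handler)
      {:port  1234
       :join? false}))
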
This sort of technique - by other names and syntaxes - is pretty popular in other worlds like JavaScript and Python.

Is testing really important enough to do all this?

I will claim, without a convincing argument top of mind but with strong feelings in the core of my heart, that writing code like this makes it easier to reason about and refactor.

The testing argument is just easier to make since I can more clearly show mechanical deficiencies in other approaches.

What about mount?

There is a library called mount which uses the namespace loading graph as its mechanism for knowing what order to start and stop things.

(mount/defstate other-thing
  :start (make-other-thing)
  :stop  (stop other-thing))

(mount/defstate thing
  :start (f other-thing)
  :stop  (stop thing))

This is better than using regular defs to store stateful components since you can start and stop everything from the REPL.

What this doesn't solve for is how call-sites get handles to the stateful components. Without discipline to not touch them directly, this will lead to an overall architecture indistinguishable from just using regular defs.

While I wouldn't recommend it for those reasons, if you already have an app structured "the wrong way" it can be an incremental step in the right direction to get things REPL-able.

Contract Narrowing

In all the examples I showed there are only around 5 actual stateful components. This is fine, but you might not feel the automatic wiring of integrant is "worth the cost" unless there are more than that.

One way that you can easily end up with more than a handful of components - even if you just have a single database - is if you practice "contract narrowing." That is - instead of passing "pool" to consumers, which will let them do any arbitrary operations on the database, pass an object with a "narrower contract" like a "user-service."

(defprotocol UserService
  (find-by-id [_ id]))

(defrecord UserServiceImpl
  [pool]
  UserService
  (find-by-id [_ id]
    (jdbc/execute!
      pool
      ["SELECT * FROM user
        WHERE user.user_id = ?"
       id])))
(defn start-system
  []
  (let [pool         (start-pool)
        user-service (->UserServiceImpl pool)]
    {::pool         pool 
     ::user-service user-service}))
(defn user-handler
  [{::system/keys [user-service]} request]
  ... (user-service/find-by-id user-service 123) ...)

With this sort of code it is a lot more reasonable that you would end up with enough components that manually wiring their dependencies would get troublesome.

Expand the section above for further elaboration. Brag about your holiday plans in the comments below.


<- Index

A Practical Advent of Code

by: Ethan McCue

I've never done more than a few days of Advent of Code.

I'm sure it's fun if you're the kind of person who likes doing those sorts of puzzles, but that's not really what I'm into. My jam is really finicky, relatively small problems. Problems that everyone can reasonably do and that could come up in real code, but where it's really hard to be happy with a solution.

So that's what this is. I'm starting three days in, and I'm nowhere close to prepared to give a challenge a day, but I want to share the sorts of problems that keep me up at night.

The Challenge

The following three samples of JSON come from the activity streams specification.

If you have misguided dreams of making the next Twitter, maybe you've looked at this too.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "A note",
  "type": "Note",
  "content": "My dog has fleas."
}
{
  "@context": {
     "@vocab": "https://www.w3.org/ns/activitystreams",
     "ext": "https://canine-extension.example/terms/",
     "@language": "en"
  },
  "summary": "A note",
  "type": "Note",
  "content": "My dog has fleas.",
  "ext:nose": 0,
  "ext:smell": "terrible"
}
{
  "@context": [
     "https://www.w3.org/ns/activitystreams",
     {
      "css": "http://www.w3.org/ns/oa#styledBy"
     }
  ],
  "summary": "A note",
  "type": "Note",
  "content": "My dog has fleas.",
  "css": "http://www.csszengarden.com/217/217.css?v=8may2013"
}

In all of them, there is a piece of information we will call "the vocabulary".

If the context is a string, the vocabulary is that string. If the context is an object, the vocabulary is under the @vocab key. If the context is an array, the vocabulary is the string at the first index of the array.

So in all of these examples the vocabulary is "https://www.w3.org/ns/activitystreams".

Assume one of these shapes of JSON is in a file called activity.json. Your job is to extract the vocabulary out of that file and print it.

Restrictions

  • If you are employed, you need to use the JSON library you use for work.
  • You need to explain your code to St. Peter when you die.

I am personally most interested in solutions on the "static" side of the world - Java, C#, Rust, etc. - because this is where the solutions really go from "obvious" to "cursed" and "magic".

Leave solutions in the comments below.
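
If you want a sense of what a solution shape looks like with a mainstream library before reading mine, here is a minimal sketch using Jackson - assuming jackson-databind is on the classpath and that the input matches one of the three shapes above.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws IOException {
        JsonNode root = new ObjectMapper()
                .readTree(Files.readString(Path.of("activity.json")));
        JsonNode context = root.get("@context");

        String vocab;
        if (context.isTextual()) {
            // "@context": "https://..."
            vocab = context.asText();
        } else if (context.isArray()) {
            // "@context": ["https://...", { ... }]
            vocab = context.get(0).asText();
        } else {
            // "@context": { "@vocab": "https://...", ... }
            vocab = context.get("@vocab").asText();
        }

        System.out.println(vocab);
    }
}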


My Solution

I've been doodling on a JSON library for Java that I might find time to write about later; using it, my solution is the following.

import dev.mccue.json.Json;
import dev.mccue.json.decode.alpha.Decoder;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws IOException {
        var json = Json.readString(
                Files.readString(Path.of("activity.json"))
        );
        var vocab = Decoder.field(
                json,
                "@context",
                context -> Decoder.oneOf(context,
                        Decoder::string,
                        Decoder.index(0, Decoder::string),
                        Decoder.field("@vocab", Decoder::string)
                )
        );

        System.out.println(vocab);
    }
}

Try it out for yourself if you have a mind to - I would appreciate feedback.

<dependencies>
    <dependency>
        <groupId>dev.mccue</groupId>
        <artifactId>json</artifactId>
        <version>0.0.9</version>
    </dependency>
    <dependency>
        <groupId>dev.mccue</groupId>
        <artifactId>json.decode.alpha</artifactId>
        <version>0.0.9</version>
    </dependency>
</dependencies>

<- Index

Better Java logging, inspired by Clojure and Rust (II)

by: Ethan McCue

Around three months ago I wrote a pretty long rant about a Java logging library I doodled out.

My goal was to dream up something that would fill the same ecosystem role as SLF4J but with the primary goal of supporting the logging of structured data.

I got an absolute mountain of good feedback. After spending some time reading and internalizing the responses, I realized a few things.

  1. I need to more effectively communicate my intent. Writing about the nitty-gritty details about choices in API design is my marmalade sandwich, but I need to do that after explaining what I am going to do and why.
  2. Going forward, I need to properly contextualize how everything would interact with OpenTelemetry. Yes, OTel provides an API for propagating context, but OTel would not be suitable to include in a library. There is still a role here for something which lets a library interact with that concept without including OTel outright.
  3. All the opining about ExtentLocals (now called ScopedValues) was a bit tangential, but I do stand by the choice of making the API only propagate scope inside lambdas in preparation.
  4. I need to be more mindful of allocations in general, but still think that having a reified but restricted set of log values is important and paying whatever cost there is to allocate those objects is worth it.
  5. The API I was making was doing way too much. The ring buffer, the context stack, all the thread and timing info I was attaching to Log records - all of those are still valid directions to go with a logging implementation, but not with a logging interface.

Related to the last two points - I've written up a new draft of the API without any of the context implementation and minimizing allocations down to just the log entries. This has required a few changes - farewell my sweet prince log, the method which logs a Log - but so far seems to be okay. I'm not entirely sold on how I am handling context now though.

You can find all of that on my GitHub under log.beta.

I am writing this tiny update to give myself permission to put this on the hammock for a time and move on to some other fiddly, controversial, and fun topics. Stay tuned for that.


If you want to talk about this, feel free to reach out directly by any means necessary.


<- Index

Better Java logging, inspired by Clojure and Rust

by: Ethan McCue

Existing logging libraries are based on a design from the 80s and early 90s. Most of the systems at the time were developed on standalone servers, where logging messages to console or file was the predominant thing to do. Logging was mostly providing debugging information and system behavioural introspection.

Most modern systems are distributed across virtualized machines that live in the cloud. These machines could disappear at any time. In this context logging on the local file system isn't useful, as logs are easily lost if virtual machines are destroyed. Therefore it is common practice to use log collectors and centralized log processors. The ELK stack has been predominant in this space for years, but there are a multitude of other commercial and open-source products.

Most of these systems have to deal with unstructured data represented as formatted strings in files. The process of extracting information out of these strings is very tedious, error prone, and definitely not fun. But the question is: why did we encode these as strings in the first place? Only because existing log frameworks, though redesigned over the decades, still follow the same structure as when systems lived on a single server.

I believe we need to break free of these anachronistic designs and use event loggers, not message loggers, which are designed for dynamic distributed systems living in the cloud and using centralized log aggregators.

- Motivation for µ/log.

I think this description is essentially correct. Existing logging libraries, especially those in the Java ecosystem, are not "fit predators" for the modern world.

I claim that we can do meaningfully better.

Whether the design I come up with at the end of this is that better solution I do not know, but if it isn't I hope it spurs someone to make whatever is.

Prior Work

µ/log

µ/log is a Clojure library written by the author of the quote above.

Clojure is a dynamically typed functional language. Not every design choice made by µ/log will make sense for the statically typed Java, but there is bound to be a lot to pull.

I'll steal it. No one will ever know!

A logging call in µ/log looks something like this.

(μ/log :minesweeper.event/clicked-square :was-bomb true)

The first argument is the event type. The stated best practice is that the event type should have both a "namespace" and a "name" - minesweeper.event and clicked-square respectively.

The rest of the arguments are key-value pairs of arbitrary data. The convention in Clojure would be for that arbitrary data to be made of "data primitives" instead of "named aggregates".

// In Java, data is often tied to a "named aggregate"
public final class Dog {
    private final String name;
    
    public Dog(String name) {
        this.name = name;    
    }
    
    // And how data is accessed from that aggregate
    // is totally arbitrary, though conventions like
    // getX can allow for heuristics.
    public String retrieveName() {
        return this.name;
    }
}
;; In clojure it would be uncommon to have a named "Dog" object
(deftype Dog [name])

(μ/log :cool.dog/barked :loud true :dog (new Dog "jenny"))

;; Instead, you would have a "raw map" with data about said doggo.
(μ/log :cool.dog/barked :loud true :dog {:name "rhodie"})

So while there are mechanisms to handle it, serializing user defined classes is a relatively uncommon task. Downstream consumers will generally expect values to be these "base" values - lists, maps, strings, numbers - or directly decomposable into them.

These data are then merged with a standard set of metadata about the log.

  • A timestamp saying when the "event" happened.
  • A UUID.

The timestamp is pretty self-explanatory, but the UUID is a bit custom.

It isn't a standard UUIDv4, but instead a construct called a "Flake" (as in snowflake, no two alike). A Flake has the following properties.

  • 192 bits. 128 random, 64 time based.
  • New ones compare greater than older ones.
  • Comparison order is maintained in String and byte form.
  • String representation is URL safe
  • Can be created in under 50 nanoseconds

An example value looks like this: 4lIfs0B6IRjDMHo6g2Tbgrf4lzikQNXl.

I'll freely admit I don't understand the full scope of the problems this is intended to solve, but take it on faith that it's a good design and sensible alternative to UUID.randomUUID().

In addition to that standard metadata, any global or local (bound within a lexical scope) context is included.

Global context is intended to be set at the start of the program with unchanging metadata.

(μ/set-global-context! {:app-name "cool-trail-cam"})

Internally this information is stored in an atom - a thin wrapper over an AtomicReference.

(defonce ^{:doc "The global logging context is used to add properties
             which are valid for all subsequent log events.  This is
             typically set once at the beginning of the process with
             information like the app-name, version, environment, the
             pid and other similar info."}
  global-context (atom {}))

Local context is bound using the μ/with-context macro.

(μ/with-context {:request-id "685a40fd-8740-4a2f-85ae-d6a4b2c02bb0"}
  (μ/log :cool.dog/barked :loud true :dog {:name "rhodie"}))

By virtue of being stored and retrieved from a ThreadLocal, this context will be propagated even across function calls.

(def ^{:doc "The local context is local to the current thread,
             therefore all the subsequent call to log withing the
             given context will have the properties added as well. It
             is typically used to add information regarding the
             current processing in the current thread. For example
             who is the user issuing the request and so on."}
  local-context (ut/thread-local nil))

All of this information merges together into a map that looks something like this.

{:mulog/event-name :cool.dog/barked,
 :mulog/timestamp  613022400000,
 :mulog/trace-id   #mulog/flake "4lIfs0B6IRjDMHo6g2Tbgrf4lzikQNXl"
 :mulog/namespace  "cool.dog"
 :app-name         "cool-trail-cam"
 :request-id       "685a40fd-8740-4a2f-85ae-d6a4b2c02bb0"
 :loud             true 
 :dog              {:name "rhodie"}}

Then this map is "published".

The internals of how this publishing works are better documented elsewhere, but the gist of it is that the whole map is put into a ring buffer. An asynchronous process reads from that ring buffer and forwards entries to each publisher's own ring buffer. Any number of asynchronous processes then periodically read from those buffers and do the work of printing to stdout, sending to Cloudwatch, etc.

µ/log calls each of these asynchronous processes which handle logs a "publisher". This can be a bit confusing in terminology, but your code "publishes" logs to a ring buffer, an unnamed process forwards these logs to more ring buffers, then "publishers" take those logs and do work with them.

ring buffer publisher scheme diagram
Diagram from `µ/log`'s docs.

The goals of this scheme are to

  1. Make it possible to "log with abandon." Publishing to a ring buffer should take only a few hundred nanoseconds.
  2. Make it so that adding publishers doesn't impact overall performance. They all get their own buffers and work at their own pace.
  3. Make observability resilient to failures. If publishers "get behind" for whatever reason - us-west-2 is down, freak gasoline fight accident - a ring buffer might lose events but in exchange there won't be an infinite backlog to work through to send off more current data.
(μ/start-publisher! {:type :console})
(μ/start-publisher! {:type :simple-file 
                     :filename "/tmp/mulog/events.log"})
(μ/start-publisher!
  {:type :elasticsearch
   :url  "http://localhost:9200/"})

The final wrinkle is that in addition to logging "events" - a record of something that happened at a given point in time - µ/log lets you record "traces" - a record of a process that occurred over a span of time.

(μ/trace :cool-dog/chasing-squirrel
  [:got-away "hope so"]
  (chase-squirrel))

Unlike a normal log this adds metadata signifying the duration of the process, whether that process terminated with an exception, and parent traces that the current trace is nested under.

{:mulog/event-name :cool-dog/chasing-squirrel,
 :mulog/timestamp  609652800000,
 :mulog/duration   777600000
 :mulog/trace-id   #mulog/flake "4lLBQ0kk-mmdCFrwI8Ravb6c8S3ddpq-"
 :mulog/root-trace #mulog/flake "4lLBQ0kbx1weOm_TT5Wynok6GLBllXQC"
 :mulog/outcome    :ok
 :mulog/namespace  "cool.dog"
 :app-name         "cool-trail-cam"
 :got-away         true}

;; If it fails, outcome will be :error and the exception included.
{:mulog/event-name :cool-dog/chasing-squirrel,
 :mulog/timestamp  609652800000,
 :mulog/duration   777600000
 :mulog/trace-id   #mulog/flake "4lLBQ0kk-mmdCFrwI8Ravb6c8S3ddpq-"
 :mulog/root-trace #mulog/flake "4lLBQ0kbx1weOm_TT5Wynok6GLBllXQC"
 :mulog/outcome    :error 
 :mulog/exception  ...
 :mulog/namespace  "cool.dog"
 :app-name         "cool-trail-cam"
 :got-away         true}

tracing

tracing is a Rust library with holistically similar goals to µ/log.

There is a similar Rust library named slog which deserves mention. It has been around longer and has a stable API, but tracing has better support for interacting with Rust's async ecosystem and a larger active community. To quote slog's docs.

Please check tracing and see if it is more suitable for your use-case. It seems that it is already a go-to logging/tracing solution for Rust.

My impression is that a lot of the broad strokes are the same, so I am going to focus mostly on tracing.

tracing is built on top of three core concepts: spans, events, and subscribers.

Spans are records of a "period of time" with a beginning and an end. This is a close parallel to what µ/log calls a "trace".

let span = span!(Level::INFO, "toasting toast");

let _guard = span.enter();

// when _guard is dropped, the span is closed.

Semantically, this makes use of a property of Rust that doesn't exist in the JVM. When a struct in Rust is no longer in lexical scope, it is immediately "dropped." Dropping can implicitly run arbitrary code such as freeing memory, closing a socket, or - in this case - doing whatever work is needed to "close" a span.

Spans also have a new kind of metadata attached to them in the "Level." The level serves two purposes

  1. It hints to external systems how "serious" something is. If your service starts pumping out 10x as many ERROR spans or events, that is probably a sign something is afoot.
  2. It lets internal systems make sensible decisions about "filtering". TRACE spans and events might be relevant while running code locally, but are probably irrelevant in production.

Events are records of something that happened at a single point in time. This is closest to what we would classically call a "log."

event!(Level::INFO, "toast popped in");

Events interact with spans in that any events emitted while a span is active will be "nested" under that span.

let span = span!(Level::INFO, "toasting toast");

let _guard = span.enter();

event!(Level::INFO, "toast popped in");

// We know that the toast burned while toasting toast
// but does a toaster toast toast or does toast toast toast?
event!(Level::WARN, "toast burned");

// when _guard is dropped, the span is closed.

Spans and events are by default sent to the "global subscriber", which is meant to be set at the start of the program.

let global_subscriber = ConsoleLoggingSubscriber::new();
tracing::subscriber::set_global_default(global_subscriber)
    .expect("Failed to set");

This is stored inside a static mutable variable, which is prevented from being set twice or in a race by way of a global AtomicUsize.

static GLOBAL_INIT: AtomicUsize = AtomicUsize::new(UNINITIALIZED);

const UNINITIALIZED: usize = 0;
const INITIALIZING: usize = 1;
const INITIALIZED: usize = 2;

static mut GLOBAL_DISPATCH: Option<Dispatch> = None;

// ...

pub fn set_global_default(dispatcher: Dispatch) ->
        Result<(), SetGlobalDefaultError> {
    if GLOBAL_INIT.compare_and_swap(
        UNINITIALIZED, 
        INITIALIZING, 
        Ordering::SeqCst) == UNINITIALIZED
    {
        unsafe {
            GLOBAL_DISPATCH = Some(dispatcher);
        }
        GLOBAL_INIT.store(INITIALIZED, Ordering::SeqCst);
        EXISTS.store(true, Ordering::Release);
        Ok(())
    } else {
        Err(SetGlobalDefaultError { _no_construct: () })
    }
}

Subscribers are notified when an event happens, when a span is entered, when a span is exited, and a few other things.

pub trait Subscriber: 'static {
    fn enabled(&self, metadata: &Metadata<'_>) -> bool;
    fn new_span(&self, span: &Attributes<'_>) -> Id;
    fn record(&self, span: &Id, values: &Record<'_>);
    fn record_follows_from(&self, span: &Id, follows: &Id);
    fn event(&self, event: &Event<'_>);
    fn enter(&self, span: &Id);
    fn exit(&self, span: &Id);

    fn register_callsite(&self, metadata: &'static Metadata<'static>) 
        -> Interest { ... }
    fn max_level_hint(&self) -> Option<LevelFilter> { ... }
    fn event_enabled(&self, event: &Event<'_>) -> bool { ... }
    fn clone_span(&self, id: &Id) -> Id { ... }
    fn drop_span(&self, _id: Id) { ... }
    fn try_close(&self, id: Id) -> bool { ... }
    fn current_span(&self) -> Current { ... }
    unsafe fn downcast_raw(&self, id: TypeId) -> Option<*const ()> {
        ... 
    }
}

Okay, maybe more than a few other things.

These methods are related to filtering events. Given some metadata, a subscriber can say whether they always want to be informed of an event, never want to be informed, or will need to do some runtime check to know if an event is relevant. Because of the magic of Rust, these checks can sometimes totally remove the runtime cost of recording irrelevant events.

pub trait Subscriber: 'static {
    fn enabled(&self, metadata: &Metadata<'_>) -> bool;
    // ...
    fn register_callsite(&self, metadata: &'static Metadata<'static>) 
        -> Interest { ... }
    fn max_level_hint(&self) -> Option<LevelFilter> { ... }
    fn event_enabled(&self, event: &Event<'_>) -> bool { ... }
}

These methods allow for creating and manipulating spans. This is needed because Subscribers are responsible for tracking information about spans like their start and end times.

pub trait Subscriber: 'static {
    fn new_span(&self, span: &Attributes<'_>) -> Id;
    fn record(&self, span: &Id, values: &Record<'_>);
    fn record_follows_from(&self, span: &Id, follows: &Id);
    
    // ...
    
    fn clone_span(&self, id: &Id) -> Id { ... }
    fn drop_span(&self, _id: Id) { ... }
    fn try_close(&self, id: Id) -> bool { ... }
    fn current_span(&self) -> Current { ... }
}

Here be dragons.

pub trait Subscriber: 'static {
    // ...
    
    unsafe fn downcast_raw(&self, id: TypeId) -> Option<*const ()> {
        ... 
    }
}

And these are the more digestible ones.

pub trait Subscriber: 'static {
    // ...
    
    fn event(&self, event: &Event<'_>);
    fn enter(&self, span: &Id);
    fn exit(&self, span: &Id);
    
    // ...
}

The values supported by this system are defined implicitly by a "visitor" trait, where each method on the trait corresponds to a kind of data.

pub trait Visit {
    fn record_debug(&mut self, field: &Field, value: &dyn Debug);

    fn record_value(&mut self, field: &Field, value: Value<'_>) { ... }
    fn record_f64(&mut self, field: &Field, value: f64) { ... }
    fn record_i64(&mut self, field: &Field, value: i64) { ... }
    fn record_u64(&mut self, field: &Field, value: u64) { ... }
    fn record_i128(&mut self, field: &Field, value: i128) { ... }
    fn record_u128(&mut self, field: &Field, value: u128) { ... }
    fn record_bool(&mut self, field: &Field, value: bool) { ... }
    fn record_str(&mut self, field: &Field, value: &str) { ... }
    fn record_error(
        &mut self, 
        field: &Field, 
        value: &(dyn Error + 'static)
    ) { ... }
}

Values can therefore be a base numeric type (f64, i64, u64, i128, u128), a boolean, a string, a Rust Error, or anything that has a Debug representation. In addition, there is support for values defined via the valuable crate, which allows for more arbitrary introspection.

This can be seen as a more "typed" version of the system in µ/log. Events are still made up of "plain data" but the rolodex of what is allowed has a more explicit structure.

Similar to the current state of things in Java, there is a large portion of the ecosystem which primarily logs text-based messages, often through the log crate. To deal with this, they provide a crate called tracing_log which massages those text-based messages into tracing events.

For more comprehensive coverage, I recommend perusing this RustConf talk inclusive or this blog post, both by tracing's primary maintainer.

SLF4J

The most popular logging facade for Java is SLF4J. Logging libraries like Logback and Log4J provide a superset or a super-dee-duperset of its functionality while acting as the implementation for any code that calls SLF4J.

Nowadays, the mechanism for a library to "act as the implementation" is to provide an implementation of org.slf4j.spi.SLF4JServiceProvider via the service loader mechanism.

Most Java developers only need to interact with this at the level of knowing that they need to add both the slf4j-api and logback-classic as dependencies to their project. I still think it is worth noting because this ability to pick a logging implementation at runtime has been crucial to SLF4J's success.

In the most common usage pattern, a private static final field is set to the result of calling LoggerFactory.getLogger(Class<?>). This gets a logger where logged messages know about the class they are logged from.

public final class Main {
    private static final Logger log = 
            LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        log.info("Hello, world");
    }
}

Log messages can be plain English.

log.warn("Somebody stole my plain bagel!");

Or they can contain placeholders for data to be formatted into.

log.info("I just reached level {} in Mouse Quest", 987413);

With special support for logging exceptions.

log.error("Hulu gives me 5 ads in a row and I pay for it!", e);

Every log is associated with a level just like tracing. This functions as metadata as usual, but can also be used to avoid making potentially expensive log calls when the associated level is not enabled.

if (log.isDebugEnabled()) {
    log.debug("expensive data here {}", expensiveProcesss());
}

Unlike tracing, data associated with the log is intended to be directly stuffed into the message. Despite the ostensible goal being to produce English, pseudo-structured logs tend to be common.

log.info(
    "Staring background process: batchId={}, startTime={}",
    batchId,
    startTime
);

That being said, there is some support for attaching structured data via the "Mapped Diagnostic Context" (MDC) API.

try {
    MDC.put("request_id", request_id);
    
    log.info("This will have the request id available");
} finally {
    MDC.clear();
}

But that system is mechanically cumbersome to use and needs to be configured explicitly within the underlying logging implementation. People don't use it that often.

It also has no way to become compatible with the upcoming ExtentLocal API, so implementations are forever locked to using ThreadLocals.

// This try/finally structure cannot be adapted.
try {
    MDC.put("something", "abc");
} finally {
    MDC.clear();
}

// because ExtentLocals will only be able to be set in callbacks
static final ExtentLocal<Context> CONTEXT = new ExtentLocal<>();

ExtentLocal.where(CONTEXT, ...)
    .run(() -> { /* code here */ });

This is a problem, or at least potentially a problem, because ExtentLocals are designed to be considerably more efficient with a high number of threads. High numbers of threads will hopefully become the norm with Java 19+, so it's worth thinking about.

And as of the very latest version there is now also an API for doing structured logging directly with KeyValuePairs and the fluent logging API.

logger.atInfo()
    .addKeyValue("cat", "fred")
    .addKeyValue("snuggles", true)
    .log("I love this cat.");

I'd say the jury is out on whether this actually will lead to many folks doing any degree of structured logging in practice, but it is certainly a welcome addition.

System.Logger

Since Java 9, there has actually been a logger bundled which functions somewhat similarly to SLF4J. The major points of distinction are that there are no explicit info, warn, error, etc. methods for logging at a particular level, there is direct support for localization via a ResourceBundle, and there is no parallel to MDC.

public interface Logger {
    String getName();

    boolean isLoggable(Level level);

    default void log(Level level, String msg) {
        // ...
    }

    default void log(Level level, Supplier<String> msgSupplier) {
        // ...
    }

    default void log(Level level, Object obj) {
        // ...
    }

    default void log(Level level, String msg, Throwable thrown) {
        // ...
    }
    
    default void log(Level level, Supplier<String> msgSupplier,
                     Throwable thrown) {
        // ...
    }

    default void log(Level level, String format, Object... params) {
        // ...
    }

    void log(Level level, ResourceBundle bundle, String msg,
             Throwable thrown);
        
    void log(Level level, ResourceBundle bundle, String format,
             Object... params);
}
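
For flavor, here is a minimal sketch of typical usage - note that the format variant takes java.text.MessageFormat-style placeholders, not SLF4J's {}.

public class Main {
    private static final System.Logger LOG =
            System.getLogger(Main.class.getName());

    public static void main(String[] args) {
        // {0}-style placeholders, per MessageFormat
        LOG.log(System.Logger.Level.INFO,
                "Started {0} with {1} arguments", "Main", args.length);
    }
}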

OpenTelemetry

OpenTelemetry is definitely worth a mention, if only for the voice in the back of my head telling me that I am just remaking it, but poorly and less well-thought-out.

OpenTelemetry defines a set of language agnostic standards for how "observability data" should be recorded within an application and how it should be communicated and propagated to the rest of the system.

I am going to make the controversial, yet brave decision to not explain much of it here. The documentation is far more thorough than I can be and enough of the concepts overlap that explaining the distinctions between what I'm writing and what the OpenTelemetry automatic and manual instrumentation libraries provide would double the size of this post.

My hope is that what comes out of this whole thought experiment could potentially serve as a frontend to its manual instrumentation component, but I'm not going to make that a focus.

It is overall pretty cool though. If I had to give anything concretely negative about it, it would be that my first experience using the automatic instrumentation library was it causing the startup time for a few services to exceed our health check grace period. That sucked.

As of right now though, luckily for me, OpenTelemetry doesn't provide a stable way to emit structured logs.

Design

Going into this I want something which

  1. Hits the same general notes as `µ/log` and `tracing`.
  2. Is tolerable to existing Java programmers.
  3. Is suitable for inclusion in libraries just like SLF4J.

Logger

So the first task on the docket is to make an interface for our logger.

/**
 * A logger.
 */
public interface Logger {
    /**
     * Logs the log.
     *
     * @param log The log to log.
     */
    void log(Log log);
}

Nailed it.

Now, there is actually some deeper reasoning here. In µ/log's design all logs are juggled between ring buffers and passed between different consumers. Doing that is a lot easier when we make a Log a concrete thing that can be passed around.

µ/log also gets away with no equivalent of tracing's span_enter and span_exit. This leads me to believe (hand-wavingly) that if you aren't in the pursuit of true zero cost like Rust is, it is fine to only perform actions on span_exit and have the logic usually covered by span_enter (starting the timer, assigning an id, etc.) be handled in another way.

So a single log method it is. Default methods can and will be added to make a more pleasant API, but that's the start.

Log

Now as to what constitutes a Log, at this stage I haven't laid out the full picture, so it's hard to talk about. I do think, however, that delineating Events and Spans like tracing would be a good idea.

µ/log doesn't need to care about this because it is Clojure. In a dynamic language it is an expected sort of pattern to say "if this map has :mulog/root-trace and :mulog/duration it represents a span." You can see that exact example in its Zipkin publisher.

The best tool I know of for directly representing this is sealed interfaces.

public sealed interface Log {
    record Event() implements Log {}
    record Span() implements Log {}
}

You can read this as "a Log is either an Event or a Span." Any information common to the two cases will be added to the interface as we move along.
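
One payoff worth noting: because the interface is sealed, a hypothetical consumer can switch over it exhaustively with no default branch. A sketch, using pattern matching for switch (a preview feature as of this writing):

static String describe(Log log) {
    return switch (log) {
        case Log.Event event -> "happened at a point in time";
        case Log.Span span -> "happened over a span of time";
    };
}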

Level

tracing has log levels and µ/log does not.

The way I see it, there are two ways to think about levels.

  1. Log levels aren't intrinsically special. If you want a standard way to indicate "severity" you can always agree on a standard for your codebase, but it's not something to force on the user.
  2. Log levels are intrinsically special. It is a type of metadata you basically always want, and they act as a good first pass for filtering noisy events out of production.

I bounced back and forth for a bit, but Goal #2 tips the scales. Java programmers are used to levels, let them have levels.

enum Level {
    TRACE,
    DEBUG,
    INFO,
    WARN,
    ERROR
}

If situations arise in practice where no log level is appropriate, a dedicated "unspecified" level could make sense, but I'm putting that in my back pocket for now.

enum Level {
    UNSPECIFIED,
    TRACE,
    DEBUG,
    INFO,
    WARN,
    ERROR
}

If not for Goal #2 there is no intrinsic reason these need to be the log levels either, but c'est la vie.

enum Level {
    DEVELOPMENT_ONLY,
    FIX_DURING_WORK_HOURS,
    WAKE_ME_UP_IN_THE_MIDDLE_OF_THE_NIGHT
}

I am also choosing to nest this enum inside the Log interface.

public sealed interface Log {
    record Event() implements Log {}
    record Span() implements Log {}
    
    enum Level {
        TRACE,
        DEBUG,
        INFO,
        WARN,
        ERROR
    }
}

This is a stylistic choice more than anything, but I like being able to refer to Log.Level as such. Not only does the pattern give one place for discovery - just type Log. in your IDE to see the full set of log related types - it can save me from having to add Log as a prefix to class names.

Both Events and Spans should have log levels, so that is represented like so.

public sealed interface Log {
    Level level();
    
    record Event(
            @Override Level level
    ) implements Log {}
    
    record Span(
            @Override Level level
    ) implements Log {}
    
    enum Level {
        TRACE,
        DEBUG,
        INFO,
        WARN,
        ERROR
    }
}

Categories

I think µ/log had the right idea. Most logs should have both a "namespace" and a "name" component as their identifier. It is a pattern I think is pretty common in systems meant to take in structured data.

One example: Cloudwatch wants both a "source" and "detail-type" for their events.

{
  "version": "0",
  "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "111122223333",
  "time": "2017-12-22T18:43:48Z",
  "region": "us-west-1",
  "resources": [
    "arn:aws:ec2:us-west-1:123456789012:instance/ i-1234567890abcdef0"
  ],
  "detail": {
    "instance-id": " i-1234567890abcdef0",
    "state": "terminated"
  }
}

It also gives a convenient analogue to the "class doing the logging" and "log message" that programmers are already used to. I'll call the pair of these two pieces the log's "category."

public sealed interface Log {
    // ...
    
    record Category(String namespace, String name) {

    }
}

I can't imagine Events having categories but Spans not having them, so they both get them.

public sealed interface Log {
    Level level();
    
    Category category();
    
    record Event(
            @Override Level level,
            @Override Category category
    ) implements Log {}
    
    record Span(
            @Override Level level,
            @Override Category category
    ) implements Log {}
    
    // ...
    
    record Category(String namespace, String name) {

    }
}

Logger.Namespaced

At this point we can start to sketch out what logging would look like.

Logger log = ...;

log.log(
    new Log.Event(
        Log.Level.INFO, 
        new Log.Category("some.Thing", "thing-happened")
    )
);

Hideous.

So clearly some helper methods are in order.

To start, one helper method for logging an event. Spans can come later.

/**
 * A logger.
 */
public interface Logger {
    /**
     * Logs the log.
     *
     * @param log The log to log.
     */
    void log(Log log);
    
    default void event(
            Log.Level level,
            Log.Category category
    ) {
        log(new Log.Event(level, category));
    }
}

So now the verbose new Log.Event call is not needed.

Logger log = ...;

log.event(
    Log.Level.INFO, 
    new Log.Category("some.Thing", "thing-happened")
);

Next, it is common to have a dedicated method for each log level.

/**
 * A logger.
 */
public interface Logger {
    /**
     * Logs the log.
     *
     * @param log The log to log.
     */
    void log(Log log);
    
    default void event(
            Log.Level level,
            Log.Category category
    ) {
        log(new Log.Event(level, category));
    }

    default void trace(
            Log.Category category
    ) {
        event(Log.Level.TRACE, category);
    }

    default void debug(
            Log.Category category
    ) {
        event(Log.Level.DEBUG, category);
    }

    default void info(
            Log.Category category
    ) {
        event(Log.Level.INFO, category);
    }

    default void warn(
            Log.Category category
    ) {
        event(Log.Level.WARN, category);
    }

    default void error(
            Log.Category category
    ) {
        event(Log.Level.ERROR, category);
    }
}
Logger log = ...;

log.info(new Log.Category("some.Thing", "thing-happened"));

This is better still, but the new Log.Category still feels like a lot to ask of folks on every single call.

For most applications, the namespace component of logs emitted from any particular class is likely to stay constant. It is probably going to be the class name.

As such, I think that there should be two types. Logger and Logger.Namespaced. Logger will let folks log while specifying the full Log.Category and Logger.Namespaced will have the namespace part of the category pre-filled.

/**
 * A logger.
 */
public interface Logger {
    /**
     * Logs the log.
     *
     * @param log The log to log.
     */
    void log(Log log);
    
    default void event(
            Log.Level level,
            Log.Category category
    ) {
        log(new Log.Event(level, category));
    }

    // ...
    
    default Namespaced namespaced(String namespace) {
        return new NamespacedLogger(namespace, this);
    }
    
    interface Namespaced {
        void event(Log.Level level, String name);

        default void trace(
                String name
        ) {
            event(Log.Level.TRACE, name);
        }

        default void debug(
                String name
        ) {
            event(Log.Level.DEBUG, name);
        }

        default void info(
                String name
        ) {
            event(Log.Level.INFO, name);
        }

        default void warn(
                String name
        ) {
            event(Log.Level.WARN, name);
        }

        default void error(
                String name
        ) {
            event(Log.Level.ERROR, name);
        }
    }
}
// The implementation is trivial
record NamespacedLogger(String namespace, Logger logger) 
        implements Logger.Namespaced {
    public void event(Log.Level level, String name) {
        logger.event(
                level, 
                new Log.Category(this.namespace, name)
        );
    }
}
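
To sketch the intended ergonomics: Logger has a single abstract method, so a lambda is enough for a toy implementation here.

// A toy logger that just prints the Log record
Logger logger = logged -> System.out.println(logged);
Logger.Namespaced log = logger.namespaced("some.Thing");

// Equivalent to logger.info(new Log.Category("some.Thing", "thing-happened"))
log.info("thing-happened");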

Occurrence

Before adding support for spans to the logger, we need to take a minor detour. The key difference between an Event and a Span is that an Event happens at a singular point in time while a Span takes place over a span of time.

This could be represented directly, by having Events track when they happened and having Spans track their start time and how long they lasted.

public sealed interface Log {
    Level level();
    
    Category category();
    
    record Event(
            @Override Level level,
            @Override Category category,
            Instant happenedAt
    ) implements Log {}
    
    record Span(
            @Override Level level,
            @Override Category category,
            Instant startedAt,
            Duration lasted
    ) implements Log {}
    
    // ...
}

Personally though, I find it more interesting to unify the concepts a little. Every log occurs at some time that is either a singular point of time or a span of time.

public sealed interface Log {
    // ...
    
    sealed interface Occurrence {
        record PointInTime(
                java.time.Instant happenedAt
        ) implements Occurrence {
        }

        record SpanOfTime(
                java.time.Instant startedAt, 
                java.time.Duration lasted
        ) implements Occurrence {
        }
    }
}

This lets us put Occurrence on the Log interface directly, while still having Event and Span only have the data they want.

public sealed interface Log {
    Occurrence occurrence();
    
    Level level();
    
    Category category();
    
    record Event(
            @Override Occurrence.PointInTime occurrence,
            @Override Level level,
            @Override Category category
    ) implements Log {}
    
    record Span(
            @Override Occurrence.SpanOfTime occurrence,
            @Override Level level,
            @Override Category category
    ) implements Log {}
    
    // ...

    sealed interface Occurrence {
        record PointInTime(
                java.time.Instant happenedAt
        ) implements Occurrence {
        }

        record SpanOfTime(
                java.time.Instant startedAt,
                java.time.Duration lasted
        ) implements Occurrence {
        }
    }
}

For Events, we can make it a bit easier by adding a constructor which sets the occurrence to the current point in time.

public sealed interface Log {
    Occurrence occurrence();
    
    Level level();
    
    Category category();
    
    record Event(
            @Override Occurrence.PointInTime occurrence,
            @Override Level level,
            @Override Category category
    ) implements Log {
        public Event(
                Level level,
                Category category
        ) {
            this(
                    new Occurrence.PointInTime(Instant.now()),
                    level,
                    category
            );
        }
    }
    
    record Span(
            @Override Occurrence.SpanOfTime occurrence,
            @Override Level level,
            @Override Category category
    ) implements Log {}
    
    // ...

    sealed interface Occurrence {
        record PointInTime(
                java.time.Instant happenedAt
        ) implements Occurrence {
        }

        record SpanOfTime(
                java.time.Instant startedAt,
                java.time.Duration lasted
        ) implements Occurrence {
        }
    }
}

For Spans that doesn't make quite as much sense, so auto-timing logic will probably end up being in the logger.

There are two general strategies that would work for adding Spans to the Logger and Logger.Namespaced interfaces.

The first is for a method to return something closeable. When that something is closed, the span is finished.

public interface Logger {
    void log(Log log);

    // ...

    interface SpanHandle extends AutoCloseable {
        void close();
    }

    default SpanHandle span(
            Log.Level level,
            Log.Category category
    ) {
        var start = Instant.now();
        return () -> {
            var end = Instant.now();
            log(
                    new Log.Span(
                            new Log.Occurrence.SpanOfTime(
                                    start,
                                    Duration.between(start, end)
                            ),
                            level,
                            category
                    )
            );
        };
    }

    // ...
}
try (var handle = log.span(
        Log.Level.INFO, 
        new Log.Category("something", "happened"))) {
    // Span open here
    System.out.println("I am in the span!");
}

// Closed when you exit

This is convenient to use with the try-with-resources syntax and has the nice benefit of playing well with checked exceptions.

By that I mean, if the code within the span wants to throw a checked exception upwards a level or to be handled in a certain way, it is trivial to propagate that exception up.

void func() throws IOException, SQLException {
    try (var handle = log.span(
            Log.Level.INFO, 
            new Log.Category("something", "happened"))) {
        codeThatThrowsIOExceptionOrSqlException();
    } 
    // Closed when you exit
}

The other way is to directly take some code to run as a callback.

public interface Logger {
    void log(Log log);

    // ...
    
    default <T> T span(
            Log.Level level,
            Log.Category category,
            Supplier<T> code
    ) {
        var start = Instant.now();
        try {
            return code.get();
        } finally {
            var end = Instant.now();
            log(
                    new Log.Span(
                            new Log.Occurrence.SpanOfTime(
                                    start,
                                    Duration.between(start, end)
                            ),
                            level,
                            category
                    )
            );
        }
    }
    
    default void span(
            Log.Level level,
            Log.Category category,
            Runnable code
    ) {
        span(level, category, () -> { 
            code.run();
            return null;
        });
    }

    // ...
}

This has some major downsides - for one, we now can't do things like return early from a function within a span.

void func() {
    int x = 8;

    log.span(
        Log.Level.INFO,
        new Log.Category("space", "hit-debris"),
        () -> {
            if (x == 8) {
                // Can only return directly from this lambda,
                // not the whole method.
                return;
            }
            System.out.println("Inside the span!");
        }
    );
    System.out.println("Will get here");
}

And we also cannot automatically handle checked exceptions in the general case.

Like, hypothetically the callback interface could look like this.

interface SpanCallback<E extends Throwable> {
    void run() throws E;
}

Which would allow us to write code which throws one kind of checked exception.

void func() throws IOException {
    // Correctly throws IOException
    log.span(
        Log.Level.INFO,
        new Log.Category("space", "hit-debris"),
        () -> {
            throw new IOException();
        }
    );
}

But if code needed to throw multiple disjoint kinds of exceptions, that would be a problem since the throws E needs to resolve to a single type.

So if the code wrapped in the span throws both IOException and SQLException, the only common type would be Exception.

// Would want this to be throws IOException, SQLException...
void func() throws Exception {
    log.span(
        Log.Level.INFO,
        new Log.Category("space", "hit-debris"),
        () -> {
            if (Math.random() > 0.5) {
                throw new IOException();
            }
            else {
                throw new SQLException();
            }
        }
    );
}

There are some solutions depending on sensibilities, but none are perfect.
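
To illustrate one - and this is a hypothetical, not part of the design - you could add a callback interface and overload per arity of checked exception. Workable, but the overload set grows fast and inference gets fiddly.

interface SpanCallback2<E1 extends Throwable, E2 extends Throwable> {
    void run() throws E1, E2;
}

// Inside Logger:
default <E1 extends Throwable, E2 extends Throwable> void span(
        Log.Level level,
        Log.Category category,
        SpanCallback2<E1, E2> code
) throws E1, E2 {
    try {
        code.run();
    } finally {
        // same timing-and-logging logic as the Supplier overload
    }
}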

Even given these issues, I think the callback system is the better of the two. ExtentLocals basically mandate it and, as a tiny spoiler, Spans are going to need to propagate context to nested logs. The callback version also isn't possible to mess up: try-with-resources is good, and IDEs will warn if you don't close a returned auto-closeable, but there are still some cursed situations that can arise if you do not.
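
For example, here is a sketch of one such cursed situation with the handle-based API (bailEarly is hypothetical):

void func() {
    var handle = log.span(
            Log.Level.INFO,
            new Log.Category("some.Thing", "doing-work"));

    if (bailEarly()) {
        return; // the span is never closed, so it is never logged
    }

    handle.close();
}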

So with that settled both Logger and Logger.Namespaced are due their full complement of span methods for each log level.

public interface Logger {
    void log(Log log);

    // ...

    default <T> T span(
            Log.Level level,
            Log.Category category,
            Supplier<T> code
    ) {
        // ...
    }
    
    default void span(
            Log.Level level,
            Log.Category category,
            Runnable code
    ) {
        // ...
    }
    
    // ...
    
    default <T> T infoSpan(
            Log.Category category,
            Supplier<T> code
    ) {
        return span(Log.Level.INFO, category, code);
    }
    
    default void infoSpan(
            Log.Category category,
            Runnable code
    ) {
        span(Log.Level.INFO, category, code);
    }

    default <T> T warnSpan(
            Log.Category category,
            Supplier<T> code
    ) {
        return span(Log.Level.WARN, category, code);
    }
    
    default void warnSpan(
            Log.Category category,
            Runnable code
    ) {
        span(Log.Level.WARN, category, code);
    }
    
    // ...
    
    interface Namespaced {
        // ...
        
        <T> T span(
            Log.Level level,
            String name,
            Supplier<T> code
        );
        
        default void span(
            Log.Level level,
            String name,
            Runnable code
        ) {
            span(level, name, () -> {
                code.run();
                return null;
            });
        }
        
        // ...
        
        default <T> T infoSpan(
            String name,
            Supplier<T> code
        ) {
            return span(Log.Level.INFO, name, code);
        }
        
        default void infoSpan(
            String name,
            Runnable code
        ) {
            span(Log.Level.INFO, name, code);
        }
        
        default <T> T warnSpan(
            String name,
            Supplier<T> code
        ) {
            return span(Log.Level.WARN, name, code);
        }
        
        default void warnSpan(
            String name,
            Runnable code
        ) {
            span(Log.Level.WARN, name, code);
        }
        
        // ...
        
    }
}

Entry

To be useful, logs often need to carry some dynamic information. What user is making the request? What is the database record that is about to be altered?

These are the "log entries." The representation for this seems to be pretty universally key-value pairs, so that is what I am going with.

sealed interface Log {
    // ...
    
    record Entry(String key, Value value) {}
}

It follows that logs will get some list of log entries.

public sealed interface Log {
    Occurrence occurrence();
    
    Level level();
    
    Category category();
    
    List<Entry> entries();
    
    record Event(
            @Override Occurrence.PointInTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {
        public Event(
                Level level,
                Category category
        ) {
            this(
                    new Occurrence.PointInTime(Instant.now()), 
                    level,
                    category,
                    List.of()
            );
        }
    }
    
    record Span(
            @Override Occurrence.SpanOfTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {}
    
    // ...

    record Entry(String key, Value value) {}
}

µ/log merges its log entries immediately into a map, but I don't think this is necessary.

For one, we are going to be adding semantics to Log.Entry that don't exist in Map.Entry, like requiring that the key and value be non-null.

public sealed interface Log {
    // ...

    record Entry(String key, Value value) {
        public Entry {
            Objects.requireNonNull(
                    key,
                    "key must not be null"
            );
            Objects.requireNonNull(
                    value,
                    "value must not be null"
            );
        }
    }
}

Also, the most common operation that is going to be performed is iterating over the full list of entries to build some JSON or similar. All the pieces of the logs that have semantic significance - like the level or when they occurred - have dedicated methods and places in the objects for them.

µ/log is also slightly special in that the keys in its maps generally wouldn't be Strings. It is idiomatic in Clojure to use keywords, and keywords have the distinction of having a precomputed hash code. If the internal representation were a Map<String, Value> then there would be a constant and probably pointless cost for doing that hashing.
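
To sketch what I mean by that consuming side, here is a hypothetical publisher walking the entry list once to build something JSON-ish (escaping and proper value rendering elided):

static String render(Log.Event event) {
    var sb = new StringBuilder("{\"level\":\"")
            .append(event.level())
            .append("\"");
    for (var entry : event.entries()) {
        sb.append(",\"")
          .append(entry.key())
          .append("\":")
          .append(entry.value());
    }
    return sb.append("}").toString();
}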

Both Logger and Logger.Namespaced need to be updated as well.

interface Logger {
    // ...
    
    default void info(
            Log.Category category,
            List<Log.Entry> entries
    ) {
        // ...
    }

    default <T> T infoSpan(
            Log.Category category,
            List<Log.Entry> entries,
            Supplier<T> code
    ) {
        // ...
    }
    
    // ...
    
    interface Namespaced {
        // ...
        
        default void info(
                String name,
                List<Log.Entry> entries
        ) {
            // ...
        }

        default <T> T infoSpan(
                String name,
                List<Log.Entry> entries,
                Supplier<T> code
        ) {
            // ...
        }
        
        // ...
    }
}

Using List.of() to manually make lists is a bit tedious, and this parameter does often come at the end of the argument list, so a varargs overload feels like a good idea.

interface Logger {
    
    // ...
    
    interface Namespaced {
        // ...
        
        default void info(
                String name,
                List<Log.Entry> entries
        ) {
            // ...
        }

        default void info(
                String name,
                Log.Entry... entries
        ) {
            info(name, List.of(entries));
        }
        
        // ...
    }
}
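
A hypothetical call-site, using the Value.of factory methods sketched later in this post:

log.info(
        "thing-happened",
        new Log.Entry("user", Log.Entry.Value.of("kaylee")),
        new Log.Entry("loud", Log.Entry.Value.of(true))
);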

Syntactically we can stop there, but if you take a gander at the docs for List.of() you will see everything up to a ten argument overload. This is pretty plainly to avoid allocating the array for varargs, so it might make sense to do that for this API too.

Five log levels plus one base event method in Logger and also in Logger.Namespaced. If I gave each all ten argument overloads to match the overloads of List.of() that would be an extra 120 methods plus carpal tunnel. For the hypothetical performance of something no one is using yet.

I'm not going to do that.

Which just leaves open the question, what exactly is a Value?

Value

Taking a page from tracing, I am going to restrict what is allowed as a Value.

So long as strings, numbers, booleans, lists, and maps are represented, anything is probably acceptable, but I am going to be a bit liberal with what is allowed.

A good place to start might be to put my foot down and say that booleans are allowed.

sealed interface Value {
    // ...
    
    record Boolean(boolean value) implements Value {
    }

    // ...
}

Any of the primitive numeric types are okay.

sealed interface Value {
    // ...
    
    record Byte(byte value) implements Value {
    }
    
    record Character(char value) implements Value {
    }

    record Short(short value) implements Value {
    }
    
    record Integer(int value) implements Value {
    }
    
    record Long(long value) implements Value {
    }
    
    record Float(float value) implements Value {
    }
    
    record Double(double value) implements Value {
    }

    // ...
}

And Strings seem pretty cool too.

sealed interface Value {
    // ...

    record String(java.lang.String value) implements Value {
        // ...
        
        public String {
            Objects.requireNonNull(value, "value must not be null");
        }
    }

    // ...
}

UUIDs are relatively common to come across and are immutable and pretty trivially translatable to a string.

sealed interface Value {
    // ...
    
    record UUID(java.util.UUID value) implements Value {
        public UUID {
            Objects.requireNonNull(value, "value must not be null");
        }
    }

    // ...
}

URIs (not URLs!) are equally simple and come up quite a bit when making web services.

sealed interface Value {
    // ...

    record URI(java.net.URI value) implements Value {
        public URI {
            Objects.requireNonNull(value, "value must not be null");
        }
    }

    // ...
}

The types in java.time are pretty crucial.

sealed interface Value {
    // ...
    
    record Instant(java.time.Instant value) implements Value {
        public Instant {
            Objects.requireNonNull(value, "value must not be null");
        }
    }
    
    record LocalDateTime(java.time.LocalDateTime value) implements Value {
        public LocalDateTime {
            Objects.requireNonNull(value, "value must not be null");
        }
    }
    
    record LocalDate(java.time.LocalDate value) implements Value {
        public LocalDate {
            Objects.requireNonNull(value, "value must not be null");
        }
    }
    
    record LocalTime(java.time.LocalTime value) implements Value {
        public LocalTime {
            Objects.requireNonNull(value, "value must not be null");
        }
    }
    
    record Duration(java.time.Duration value) implements Value {
        public Duration {
            Objects.requireNonNull(value, "value must not be null");
        }
    }

    // ...
}

And Exceptions must be in the top 10 things to want to log.

sealed interface Value {
    // ...
    
    record Throwable(java.lang.Throwable value) implements Value {
        public Throwable {
            Objects.requireNonNull(value, "value must not be null");
        }
    }

    // ...
}

Logging Lists is one of the things that we need to support to keep parity with µ/log and tracing.

sealed interface Value {
    // ...

    record List(java.util.List<Value> value) implements Value {

    }
    // ...
}

At which point, we need to talk about null.

In all the cases so far, I've added an Objects.requireNonNull to the canonical constructor. To me this tracks, because it wouldn't make much sense to have both a "String" and an "Instant" with null values allowed.

// Should this be true or false? 
// If it should be true that will require some wacky equals method.
Objects.equals(
    new Log.Entry.Value.String(null),
    new Log.Entry.Value.Instant(null)
)

The problem is, people end up with null values pretty much constantly. Logging that you are about to perform some operation on an entity and that entity is unexpectedly null is unbelievably common.

// Sorry, we crashed because of a log!
new Log.Entry.Value.String(s);

To remedy this, I could add constructor functions to Log.Entry.Value which automatically handle null values.

sealed interface Value {
    // ...
    
    static Value.String of(java.lang.String value) {
        if (value == null) {
            return null;
        }
        else {
            return new String(value);
        }
    }

    record String(java.lang.String value) implements Value {
        // ...
        
        public String {
            Objects.requireNonNull(value, "value must not be null");
        }
    }

    // ...
}

For the primitive types it will be a little pointless, but it can help avoid errors with their wrappers.

sealed interface Value {
    // ...
    
    static Value.Boolean of(boolean value) {
        return new Boolean(value);
    }

    static Value.Boolean of(java.lang.Boolean value) {
        if (value == null) {
            return null;
        }
        else {
            return new Boolean(value);
        }
    }

    record Boolean(boolean value) implements Value {
    }

    // ...
}

But with lists, there is another foot gun we have just loaded. Lists made with List.of do not support null elements, so it is more than likely folks will end up with seemingly okay code and an inexplicable crash.

// If null, will still crash!
Log.Entry.Value.of(List.of(
    Log.Entry.Value.of(someString)
))

So to sidestep this, we need to make our own null. It feels contrived, I know.

sealed interface Value {
    enum Null implements Value {
        INSTANCE;

        @Override
        public java.lang.String toString() {
            return "Null";
        }
    }
}

Then for all the constructor functions, this becomes the fallback.

sealed interface Value {
    // ...
    
    static Value of(java.lang.String value) {
        if (value == null) {
            return Null.INSTANCE;
        }
        else {
            return new String(value);
        }
    }

    enum Null implements Value {
        INSTANCE;

        @Override
        public java.lang.String toString() {
            return "Null";
        }
    }

    record String(java.lang.String value) implements Value {
        // ...
        
        public String {
            Objects.requireNonNull(value, "value must not be null");
        }
    }

    // ...
}

Which solves our current issue fairly neatly.

String s1 = "abc";
String s2 = null;

// Our "Null" isn't "null", so all is well.
Log.Entry.Value.of(List.of(
    Log.Entry.Value.of(s1),
    Log.Entry.Value.of(s2)  
))

The remaining kinds of data to consider are maps and sets. Both would have had the same issues with null and their convenient constructor functions in Map.of() and Set.of() so it is good to have that resolved.

sealed interface Value {
    // ...
    
    record Map(java.util.Map<java.lang.String, Value> value) 
            implements Value {
        public Map(java.util.Map<java.lang.String, Value> value) {
            Objects.requireNonNull(value, "value must not be null");
            this.value = value.entrySet()
                    .stream()
                    .collect(Collectors.toUnmodifiableMap(
                            java.util.Map.Entry::getKey,
                            entry -> entry.getValue() == null
                                    ? Null.INSTANCE 
                                    : entry.getValue()
                    ));
        }
    }

    record Set(java.util.Set<Value> value) implements Value {
        public Set(java.util.Set<Value> value) {
            Objects.requireNonNull(value, "value must not be null");
            this.value = value.stream()
                    .map(v -> v == null ? Null.INSTANCE : v)
                    .collect(Collectors.toUnmodifiableSet());
        }
    }
}

For Maps there is no intrinsic reason it has to be this way, but I chose to restrict the keys to be Strings. This is both more convenient for eventual serialization to JSON and avoids issues like having two keys which would serialize to the same form if string-ified.

// This would be annoying to handle in JSON
Log.Entry.Value.of(Map.of(
    Log.Entry.Value.of(123), Log.Entry.Value.of("abc"),
    Log.Entry.Value.of("123"), Log.Entry.Value.of("def")
))

Now for the last value kind, I promise. Occasionally producing the value for a log might be either expensive, intrinsically fallible, or both - like fetching a value from a remote server. For this, we want to provide a way to lazily compute a value.

I took the implementation of this from a combo of vavr's Lazy and Clojure's delay.

sealed interface Value {
    // ...
    
    final class Lazy implements Value {
        // Implementation based off of clojure's Delay + vavr's Lazy
        private volatile Supplier<? extends Value> supplier;
        private Value value;
        
        public Lazy(Supplier<? extends Value> supplier) {
            Objects.requireNonNull(
                    supplier, 
                    "supplier must not be null"
            );
            this.supplier = supplier;
            this.value = null;
        }
        
        public Value value() {
            if (supplier != null) {
                synchronized (this) {
                    final var s = supplier;
                    if (s != null) {
                        try {
                            this.value = Objects.requireNonNullElse(
                                    s.get(), 
                                    Null.INSTANCE
                            );
                        } catch (java.lang.Throwable throwable) {
                            this.value = new Throwable(throwable);
                        }
                        this.supplier = null;
                    }
                }
            }

            return this.value;
        }

        @Override
        public java.lang.String toString() {
            if (supplier != null) {
                return "Lazy[pending]";
            } else {
                return "Lazy[realized: value=" + value() + "]";
            }
        }
    }
}

And this gets its own ofLazy factory functions of course.

sealed interface Value {
    // ...

    static Value ofLazy(Supplier<Value> valueSupplier) {
        return new Value.Lazy(valueSupplier);
    }

    static <T> Value ofLazy(T value, Function<T, Value> toValue) {
        return new Value.Lazy(() -> {
            var v = toValue.apply(value);
            return v == null ? Value.Null.INSTANCE : v;
        });
    }
    
    // ...
}
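
Usage might look something like this - fetchOwner here is a hypothetical, potentially expensive lookup.

// Hypothetical: defer an expensive lookup until a publisher
// actually asks for the value.
var owner = Log.Entry.Value.ofLazy(
        () -> Log.Entry.Value.of(fetchOwner("gunny"))
);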

Now, making a log looks like this.

log.info("dog-barked", new Log.Entry(
        "name",
        Log.Entry.Value.of("gunny")
));

Which is still a bit too verbose, so for all of those Value constructor functions I will add a matching one in Log.Entry.

log.info("dog-barked", Log.Entry.of("name", "gunny"));

Which is finally terse enough that I could believe your average Jane or Joe writing it.
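
As a sketch, assuming a matching Value.of overload for each kind of value, those Entry factories can just delegate. The shape of Entry itself - a name paired with a Value - is implied by the example above.

record Entry(java.lang.String name, Value value) {
    public Entry {
        Objects.requireNonNull(name, "name must not be null");
        Objects.requireNonNull(value, "value must not be null");
    }

    // Delegating to the Value factories means callers get the
    // null handling from earlier for free.
    public static Entry of(java.lang.String name, java.lang.String value) {
        return new Entry(name, Value.of(value));
    }

    public static Entry of(java.lang.String name, long value) {
        return new Entry(name, Value.of(value));
    }

    // ... one per Value factory
}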

Flake

Now for a bit of a roundup: I am giving logs the Flake from µ/log. I actually copied the class exactly as written.

public sealed interface Log {
    Flake flake();
    
    Occurrence occurrence();
    
    Level level();
    
    Category category();
    
    List<Entry> entries();
    
    record Event(
            @Override Flake flake,
            @Override Occurrence.PointInTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {
        public Event(
                Level level,
                Category category
        ) {
            this(
                    Flake.create(), 
                    new Occurrence.PointInTime(Instant.now()),
                    level, 
                    category,
                    List.of() // no entries by default
            );
        }
    }
    
    record Span(
            @Override Flake flake,
            @Override Occurrence.SpanOfTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {}
    
    // ...
}

Thread

Apparently, getting the current thread is basically free, and since the ultimate goal is to allow sending logs to different threads than the ones they originated on, gathering that as metadata feels appropriate.

public sealed interface Log {
    Thread thread();
    
    Flake flake();
    
    Occurrence occurrence();
    
    Level level();
    
    Category category();
    
    List<Entry> entries();
    
    record Event(
            @Override Thread thread,
            @Override Flake flake,
            @Override Occurrence.PointInTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {
        public Event(
                Level level,
                Category category
        ) {
            this(
                    Thread.currentThread(), 
                    Flake.create(), 
                    new Occurrence.PointInTime(Instant.now()), 
                    level, 
                    category,
                    List.of() // no entries by default
            );
        }
    }
    
    record Span(
            @Override Thread thread,
            @Override Flake flake,
            @Override Occurrence.SpanOfTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {}
    
    // ...
}

Outcome

When a span finishes, µ/log records whether that span threw an exception and, if so, what the exception was. This is very doable, we just need to make sure to catch and re-throw anything thrown while performing the work in a span.

sealed interface Log {
    record Span(
            @Override Thread thread,
            @Override Flake flake,
            Outcome outcome,
            @Override Occurrence.SpanOfTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries) implements Log {
        public sealed interface Outcome {
            enum Ok implements Outcome {
                INSTANCE;

                @Override
                public String toString() {
                    return "Ok";
                }
            }

            record Error(Throwable throwable) implements Outcome {
            }
        }
    }
}

If it weren't for this, Spans and Events could probably be a single class. Unfortunately there isn't a neat value to put for an Outcome in an Event.

Context

The last thing to worry about is context. We need some mechanism for context to propagate from span to span and across method call boundaries and we need to define what exactly is allowed to be present in context.

The easiest to consider is global context. All we need is some Log.Entrys which will be included in every log.

sealed interface Log {
    sealed interface Context {
        record Global(List<Entry> entries) implements Context {
        }
    }
}

I don't see any reason to be as strict as tracing is when it comes to setting global context more than once, so using µ/log's storage strategy of an AtomicReference is likely good enough.

// Separate, internal, class to avoid exposing
class Globals {
    static final AtomicReference<Log.Context.Global> GLOBAL_CONTEXT =
            new AtomicReference<>(new Log.Context.Global(List.of()));
}

And at any point it will be "safe" - if maybe leading to strange semantics - to set this context.

sealed interface Log {
    // ...

    static void setGlobalContext(List<Entry> entries) {
Globals.GLOBAL_CONTEXT.set(new Context.Global(entries));
    }
    
    // ...
}

There are then two other kinds of "child" context. The first is "plain" context which is intended to just carry log entries down. The second is "span" context which doesn't carry entries but instead metadata about the current span.

sealed interface Log {
    sealed interface Context {
        record Global(List<Entry> entries) implements Context {
        }
        
        sealed interface Child extends Context {
            Context parent();
            
            record Plain(
                    List<Entry> entries,
                    @Override Context parent
            ) implements Child {
            }

            record Span(
                    Thread thread,
                    Instant startedAt,
                    Flake spanId,
                    @Override Context parent
            ) implements Child {
            }
        }
    }
}

Both Plain and Span child contexts have a field linking to their parent context. This makes this a strange and wonderful kind of linked list.

To access the immediate parent or the root span, you can just crawl the linked list. This means that unlike µ/log we don't need to explicitly pass down a :mulog/root-trace or :mulog/parent-trace. We just have to accept linked lists with all their flaws.

sealed interface Log {
    sealed interface Context {
        Optional<Child.Span> parentSpan();

        default Optional<Child.Span> rootSpan() {
            return this.parentSpan()
                    .map(parent -> parent.rootSpan().orElse(parent));
        }
        
        record Global(List<Entry> entries) implements Context {
            @Override
            public Optional<Child.Span> parentSpan() {
                return Optional.empty();
            }
        }

        sealed interface Child extends Context {
            Context parent();

            @Override
            default Optional<Span> parentSpan() {
                var parent = parent();
                if (parent instanceof Span parentSpan) {
                    return Optional.of(parentSpan);
                }
                else {
                    return parent.parentSpan();
                }
            }
            
            record Plain(
                    List<Entry> entries,
                    @Override Context parent
            ) implements Child {
            }

            record Span(
                    Thread thread,
                    Instant startedAt,
                    Flake spanId,
                    @Override Context parent
            ) implements Child {
            }
        }
    }
}

Now, to propagate this non-global context, a ThreadLocal will be used. Yes, I've mentioned many times how I want this design to be forward compatible with ExtentLocals, but getting those early access builds is a bit hard, and I don't want to force people who want to toy with this API today to figure that out.

class Globals {
    static final AtomicReference<Log.Context.Global> GLOBAL_CONTEXT =
            new AtomicReference<>(new Log.Context.Global(List.of()));

    /*
     * This should be an extent local when it is possible to be so.
     */
    static final ThreadLocal<Log.Context.Child> LOCAL_CONTEXT =
            new ThreadLocal<>();
}

Since the local context will contain a pointer to the global context that was established when it was formed, to get the current context we just need to check if there is a currently bound local context and if so use it. If not, we take from the global context.

sealed interface Log {
    sealed interface Context {
        static Context current() {
            var localContext = Globals.LOCAL_CONTEXT.get();
            return localContext == null 
                    ? Globals.GLOBAL_CONTEXT.get() 
                    : localContext;
        }
        
        // ...
    }
}

And every log will have some attached context.

public sealed interface Log {
    Context context();
    
    Thread thread();
    
    Flake flake();
    
    Occurrence occurrence();
    
    Level level();
    
    Category category();
    
    List<Entry> entries();
    
    record Event(
            @Override Context context,
            @Override Thread thread,
            @Override Flake flake,
            @Override Occurrence.PointInTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {
        public Event(
                Level level,
                Category category
        ) {
            this(
                    Context.current(),
                    Thread.currentThread(), 
                    Flake.create(), 
                    new Occurrence.PointInTime(Instant.now()), 
                    level, 
                    category,
                    List.of() // no entries by default
            );
        }
    }
    
    record Span(
            @Override Context context,
            @Override Thread thread,
            @Override Flake flake,
            Outcome outcome,
            @Override Occurrence.SpanOfTime occurrence,
            @Override Level level,
            @Override Category category,
            @Override List<Entry> entries
    ) implements Log {
        public Span(
                Outcome outcome,
                Occurrence.SpanOfTime occurrence,
                Level level,
                Category category,
                List<Entry> entries
        ) {
            this(
                    Context.current(),
                    Thread.currentThread(),
                    Flake.create(),
                    outcome,
                    occurrence,
                    level,
                    category,
                    entries
            );
        }
    }
    
    // ...
}

So to add some log entries to every log made in a scope, we just need to set and unset the thread local.

sealed interface Log {
    // ...
    
    static <T> T withContext(List<Entry> entries, Supplier<T> code) {
        var localContext = Globals.LOCAL_CONTEXT.get();
        try {
            Globals.LOCAL_CONTEXT.set(new Context.Child.Plain(
                    entries,
                    localContext == null 
                            ? Globals.GLOBAL_CONTEXT.get() 
                            : localContext
            ));
            return code.get();
        } finally {
            Globals.LOCAL_CONTEXT.set(localContext);
        }
        
    }

    static void withContext(List<Entry> entries, Runnable code) {
        withContext(
                entries,
                () -> {
                    code.run();
                    return null;
                });
    }
    
    // ...
}

Log.withContext(
   List.of(Log.Entry.of("request-id", "abc")),
   () -> {
      log.info("has-request-id!");
   }
);

And the strategy is very similar for propagating spans, with the difference that log entries are not propagated, just metadata.

interface Logger {
    // ...
    
    default <T> T span(
            Log.Level level,
            Log.Category category,
            List<Log.Entry> entries,
            Supplier<T> code
    ) {
        Log.Span.Outcome outcome = Log.Span.Outcome.Ok.INSTANCE;
        var start = Instant.now();
        var localContext = Globals.LOCAL_CONTEXT.get();
        try {
            Globals.LOCAL_CONTEXT.set(new Log.Context.Child.Span(
                    Thread.currentThread(),
                    start,
                    Flake.create(),
                    localContext == null 
                            ? Globals.GLOBAL_CONTEXT.get() 
                            : localContext
            ));
            return code.get();
        } catch (Throwable t) {
            outcome = new Log.Span.Outcome.Error(t);
            throw t;
        } finally {
            Globals.LOCAL_CONTEXT.set(localContext);
            var end = Instant.now();
            var duration = Duration.between(start, end);
            var occurrence = new Log.Occurrence.SpanOfTime(
                    start, 
                    duration
            );
            log(new Log.Span(
                    outcome,
                    occurrence,
                    level,
                    category,
                    entries
            ));
        }
    }
    
    // ...
}

Now putting it all together.

Log.setGlobalContext(List.of(
    Log.Entry.of(
        "os.name",
        System.getProperty("os.name")
    )
));

// ...

log.infoSpan(
    "handling-request",
    () -> {
        Log.withContext(
            List.of(Log.Entry.of("request-id", "abc")),
            () -> {
               log.warn(
                       "oh-no!", 
                       Log.Entry.of("failed-for-id", 123)
               );
            }
        );
    }
);

Looking past the lambdas, which are only taking up so much visual budget because of how trivial the example is, I am pretty happy with this API. The innermost log will have both the request-id and os.name available within its context, as well as the failed-for-id directly in its entries component.

Crawling the entire linked list for all the log entries available to a log is slightly non-trivial, so I think it would be beneficial for Log itself to implement Iterable<Log.Entry> and do that crawling upon request.

A lot of the code so far has been ugly on the outside. I think this code is ugly on the inside too.

sealed interface Log extends Iterable<Log.Entry> {
    // ...

    @Override
    default Iterator<Entry> iterator() {
        return new Iterator<>() {
            Iterator<Entry> iter = entries().iterator();
            Context ctx = context();

            @Override
            public boolean hasNext() {
                if (iter.hasNext()) {
                    return true;
                } else {
                    if (ctx instanceof Context.Child.Plain plainCtx) {
                        iter = plainCtx.entries().iterator();
                        ctx = plainCtx.parent();
                        return this.hasNext();
                    } else if (ctx instanceof Context.Child.Span spanCtx) {
                        ctx = spanCtx.parent();
                        return this.hasNext();
                    } else if (ctx instanceof Context.Global globalCtx) {
                        iter = globalCtx.entries().iterator();
                        ctx = null;
                        return this.hasNext();
                    } else {
                        return false;
                    }
                }
            }

            @Override
            public Entry next() {
                if (iter.hasNext()) {
                    return iter.next();
                } else {
                    if (ctx instanceof Context.Child.Plain plainCtx) {
                        iter = plainCtx.entries().iterator();
                        ctx = plainCtx.parent();
                        return this.next();
                    } else if (ctx instanceof Context.Child.Span spanCtx) {
                        ctx = spanCtx.parent();
                        return this.next();
                    } else if (ctx instanceof Context.Global globalCtx) {
                        iter = globalCtx.entries().iterator();
                        ctx = null;
                        return this.next();
                    } else {
                        throw new NoSuchElementException();
                    }
                }
            }
        };
    }
}

LoggerFactory

Now to have parity with SLF4J we should be able to delegate logging to a specific implementation on the class/module-path.

To do this, we first need to make an interface for creating loggers.

interface LoggerFactory {
    Logger createLogger();
}

And then we need to declare in our module-info.java that we are interested in consuming external implementors of this interface - just in case people ever decide to put this on the module-path.

import dev.mccue.log.alpha.LoggerFactory;

module dev.mccue.log.alpha {
    exports dev.mccue.log.alpha;

    uses LoggerFactory;
}
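
On the flip side, an implementation would advertise itself in its own module-info.java - module and class names here are hypothetical - or, on the class-path, through a META-INF/services/dev.mccue.log.alpha.LoggerFactory file.

import dev.mccue.log.alpha.LoggerFactory;

module dev.mccue.log.alpha.console {
    requires dev.mccue.log.alpha;

    provides LoggerFactory
            with dev.mccue.log.alpha.console.ConsoleLoggerFactory;
}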

Then we can build a factory function which scans through available implementations and picks one.

public interface LoggerFactory {
    static LoggerFactory create() {
        var loggerFactories = ServiceLoader.load(LoggerFactory.class).iterator();
        if (!loggerFactories.hasNext()) {
            System.err.println(
                    "No logger factory supplied. Falling back to no-op logger"
            );
            return () -> (__) -> {
            };
        } else {
            var service = loggerFactories.next();
            if (loggerFactories.hasNext()) {
                var services = new ArrayList<LoggerFactory>();
                services.add(service);
                while (loggerFactories.hasNext()) {
                    services.add(loggerFactories.next());
                }

                System.err.printf(
                        "Multiple logger factories supplied: %s. Picking one at random.%n",
                        services
                );
                return services.get(ThreadLocalRandom.current().nextInt(0, services.size()));
            } else {
                return service;
            }
        }
    }

    Logger createLogger();
}

The reason for the indirection - using a LoggerFactory instead of a Logger - is to allow implementations to do some one-time setup logic, and potentially setup logic per created logger.
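
As an illustration - and assuming, as the no-op fallback above implies, that Logger's single abstract method takes a Log - a trivial hypothetical implementation might look like this.

public final class PrintStreamLoggerFactory implements LoggerFactory {
    // One-time setup: pick a destination when the factory is created.
    private final java.io.PrintStream out = System.err;

    @Override
    public Logger createLogger() {
        // Per-logger setup could happen here.
        return log -> out.println(log);
    }
}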

In the context of Java, the namespace of a log would often be the name of the class that the log is in. This is why SLF4J has LoggerFactory.getLogger(Class<?>) as the obvious way to get a logger. Matching that convention is easy enough.

public interface LoggerFactory {
    static LoggerFactory create() {
        // ...
    }

    static Logger getLogger() {
        return create().createLogger();
    }

    static Logger.Namespaced getLogger(String namespace) {
        return getLogger().namespaced(namespace);
    }

    static Logger.Namespaced getLogger(Class<?> klass) {
        return getLogger().namespaced(klass.getCanonicalName());
    }

    Logger createLogger();
}

So now constructing a logger will look a lot like SLF4J.

public final class Main {
    private static final Logger.Namespaced log = 
            LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        log.info(
                "item-delivered", 
                Log.Entry.of("cost", "everything")
        );
    }
}

Generation

There will probably be an itch for some to generate the getLogger call like what lombok does for other loggers.

I can't add to lombok, I have a life, but I can generate the code with an annotation processor.

@DeriveLogger
public final class Main implements MainLog {
    public static void main(String[] args) {
        log.info(
                "item-delivered", 
                Log.Entry.of("cost", "everything")
        );
    }
}
// ~= to what is generated.
sealed interface MainLog permits Main {
    Logger.Namespaced log =
            LoggerFactory.getLogger(Main.class);
}

What was the point of all this?

As someone correctly pointed out to me, I never really clarified why any of this is "better".

The gist of it is that this API allows and enforces structured logging. That is, because there is a data-ified representation of events that happen within your system you can trivially transform them into structured formats such as JSON.
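
To make "trivially transform" concrete, here is a minimal sketch of rendering a few of the Value cases from earlier as JSON. String escaping and the remaining cases are omitted.

static java.lang.String toJson(Log.Entry.Value value) {
    if (value instanceof Log.Entry.Value.Null) {
        return "null";
    } else if (value instanceof Log.Entry.Value.String s) {
        return "\"" + s.value() + "\""; // real code needs escaping
    } else if (value instanceof Log.Entry.Value.Long l) {
        return java.lang.Long.toString(l.value());
    } else if (value instanceof Log.Entry.Value.List list) {
        return list.value().stream()
                .map(v -> toJson(v))
                .collect(java.util.stream.Collectors.joining(",", "[", "]"));
    } else {
        return "\"<todo>\"";
    }
}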

When your logs are all JSON you can use tools like CloudWatch Logs Insights to directly search over your data.

fields @timestamp, @message
| sort @timestamp desc
| filter uri = '/something'
| limit 20

The exact methodology for this will vary from service to service, but the overarching theme is that structured data is searchable. Text data is "grep-able".

Logically, when you log something in a classical framework you take a representation in memory and turn it into some "English." Then, when you inspect those logs, you need to undo that transformation to English to do any searching. Because it's 2022 there is no chance you could search those logs by hand at any reasonable scale, so you have to fall back to taking the part of your log message that is a constant and searching for that.

So with structured logging, you can skip the English and ship data directly to your observability platforms. This is just a potential API for structured logging or, more accurately, for what tracing calls "in process tracing".

Let me know if this explanation doesn't track in the comments.

What Now?

Well, I am pretty confident that the API is good enough to start experimenting with. You can find the code for that here.

The annotation processor is also simple enough that I think it can be used right now. Code for that is here.

Both of those finished-ish components can be fetched from jitpack.

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>dev.mccue</groupId>
        <artifactId>log.alpha</artifactId>
        <version>main-SNAPSHOT</version>
    </dependency>
    <dependency>
        <groupId>dev.mccue</groupId>
        <artifactId>log.alpha.generate</artifactId>
        <version>main-SNAPSHOT</version>
    </dependency>
</dependencies>

Everything else - like my rough draft JSON logger, publisher system implementation, etc - currently lives in these repos.

Contributions very welcome. Feel free to reach out to me directly on discord or similar.

Next Steps

  • Publisher system implementation

A lot of design decisions I made were in support of a hypothetical system where µ/log's publisher scheme was translated. I wrote this up before finishing that prototype because I saw it would be a lot of work, and I didn't want to take the dive if there was no actual interest.

  • SLF4J Bridge

I have a sketch of what one could look like that turns SLF4J logs into records with slf4j/message, slf4j/arguments, and slf4j/mdc. This technically works in terms of information conveyance but isn't very pretty. I bet someone knows how to do better.

  • Console publisher

tracing proves that you can have structured logging and still pretty, developer friendly console logs. Everything I make is a Shrek so I probably need help to pull that off.

  • Tests

I didn't test any of this. If you can look me in the eye and say you would have written unit tests for all of the Log.Entry.of and Value.of functions, I commend you. Some stuff might need to change to allow for unit testing applications which care about asserting that logs happen in the form they expect, but I haven't gone down that rabbit hole yet.

  • Benchmarks

I went at this maybe a bit too much by feel. I have no clue about the performance of this API. Is getting the current thread every time an issue? Should I have added predicates to check if log levels are enabled? Make those hundreds of logger overloads? No clue. I should break out JMH or do some profiling.

  • Better Docs

If I said I didn't have the opportunity to write better docs that would be a lie. I spent that time watching Letterkenny. This whole thing probably counts as an explainer, but for there to be any chance of anyone using this the rest of Diátaxis could use some attention. In my head this would take the form of fleshed out reference Javadocs, a tutorial or two, and some how-to guides on managing a migration from SLF4J.

  • Real world usage

I need brave souls - whom I would love with all my heart, mind, and body - to try this API out in some real world applications. That is probably the only way to actually validate or invalidate any choices.


If nothing else, I hope this got some of you into the same nightmare head-space I'm trapped in. Leave unconstructive criticism in the comments below.


<- Index

Turn any Java program into a self-contained EXE

by: Ethan McCue

Double-click to run is one of the easiest ways to open a program.

If the person you are sharing code with already has the right version of Java installed, they can double-click on a jar file to run it. You wrote it once, they can run it there.

If they don't have Java installed, then there are ways to create a runnable installer like jpackage, but now they have to click through an installer to be able to run your code.

You can use Native Image to turn your code into an exe which won't require them to have anything installed, but now you have to abide by the closed world assumption and that's not always easy or possible.

So this post is going to focus on a fairly oonga boonga approach that will work for any app, regardless of what dependencies you include or JVM features you make use of.

The code along with an example GitHub workflow can be found in this repo and final executables can be found here.

Prerequisites

Java 9+

java --version
jlink --version

Maven

mvn --version

NodeJS

npx --version

Step 1. Compile and Package your code into a jar.

This toy program will create a basic window that has some text that you can toggle between being capitalized.

package example;

import org.apache.commons.text.WordUtils;

import javax.swing.*;
import java.awt.*;

public class Main {
    public static void main(String[] args) {
        var label = new JLabel("Hello, World!");
        label.setFont(new Font("Serif", Font.PLAIN, 72));

        var uppercaseButton = new JButton("Uppercase");
        uppercaseButton.addActionListener(e ->
            label.setText(WordUtils.capitalize(label.getText()))
        );

        var lowercaseButton = new JButton("lowercase");
        lowercaseButton.addActionListener(e ->
            label.setText(WordUtils.uncapitalize(label.getText()))
        );

        var panel = new JPanel();
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));
        panel.add(label);
        panel.add(uppercaseButton);
        panel.add(lowercaseButton);

        var frame = new JFrame("Basic Program");
        frame.add(panel);
        frame.pack();
        frame.setVisible(true);
        frame.setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
    }
}

Program Demonstration

The goal is to package up your code, along with its dependencies, into a jar. Jars are just zip files with a little extra structure.

For a Maven project the configuration will look like the following.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>example</groupId>
    <artifactId>javaexe</artifactId>
    <version>1.0</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>18</maven.compiler.source>
        <maven.compiler.target>18</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-text</artifactId>
            <version>1.9</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <manifestEntries>
                                        <Main-Class>example.Main</Main-Class>
                                        <Build-Number>1.0</Build-Number>
                                    </manifestEntries>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

Where the "shade" plugin will handle including the code from all of your dependencies into the jar. In this case, the only external dependency is org.apache.commons/commons-text.

mvn clean package

Then for the purposes of this guide we will move that jar into a new directory where it will be separate from whatever other files are in target/.

mkdir build 
mv target/javaexe-1.0.jar build
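
If you want to see that zip-with-extra-structure for yourself, the jar tool can list the entries - you should see META-INF/MANIFEST.MF alongside your classes and the shaded-in dependency classes.

jar tf build/javaexe-1.0.jar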

Step 2. Create a Java Runtime Environment

In order to run the jar from the previous step, we will need to bundle it with a Java Runtime Environment. To do this we will use jlink.

Since the Java ecosystem hasn't embraced modules, you most likely haven't heard of or used jlink.

The short pitch is that it can create "custom runtime images." Say you are making a web server. You don't need AWT or Swing, so including all the code for that is a tad wasteful. With jlink you can make a JRE that doesn't include the java.desktop module at all.

This system works best if your application and all of its dependencies include compiled module-info.java files, which let jlink know exactly which modules you want to include. You can also figure out the list of required modules manually by using jdeps and a bit of detective work.
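
If you do want to play detective, jdeps can compute a starting list of modules - the exact output will vary by project.

jdeps --print-module-deps --ignore-missing-deps build/javaexe-1.0.jar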

Even without a modular project though, we can still use jlink to effectively clone our Java installation to a directory.

jlink --add-modules ALL-MODULE-PATH --output build/runtime

Including every module gives confidence that libraries like org.apache.commons/commons-text will work as intended, even though we never figured out what modules they actually require.

Step 3. Bundle the Jar and the JRE into an executable

So with a jar containing our code and all of its dependencies in one hand and a JRE in the other, all that's left is to stitch the two together.

The general technique for that is to

  1. Zip up the directory containing the JRE and your application jar.
  2. Attach a stub script to the top of that zip file which will extract the zip to a temporary directory and run the code.

There is a JavaScript library which does this called caxa. Its purpose is making NodeJS projects into executables, so it will also bundle whatever NodeJS installation is on the system. That step can luckily be skipped by passing the --no-include-node flag, so it will work just fine for this.

npx caxa \
    --input build \
    --output application \
    --no-include-node \
    -- "{{caxa}}/runtime/bin/java" "-jar" "{{caxa}}/javaexe-1.0.jar"

This will create an executable called "application." If you are doing this for Windows you should specify "application.exe." When the executable is run, the {{caxa}}s in the command will be substituted with the path to the temporary directory where the zip file was expanded.


I am aware of jdeploy - and it does handle stuff that I didn't cover or would be relatively hard with this scheme like code signing or automatic updates - but as far as I can tell it still requires that users run an installer.

On code signing, there is an open issue with caxa to figure out how to do that. I can make another post or update this one if an approach is figured out. I don't quite understand the issue, so I don't feel qualified to comment.

If any mildly ambitious reader wants to try their hand at remaking caxa in a different language, so this process isn't dependent on the JS ecosystem, I encourage it.

As always, comments and corrections welcome.


<- Index

The different ways to handle errors in C

by: Ethan McCue

C doesn't have a single clear way to handle errors.

The tutorials out there are pretty much garbage too.

So for this post, we are going to work with the toy example of a function that parses natural numbers from a string and go through the different approaches.

Code samples can be found in a compilable state here.

1. The Ostrich Algorithm

This might sound silly, but how often are you really going to run out of memory?

If an error condition is rare enough, you can always just bury your head in the sand and choose to ignore the possibility.

Ostrich burying its head in sand

This can make code a lot prettier, but at the cost of robustness.

#include <stdio.h>

int parse_natural_base_10_number(const char* s) {
    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        parsed *= 10;
        parsed += s[i] - '0';
    }

    return parsed;
}


int main() {
    printf("Expecting garbage or crash on bad values\n");
    const char* examples[] = { "10", "foo", "42", "" };
    for (size_t i = 0; i < 4; i++) {
        const char* example = examples[i];
        int parsed = parse_natural_base_10_number(example);
        printf("parsed: %d\n", parsed);
    }

    return 0;
}
Expecting garbage or crash on bad values
parsed: 10
parsed: 6093
parsed: 42
parsed: 0

A real world example of this can be seen in the Flipper device firmware's use of malloc.

2. Crash.

Sometimes errors aren't practically recoverable. Most applications should probably just give up when malloc returns NULL.

If you are sure that there isn't a way to recover from an error condition and that a caller won't want to handle it in any other way, you can just print a message saying what went wrong and exit the program.

#include <stdio.h>
#include <stdlib.h>

int parse_natural_base_10_number(const char* s) {
    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            printf(
               "Got a bad character ('%c') in %s, crashing.", 
               s[i], 
               s
            );
            exit(1);
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    return parsed;
}

int main() {
    const char* examples[] = { "10", "42", "foo" };
    for (size_t i = 0; i < 3; i++) {
        const char* example = examples[i];
        int parsed = parse_natural_base_10_number(example);
        printf("parsed: %d\n", parsed);
    }

    return 0;
}
parsed: 10
parsed: 42
Got a bad character ('f') in foo, crashing.

You can see this approach in the code of OpenBLAS.

3. Return a negative number.

If the function normally would return a natural number, then you can use a negative number to indicate a failure. This is applicable both to our toy example and cases like returning the number of bytes read from a file.

If there are different kinds of errors for this sort of case you could also use specific negative numbers to indicate the different categories.

#include <stdio.h>

int parse_natural_base_10_number(const char* s) {
    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return -1;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    return parsed;
}

int main() {
    const char* examples[] = { "10", "foo", "42" };
    for (size_t i = 0; i < 3; i++) {
        const char* example = examples[i];
        int parsed = parse_natural_base_10_number(example);
        if (parsed < 0) {
            printf("failed: %s\n", example);
        }
        else {
            printf("worked: %d\n", parsed);
        }
    }

    return 0;
}
worked: 10
failed: foo
worked: 42

You can see examples of this in the Linux kernel.

4. Return NULL

If the function would normally return a pointer, then you can use NULL to indicate that something went wrong.

Most functions that would be returning pointers will be doing heap allocation in order for that to be sound, so this scheme is likely not applicable when you want to avoid allocations.

Also, let's be real, it feels silly to heap allocate an int.

#include <stdio.h>
#include <stdlib.h>

int* parse_natural_base_10_number(const char* s) {
    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return NULL;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    int* result = malloc(sizeof (int));
    *result = parsed;
    return result;
}

int main() {
    const char* examples[] = { "10", "foo", "42" };
    for (size_t i = 0; i < 3; i++) {
        const char* example = examples[i];
        int* parsed = parse_natural_base_10_number(example);
        if (parsed == NULL) {
            printf("failed: %s\n", example);
        }
        else {
            printf("worked: %d\n", *parsed);
        }

        free(parsed);
    }

    return 0;
}
worked: 10
failed: foo
worked: 42

A real world example of this scheme is malloc. If malloc fails to allocate memory, then instead of returning a pointer to newly allocated memory it will return a null pointer.

5. Return a boolean and take an out param

One of the less obvious things you can do in C is to have one or more of a function's arguments be "out params". This means that it is part of the contract of the function that it will write into the memory behind a pointer.

If a function can fail, a natural translation of this is to return a boolean indicating whether it succeeded and pass an out param that you only inspect when true is returned.

#include <stdio.h>
#include <stdbool.h>

bool parse_natural_base_10_number(const char* s, int* out) {
    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return false;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    *out = parsed;
    return true;
}

int main() {
    const char* examples[] = { "10", "foo", "42" };
    for (size_t i = 0; i < 3; i++) {
        const char* example = examples[i];
        int parsed;
        bool success = parse_natural_base_10_number(
            example, 
            &parsed
        );
        if (!success) {
            printf("failed: %s\n", example);
        }
        else {
            printf("worked: %d\n", parsed);
        }
    }

    return 0;
}
worked: 10
failed: foo
worked: 42

This is done pretty regularly in Windows.

6. Return an enum and take an out param

A boolean can only indicate that something succeeded or failed. If you want to know why something failed, then swapping the boolean out for an enum is a pretty natural mechanism.

#include <stdio.h>

enum ParseNaturalNumberResult {
    PARSE_NATURAL_SUCCESS,
    PARSE_NATURAL_EMPTY_STRING,
    PARSE_NATURAL_BAD_CHARACTER
};

enum ParseNaturalNumberResult parse_natural_base_10_number(
   const char* s, 
   int* out
) {
    if (s[0] == '\0') {
        return PARSE_NATURAL_EMPTY_STRING;
    }

    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return PARSE_NATURAL_BAD_CHARACTER;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    *out = parsed;
    return PARSE_NATURAL_SUCCESS;
}

int main() {
    const char* examples[] = { "10", "foo", "42", "" };
    for (size_t i = 0; i < 4; i++) {
        const char* example = examples[i];
        int parsed;
        switch (parse_natural_base_10_number(example, &parsed)) {
            case PARSE_NATURAL_SUCCESS:
                printf("worked: %d\n", parsed);
                break;
            case PARSE_NATURAL_EMPTY_STRING:
                printf("failed because empty string\n");
                break;
            case PARSE_NATURAL_BAD_CHARACTER:
                printf("failed because bad char: %s\n", example);
                break;
        }
    }

    return 0;
}
worked: 10
failed because bad char: foo
worked: 42
failed because empty string

7. Return a boolean and take two out params

While an enum can give you the "category" of an error, it doesn't have a place for recording any more specific information than that.

For example, a pretty reasonable thing to want to know if you run into an unexpected character is where in the string that character was found.

By adding a second out param you can have a place to put this information.

#include <stdio.h>
#include <stdbool.h>

bool parse_natural_base_10_number(
   const char* s, 
   int* out_value, 
   size_t* out_bad_index
) {
    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            *out_bad_index = i;
            return false;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    *out_value = parsed;
    return true;
}

int main() {
    const char* examples[] = { "10", "foo", "42", "12a34" };
    for (size_t i = 0; i < 4; i++) {
        const char* example = examples[i];

        int parsed;
        size_t bad_index;
        bool success = parse_natural_base_10_number(
            example, 
            &parsed, 
            &bad_index
        );
        if (!success) {
            printf("failed: %s\n        ", example);
            for (size_t j = 0; j < bad_index; j++) {
                printf(" ");
            }
            printf("^☹️\n");
        }
        else {
            printf("worked: %d\n", parsed);
        }
    }

    return 0;
}
worked: 10
failed: foo
        ^☹️
worked: 42
failed: 12a34
          ^☹️

8. Return an enum and take multiple out params

A natural extension of the previous two patterns: if a computation can fail in multiple ways, you can return an enum covering every failure mode and take an out param for each mode that needs to carry extra data.

#include <stdio.h>
#include <string.h>

enum ParseNaturalNumberResult {
    PARSE_NATURAL_SUCCESS,
    PARSE_NATURAL_EMPTY_STRING,
    PARSE_NATURAL_BAD_CHARACTER,
    PARSE_NUMBER_TOO_BIG
};

struct BadCharacterInfo {
    size_t index;
};

struct TooBigInfo {
    size_t remaining_characters;
};

enum ParseNaturalNumberResult parse_natural_base_10_number(
        const char* s,
        int* out_value,
        struct BadCharacterInfo* bad_character_info,
        struct TooBigInfo* too_big_info
) {
    if (s[0] == '\0') {
        return PARSE_NATURAL_EMPTY_STRING;
    }

    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            bad_character_info->index = i;
            return PARSE_NATURAL_BAD_CHARACTER;
        }
        else {
            int digit = s[i] - '0';
            if (__builtin_smul_overflow(parsed, 10, &parsed) ||
                __builtin_sadd_overflow(parsed, digit, &parsed)) {
                too_big_info->remaining_characters = strlen(s) - i;
                return PARSE_NUMBER_TOO_BIG;
            }
        }
    }

    *out_value = parsed;
    return PARSE_NATURAL_SUCCESS;
}

int main() {
    const char* examples[] = { "10", 
                               "foo", 
                               "42", 
                               "", 
                               "99999999999999" };
    for (size_t i = 0; i < 5; i++) {
        const char* example = examples[i];
        int parsed;
        struct BadCharacterInfo bad_character_info;
        struct TooBigInfo too_big_info;

        switch (parse_natural_base_10_number(
            example, 
            &parsed, 
            &bad_character_info,
            &too_big_info
        )) {
            case PARSE_NATURAL_SUCCESS:
                printf("worked: %d\n", parsed);
                break;
            case PARSE_NATURAL_EMPTY_STRING:
                printf("failed because empty string\n");
                break;
            case PARSE_NATURAL_BAD_CHARACTER:
                printf(
                    "failed because bad char at index %zu: %s\n",
                    bad_character_info.index,
                    example
                );
                break;
            case PARSE_NUMBER_TOO_BIG:
                printf(
                    "number was too big. had %zu digits left: %s\n",
                    too_big_info.remaining_characters,
                    example
                );
                break;
        }
    }

    return 0;
}
worked: 10
failed because bad char at index 0: foo
worked: 42
failed because empty string
number was too big. had 5 digits left: 99999999999999

9. Set a thread local static value

Another option is to, on an error, set a thread local static variable. This avoids needing to propagate an error explicitly all the way up the stack from where it occurs and makes the "normal" api of the function look as neat and clean as the ostrich or crash approaches.

Once you set the thread local static value, you either

  1. Return a predictable value indicating an issue (NULL, a negative number, etc.) which hints to the programmer to check the thread local static value.
  2. Return an uninitialized value and rely on the programmer to know that the value might be bogus unless they check the thread local static value.

#include <stdio.h>
#include <stdbool.h>

_Thread_local static bool parse_number_error = false;

int parse_natural_base_10_number(const char* s) {
    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            parse_number_error = true;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    return parsed;
}

int main() {
    const char* examples[] = { "10", "42", "foo" };
    for (size_t i = 0; i < 3; i++) {
        const char* example = examples[i];
        int parsed = parse_natural_base_10_number(example);
        if (parse_number_error) {
            parse_number_error = false;
            printf("error: %s\n", example);
        }
        else {
            printf("parsed: %d\n", parsed);
        }
    }

    return 0;
}
parsed: 10
parsed: 42
error: foo

A good deal of the built-in APIs use exactly this pattern with an int called errno; if they fail, they will set it to a non-zero value. There are then functions like perror which can produce a message for the specific error code.

You technically are allowed to use errno yourself as well, so long as your error conditions can fit into its int encoding.

This is my least favorite of the patterns.

10. Return a tagged union

The next approach is what languages like Rust do under the hood for their enums.

You make a struct containing two things

  1. A "tag". This should be a boolean or an enum depending on your tastes and the number of possibilities.
  2. A union containing enough space for the data that should be associated with each "tag".

Then you return this struct directly. The tag tells the caller which field of the union is safe to access and consequently what the "result" of the computation was.

Compared to the out param solutions, where normally you would allocate each possible out param on the stack, this will compact the required space by way of the union.

It also uses regular return values and checking the tag before checking the union is a relatively standard process.

Unfortunately it will also make code more verbose than most of the other options.

#include <stdio.h>

enum ParseNaturalNumberResultKind {
    PARSE_NATURAL_SUCCESS,
    PARSE_NATURAL_EMPTY_STRING,
    PARSE_NATURAL_BAD_CHARACTER
};

struct BadCharacter {
    size_t index;
    char c;
};

struct ParseNaturalNumberResult {
    enum ParseNaturalNumberResultKind kind;
    union {
        int success;
        struct BadCharacter bad_character;
    } data;
};

struct ParseNaturalNumberResult parse_natural_base_10_number(
   const char* s
) {
    if (s[0] == '\0') {
        struct ParseNaturalNumberResult result = {
                .kind = PARSE_NATURAL_EMPTY_STRING
        };
        return result;
    }

    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            struct ParseNaturalNumberResult result = {
                    .kind = PARSE_NATURAL_BAD_CHARACTER,
                    .data = {
                            .bad_character = {
                                    .index = i,
                                    .c = s[i]
                            }
                    }
            };
            return result;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    struct ParseNaturalNumberResult result = {
            .kind = PARSE_NATURAL_SUCCESS,
            .data = {
                    .success = parsed
            }
    };

    return result;
}

int main() {
    const char* examples[] = { "10", "foo", "42", "12a34" };
    for (size_t i = 0; i < 4; i++) {
        const char* example = examples[i];

        struct ParseNaturalNumberResult result = 
            parse_natural_base_10_number(example);
        switch (result.kind) {
            case PARSE_NATURAL_SUCCESS:
                printf("worked: %d\n", result.data.success);
                break;
            case PARSE_NATURAL_EMPTY_STRING:
                printf("got empty string");
                break;
            case PARSE_NATURAL_BAD_CHARACTER:
                printf("failed: %s\n        ", example);
                for (size_t j = 0; 
                     j < result.data.bad_character.index; 
                     j++) {
                    printf(" ");
                }
                printf(
                    "^☹️ '%c' is not good\n", 
                    result.data.bad_character.c
                );
                break;
        }

    }

    return 0;
}
worked: 10
failed: foo
        ^☹️ 'f' is not good
worked: 42
failed: 12a34
          ^☹️ 'a' is not good

This is a very common pattern, especially when writing programs like language parsers where it is hard to avoid functions which can return one of many differently shaped possibilities. There are some examples here in the curl codebase of using the general mechanism for the result of parsing.

11. Return a boxed "error object"

The last one here is probably the toughest sell. It is more verbose than the other approaches, requires heap allocation, and requires a non-trivial degree of comfort with C. It does have its perks though.

First, make a "vtable". This will be a struct containing pointers to functions which take as their first argument a void pointer.

For errors, let's say the things we will want to do are produce an error message and dispose of any allocated resources afterward.

struct ErrorOps {
    char* (*describe)(const void*);
    void (*free)(void*);
};

Then make a struct which contains this vtable as well as a pointer to the memory that is meant to be passed as the first argument to each function within.

struct Error {
    struct ErrorOps ops;
    void* self;
};

You can then make some helpers for doing the calling.

char* error_describe(struct Error error) {
    return error.ops.describe(error.self);
}

void error_free(struct Error error) {
    if (error.ops.free != NULL) {
        error.ops.free(error.self);
    }
}

Then for each error condition, define how each operation should work as well as any helper functions and structs that you need.

char* empty_string_describe(const void* self) {
    char* result;
    asprintf(&result, "Empty string is not good");
    return result;
}

const struct ErrorOps empty_string_error_ops = {
        .describe = empty_string_describe,
        .free = NULL
};

struct Error empty_string_error() {
    struct Error result = {
            .ops = empty_string_error_ops,
            .self = NULL
    };
    return result;
}
struct BadCharacterError {
    char* source;
    size_t index;
};

char* bad_character_describe(const void* self) {
    const struct BadCharacterError* this = self;
    char* result;
    asprintf(
        &result, 
        "Bad character in %s at index %zu: '%c'", 
        this->source, 
        this->index, 
        this->source[this->index]
    );
    return result;
}

void bad_character_free(void* self) {
    struct BadCharacterError* this = self;
    free(this->source);
    free(this);
}

const struct ErrorOps bad_character_error_ops = {
        .describe = bad_character_describe,
        .free = bad_character_free
};

struct Error bad_character_error(const char* source, size_t index) {
    struct BadCharacterError* error = 
        malloc(sizeof (struct BadCharacterError));

    char* source_clone = calloc(strlen(source) + 1, sizeof (char));
    strcpy(source_clone, source);
    error->source = source_clone;

    error->index = index;

    struct Error result = {
            .ops = bad_character_error_ops,
            .self = error
    };
    return result;
}

Then, by any of the previous schemes, return one of these error structs if something goes wrong.

struct ParseNaturalNumberResult {
    bool success;
    union {
        int success;
        struct Error error;
    } data;
};

struct ParseNaturalNumberResult parse_natural_base_10_number(
    const char* s
) {
    if (s[0] == '\0') {
        struct ParseNaturalNumberResult result = {
                .success = false,
                .data = {
                        .error = empty_string_error()
                }
        };
        return result;
    }

    int parsed = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            struct ParseNaturalNumberResult result = {
                    .success = false,
                    .data = {
                            .error = bad_character_error(s, i)
                    }
            };
            return result;
        }
        else {
            parsed *= 10;
            parsed += s[i] - '0';
        }
    }

    struct ParseNaturalNumberResult result = {
            .success = true,
            .data = {
                    .success = parsed
            }
    };

    return result;
}

int main() {
    const char* examples[] = { "10", "foo", "42", "12a34" };
    for (size_t i = 0; i < 4; i++) {
        const char* example = examples[i];

        struct ParseNaturalNumberResult result = 
            parse_natural_base_10_number(example);
        if (!result.success) {
            char* description = error_describe(result.data.error);
            printf("error: %s\n", description);
            free(description);
            error_free(result.data.error);
        }
        else {
            printf("success: %d\n", result.data.success);
        }
    }

    return 0;
}
success: 10
error: Bad character in foo at index 0: 'f'
success: 42
error: Bad character in 12a34 at index 2: 'a'

So... why do this?

Crystals!

Because it is easy to compose this kind of error.

Say we extended our problem such that we were reading a number from a file. Now the set of things that can go wrong includes all sorts of file reading related errors.

It is a lot easier to include those errors if there is a way to treat them the "same" as the ones encountered during parsing. This accomplishes that.

struct FileOperationError {
    int error_number;
};

char* file_operation_error_describe(const void* self) {
    const struct FileOperationError* this = self;
    char* result;
    asprintf(&result, "%s", strerror(this->error_number));
    return result;
}

void file_operation_error_free(void* self) {
    free(self);
}

const struct ErrorOps file_operation_error_ops = {
        .describe = file_operation_error_describe,
        .free = file_operation_error_free
};

struct Error file_operation_error(int error_number) {
    struct FileOperationError* file_operation_error = 
        malloc(sizeof (struct FileOperationError));
    file_operation_error->error_number = error_number;

    struct Error result = {
            .ops = file_operation_error_ops,
            .self = file_operation_error
    };
    return result;
}

struct ReadNumberFromFileResult {
    bool success;
    union {
        int success;
        struct Error error;
    } data;
};

struct ReadNumberFromFileResult read_number_from_file(
   const char* path
) {
    FILE* fp = fopen(path, "r");
    if (fp == NULL) {
        struct ReadNumberFromFileResult result = {
                .success = false,
                .data = {
                        .error = file_operation_error(errno)
                }
        };
        errno = 0;
        // fp is NULL here, so there is nothing to fclose
        return result;
    }

    // Max positive int is only 10 characters big in base 10
    char first_line[12] = "";
    fgets(first_line, sizeof (first_line), fp);

    // Trim the trailing newline that fgets keeps, if any,
    // so it is not treated as a bad character by the parser
    first_line[strcspn(first_line, "\n")] = '\0';

    if (ferror(fp)) {
        struct ReadNumberFromFileResult result = {
                .success = false,
                .data = {
                        .error = file_operation_error(errno)
                }
        };
        errno = 0;
        fclose(fp);
        return result;
    }

    struct ParseNaturalNumberResult parse_result = 
        parse_natural_base_10_number(first_line);
    if (!parse_result.success) {
        struct ReadNumberFromFileResult result = {
                .success = false,
                .data = {
                        .error = parse_result.data.error
                }
        };
        fclose(fp);
        return result;
    }

    struct ReadNumberFromFileResult result = {
            .success = true,
            .data = {
                    .success = parse_result.data.success
            }
    };

    fclose(fp);
    return result;
}

int main() {
    const char* examples[] = { "../ex1", "../ex2", "../ex3" };
    for (size_t i = 0; i < 3; i++) {
        const char* example_file = examples[i];

        struct ReadNumberFromFileResult result = 
            read_number_from_file(example_file);
        if (!result.success) {
            char* description = error_describe(result.data.error);
            printf("error: %s\n", description);
            free(description);
            error_free(result.data.error);
        }
        else {
            printf("success: %d\n", result.data.success);
        }
    }

    return 0;
}
success: 8
error: Bad character in abc at index 0: 'a'
error: No such file or directory

This can all be done with tagged unions as well, so it is a judgement call. This sort of pattern definitely has more appeal when the language being used makes it convenient.


Important to note that I am not a professional C programmer. I fully expect to be shown the error of my ways in the comments below.


<- Index

Publish a Java library to Maven Central without Maven or Gradle

by: Ethan McCue

Say, like me, you have some code you want to share with the world.

package dev.mccue.datastructures;

/**
 * "Sum Type" representation of a linked list.
 */
public sealed interface LinkedList<T> {
    /**
     * An empty list.
     */
    record Empty<T>() 
        implements LinkedList<T> {}
    /**
     * A not empty list.
     */
    record NotEmpty<T>(T first, LinkedList<T> rest) 
        implements LinkedList<T> {}
}

To do this, you need to put that code in a place others can find it.

For Python programmers this means publishing to PyPI, Javascript programmers to npm, Rust programmers to crates.io, and C++ programmers to somewhere I assume.

For Java there are a few options, but the only one that will work by default in every build tool is Maven Central. It's apparently really good at being a repository, so publishing there is the thing to do.

There are plugins for all the major build tools that do this. However, last I tried, uploading a Java 16+ library to Maven Central using Maven was busted and required exposing Java internals to work around.

So we are going to do something a little different. I am going to show you how to go through the entire process manually, in the hope that it is straightforward enough for you to write your own scripts to do it.

Prerequisites to follow along

Java

javac --version
jar --version
javadoc --version

gpg

gpg --version

curl

curl --version

git

git --version

Github CLI

gh --version

Step 1. Write your code

For this example I am going to put the linked list code from the top of the page in a file src/dev/mccue/datastructures/LinkedList.java and make a small .gitignore.

target/
.idea/
*.iml
.DS_Store

Step 2. Add your code to a git repo

git init
git add src/
git add .gitignore
git commit -m "Initial Commit"

Step 3. Put that git repo on the internet

You will need a public url to refer to later and services like Github are convenient for that.

gh auth login
gh repo create --public --source .
git branch -M main
git push origin main

Step 4. Get unique coordinates

Unlike other package repositories, Maven Central requires that you have a unique "group id" to prefix any packages you make. You cannot publish code under com.google, only Google can.

To meet this requirement you either need to

  • Buy a domain name. You can do this through a lot of websites. I personally use namecheap, but there are quite a few options. Once you do this you can publish code under com.yoursite.
  • Make an account on one of the git hosting services. This is the easiest way, but you will only be able to publish under io.github.yourusername or similar.

Step 5. Make an account with Sonatype

Once you have that all settled

  1. Make an account here. Save the username and password.
  2. Make a ticket here. You will need to prove that you own the website or git account that you want to use for your group id.

This is an annoying step, I know, but it is what it is. If you get stuck here, ask in the comments below and I'll add more clarification.

Step 6. Compile your code

javac -d target/classes -g --release 17 src/**/*.java

The -g includes debug information. Always do that.

Step 7. Generate documentation for your code

javadoc -d target/doc src/**/*.java

If you get warnings about undocumented classes and methods ignoring them is a choice you are technically allowed to make.

Step 8. Decide on a version number

When you publish code there is the implicit assumption that you might upload newer versions of that code at a later point in time. To distinguish between versions, you need to number them. There are a few schemes for doing this including Semver, Calver, and 0ver.

In the commands from this point on, I am going to assume that the initial version being published is 0.0.1, but you can do what you feel is best.

Step 9. Zip your compiled code into a jar

As early Minecraft players learned when installing mods, jar files are just zip files with a few extra bells and whistles.

mkdir target/deploy
jar --create \
  --file target/deploy/datastructures-0.0.1.jar \
  -C target/classes .

Step 10. Zip your source code into a jar

jar --create \
  --file target/deploy/datastructures-0.0.1-sources.jar \
  -C src .

Step 11. Zip your documentation into a jar

jar --create \
  --file target/deploy/datastructures-0.0.1-javadoc.jar \
  -C target/doc .

Step 12. Create a POM File

A POM - "Project Object Model" - file is the standard format for declaring information about your library including any dependencies it may have on other libraries. This format is going to be around forever and all build tools have to handle it.

I am going to put the following into target/deploy/datastructures-0.0.1.pom. This is the "minimal" POM; every field I list needs to be specified.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
                
    <groupId>dev.mccue</groupId>
    <artifactId>datastructures</artifactId>
    <version>0.0.1</version>
    <packaging>jar</packaging>
                
    <name>Datastructures</name>
    <description>Basic Datastructures for Java.</description>
    <url>https://github.com/bowbahdoe/java-datastructures</url>
                
    <licenses>
        <license>
            <name>The Apache Software License, Version 2.0</name>
            <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
        </license>
    </licenses>
                
    <developers>
        <developer>
            <name>Ethan McCue</name>
            <email>ethan@mccue.dev</email>
            <organization>McCue Software Solutions</organization>
            <organizationUrl>https://www.mccue.dev</organizationUrl>
        </developer>
    </developers>
                
    <scm>
        <connection>scm:git:git://github.com/bowbahdoe/java-datastructures.git</connection>
        <developerConnection>scm:git:ssh://github.com:bowbahdoe/java-datastructures.git</developerConnection>
        <url>https://github.com/bowbahdoe/java-datastructures/tree/main</url>
    </scm>
</project>

Step 13. Create a GPG Key

Okay so this part might feel weird.

The idea here is that you generate a public and private key pair. You sign all the files you upload with the private key, and then later on someone can use the public key to confirm that it was "you" that actually did the signing.

Maven Central just makes sure that everything is signed, not that there is any way to associate the signed files back to you. Because public key infrastructure never really took off, this step is largely ceremonial in practice. You still need to do it though.

The official guide is more comprehensive than I am going to be

gpg --gen-key

Make sure to save your passphrase if you made one.

Step 14. Distribute your GPG Key

Run this command

gpg --list-keys

And you should get output that kinda looks like this.

pub   rsa3072 2021-06-23 [SC] [expires: 2023-06-23]
      CA925CD6C9E8D064FF05B4728190C4130ABA0F98
uid           [ultimate] Central Repo Test <central@example.com>
sub   rsa3072 2021-06-23 [E] [expires: 2023-06-23]

You want to take the part that looks like CA925CD6C9E8D064FF05B4728190C4130ABA0F98 and run the following command.

gpg --keyserver keyserver.ubuntu.com \
  --send-keys CA925CD6C9E8D064FF05B4728190C4130ABA0F98

Step 15. Sign all the files with GPG

gpg --armor --detach-sign target/deploy/datastructures-0.0.1.jar
gpg --armor --detach-sign target/deploy/datastructures-0.0.1-sources.jar
gpg --armor --detach-sign target/deploy/datastructures-0.0.1-javadoc.jar
gpg --armor --detach-sign target/deploy/datastructures-0.0.1.pom

If you are scripting this you should add --pinentry-mode loopback and provide your passphrase via --passphrase.

Step 16. Zip all the jars into one large jar

Yes, we are making a jar jar.

Jar Jar Binks

The most convenient api for uploading code manually is an undocumented form submit on the gui. I wanted to use something more official, but I had trouble finding what to do. I think it's probably fine.

Said api wants one large jar as its input.

jar --create --file target/bundle.jar -C target/deploy .

Step 17. Log in to sonatype

Use the username and password you got from step 5.

curl --request GET \
  --url https://s01.oss.sonatype.org/service/local/authentication/login \
  --cookie-jar cookies.txt \
  --user USERNAME:PASSWORD

Step 18. Upload the bundle to a staging repository

curl --request POST \
  --url https://s01.oss.sonatype.org/service/local/staging/bundle_upload \
  --cookie cookies.txt \
  --header 'Content-Type: multipart/form-data' \
  --form file=@target/bundle.jar

When you run this command, you will get output back that looks like this

{"repositoryUris":["https://s01.oss.sonatype.org/content/repositories/STAGING_REPOSITORY_ID"]}

At this point, you can pause and point a build tool to the staging repository to make sure that everything is okay with your code before releasing the final version.

Step 19. Release the staging repository

Fill in the STAGING_REPOSITORY_ID from the output of the last command. There is no going back once the staging repository is released.

curl --request POST \
  --url https://s01.oss.sonatype.org/service/local/staging/bulk/promote \
  --cookie cookies.txt \
  --header 'Content-Type: application/json' \
  --data '{ 
    "data": {
        "autoDropAfterRelease": true,
        "description": "",
        "stagedRepositoryIds": ["STAGING_REPOSITORY_ID"]
    }
}'

You can try out the linked list we just published by including it in your build tool of choice.

<dependency>
    <groupId>dev.mccue</groupId>
    <artifactId>datastructures</artifactId>
    <version>0.0.1</version>
</dependency>

A fully scripted version of this process can be seen here, along with an associated Github workflow.

Explain what a Maven MOJO is in 140 characters or less in the comments below.


<- Index

Why is it that byteArrMap.remove is returning false

Question from theuntamed000#1481

noob question why is that byteArrMap.remove returning false

JShell Session Screenshot 1 JShell Session Screenshot 2

also does java use something like Integer.valueOf() while doing boxing

aight so first one

when you call remove with a byte array those byte arrays are not equal

jshell> byte[] b1 = new byte[0];
b1 ==> byte[0] {  }

jshell> byte[] b2 = new byte[0];
b2 ==> byte[0] {  }

jshell> b1 == b2
$3 ==> false

jshell> b1.equals(b2)
$4 ==> false

even though they have the same contents

just read the docs, it uses Object.equals which compares references

So it won't find a matching key, thus it won't remove anything and returns false.
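
Here is a minimal jshell sketch of that situation (my own reconstruction, not the original screenshots):

jshell> var map = new java.util.HashMap<byte[], Integer>()
map ==> {}

jshell> map.put(new byte[0], 1)
$2 ==> null

jshell> map.remove(new byte[0], 1)
$3 ==> false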

Second, new Integer(123) and Integer.valueOf(123) might seem identical, but new Integer is a constructor call and all constructor calls need to return distinct objects.

Static methods don't have that restriction, so you can implement some degree of caching behind them. Which, looking at the implementation of Integer.valueOf, is what is being done. Small numbers' Integer representations are cached.

    @IntrinsicCandidate
    public static Integer valueOf(int i) {
        return i >= -128 && i <= Integer.IntegerCache.high ? Integer.IntegerCache.cache[i + 128] : new Integer(i);
    }

But the exact strategy matters less than the fact that choosing to implement a strategy is possible with the "static factory."

Even if the implementation was just

    @IntrinsicCandidate
    public static Integer valueOf(int i) {
        return new Integer(i);
    }

there would still be value in it from a library design standpoint.

And that is why this constructor being deprecated for removal makes sense. Removing the ability for libraries to call the constructor directly means the jdk would simply have more options for optimizations.

    @Deprecated(
        since = "9",
        forRemoval = true
    )
    public Integer(int value) {
        this.value = value;
    }

And when the language does autoboxing - yes it uses valueOf

$ cat Box.java
class Box {
    Integer f() {
        Integer i = 4;
        return i;
    }
}

$ javac Box.java

$ javap -v Box.class
Classfile /Users/emccue/Development/micro-http-ring/Box.class
  Last modified May 14, 2022; size 325 bytes
  SHA-256 checksum 78f01c27cb6b16a51a1c0ac47bf1ceb94cc5a4a7672afc615eb634dd948ba138
  Compiled from "Box.java"
class Box
  minor version: 0
  major version: 61
  flags: (0x0020) ACC_SUPER
  this_class: #13                         // Box
  super_class: #2                         // java/lang/Object
  interfaces: 0, fields: 0, methods: 2, attributes: 1
Constant pool:
   #1 = Methodref          #2.#3          // java/lang/Object."<init>":()V
   #2 = Class              #4             // java/lang/Object
   #3 = NameAndType        #5:#6          // "<init>":()V
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Methodref          #8.#9          // java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   #8 = Class              #10            // java/lang/Integer
   #9 = NameAndType        #11:#12        // valueOf:(I)Ljava/lang/Integer;
  #10 = Utf8               java/lang/Integer
  #11 = Utf8               valueOf
  #12 = Utf8               (I)Ljava/lang/Integer;
  #13 = Class              #14            // Box
  #14 = Utf8               Box
  #15 = Utf8               Code
  #16 = Utf8               LineNumberTable
  #17 = Utf8               f
  #18 = Utf8               ()Ljava/lang/Integer;
  #19 = Utf8               SourceFile
  #20 = Utf8               Box.java
{
  Box();
    descriptor: ()V
    flags: (0x0000)
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 1: 0
  java.lang.Integer f();
    descriptor: ()Ljava/lang/Integer;
    flags: (0x0000)
    Code:
      stack=1, locals=2, args_size=1
         0: iconst_4
         1: invokestatic  #7                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
         4: astore_1
         5: aload_1
         6: areturn
      LineNumberTable:
        line 3: 0
        line 4: 5
}
SourceFile: "Box.java"

a bit verbose, but you see the invokestatic call in the last snippet refers to Integer.valueOf

yeah

$ javap -c Box.class
Compiled from "Box.java"
class Box {
  Box();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  java.lang.Integer f();
    Code:
       0: iconst_4
       1: invokestatic  #7                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
       4: astore_1
       5: aload_1
       6: areturn
}

easier to see with -c (still learning javap)

can there be case where the new Integer(i) is called

yes, if the integer is outside of the range -128 to 127 it will be outside of the cache and new Integer will be used.

for this implementation

if it can then Objects.equals() will return false, and it would behave like the byte[] example

but i guess that never happens

not quite

the implementation of equals for Integer actually compares the value

jshell> Integer i1 = Integer.valueOf(123456);
i1 ==> 123456

jshell> Integer i2 = Integer.valueOf(123456);
i2 ==> 123456

jshell> i1 == i2
$7 ==> false

jshell> i1.equals(i2)
$8 ==> true
jshell> byte[] b1 = new byte[0];
b1 ==> byte[0] {  }

jshell> byte[] b2 = new byte[0];
b2 ==> byte[0] {  }

jshell> b1 == b2
$3 ==> false

jshell> b1.equals(b2)
$4 ==> false

reference equality i1 == i2 will return false if you have distinct Integer objects

Screenshot of docs page for Map.remove

But the equals method for Integer inherited from Object is overridden, so that comparing them with .equals or java.util.Objects.equals will give the answer you would usually expect.

Yep so if you tried the byte[] map example with Integer then .remove would always find its target value and return true even if you used the deprecated new Integer directly or were outside of the cache range for Integer.valueOf.
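
For example, a quick jshell sketch (not from the original conversation):

jshell> var map = new java.util.HashMap<Integer, String>()
map ==> {}

jshell> map.put(Integer.valueOf(123456), "v")
$2 ==> null

jshell> map.remove(Integer.valueOf(123456), "v")
$3 ==> true

123456 is well outside the cache, so the two valueOf calls produce distinct objects, but remove still finds the key because Integer overrides equals.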

BYTE ARRAYS
                | b1 == b2 | b1.equals(b2) | Objects.equals(b1, b2) | Arrays.equals(b1, b2)
----------------|--------------------------------------------------------------------------
| SAME OBJECT   | true     | true          | true                   | true
----------------|--------------------------------------------------------------------------
| SAME CONTENTS | false    | false         | false                  | true
----------------|--------------------------------------------------------------------------
| DIFF CONTENTS | false    | false         | false                  | false
----------------|--------------------------------------------------------------------------
| b1 is null    | false    | crash         | false                  | false
----------------|--------------------------------------------------------------------------
| b2 is null    | false    | false         | false                  | false
----------------|--------------------------------------------------------------------------
| both are null | true     | crash         | true                   | true

internal implementation does not use Objects.equals but instead does value.equals(provided)

so yeah it overrides that method

Integer
                | i1 == i2 | i1.equals(i2) | Objects.equals(i1, i2) |
----------------|----------------------------------------------------
| SAME OBJECT   | true     | true          | true                   |
----------------|----------------------------------------------------
| SAME VALUE    | false    | true          | true                   |
----------------|----------------------------------------------------
| DIFF VALUE    | false    | false         | false                  | 
----------------|----------------------------------------------------
| i1 is null    | false    | crash         | false                  |
----------------|----------------------------------------------------
| i2 is null    | false    | false         | false                  |
----------------|----------------------------------------------------
| both are null | true     | crash         | true                   |

hey thanks man, that was quite detailed


<- Index

Go's Concurrency Examples in Java 19

by: Ethan McCue

Preface

Threads are usually expensive.

There is no way for your operating system to know exactly how much stack space a thread will need, so it allocates an amount on the order of a kilobyte initially and then around a megabyte once the thread starts to be used. You only have around a baker's dozen gigabytes of RAM, so you can only have, give or take, 10,000 active threads.

The way around this is to implement some mechanism that takes a limited number of operating system threads and juggles a much larger number of "logical threads" on top of them.

For most languages, this means adding some form of async/await syntax. Where you put an await the language knows it can switch to handling another task. You can only put awaits inside of code marked async. This has problems.

The Go programming language is different than most in that it implemented this juggling "non-cooperatively". You don't explicitly mark your code with async and await, the runtime slices it up for you. They call these cheap threads "goroutines."

The Java Virtual Machine is going to get an analogous feature called "Virtual Threads."

This won't just benefit Java, but every language on the JVM including Clojure, Groovy, Kotlin, and Scala.

Virtual Threads are slated to appear as a "Preview" feature in Java 19 on September 20, 2022. This means that the implementation of the underlying feature is complete and tested, but the public API is subject to breaking changes and must be opted into explicitly.

Many of Go's patterns around concurrency arise from the conceit that you can create threads with abandon.

Since Java is about to join that club, it seems a good time to go through some of the Go concurrency examples and see what they might look like translated over.

If you want to follow along, you can get an early access build here. Unzip the files and add the bin/ directory to your path.

All the examples can be followed in sequence by using jshell.

$ java --version
openjdk 19-loom 2022-09-20
OpenJDK Runtime Environment (build 19-loom+6-625)
OpenJDK 64-Bit Server VM (build 19-loom+6-625, mixed mode, sharing)

$ jshell --enable-preview --add-modules=jdk.incubator.concurrent

Example 1. Goroutines

https://go.dev/tour/concurrency/1

package main

import (
    "fmt"
    "time"
)

func say(s string) {
    for i := 0; i < 5; i++ {
        time.Sleep(100 * time.Millisecond)
        fmt.Println(s)
    }
}

func main() {
    go say("world")
    say("hello")
}

This is a pretty classic example, and frankly can be done with operating system threads just as well.

import java.time.Duration;
import java.util.concurrent.Executors;

public final class VirtualThreads {
    private VirtualThreads() {}

    static void say(String s) {
        try {
            for (int i = 0; i < 5; i++) {
                Thread.sleep(Duration.ofMillis(100));
                System.out.println(s);
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> say("world"));
            say("hello");
        }
    }
}
VirtualThreads.main(new String[]{});

A few key things to notice.

  1. There is some noise in the say method around handling what will happen if the thread is interrupted.

In this case we just choose to throw a RuntimeException to indicate we just want to crash.

In Go there is less noise, but there is also no way to interrupt Go's time.Sleep.

It is also an option to propagate the InterruptedException up if we add a return null; to target the Callable overload.

public final class VirtualThreads {
    private VirtualThreads() {}

    static void say(String s) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            Thread.sleep(Duration.ofMillis(100));
            System.out.println(s);
        }
    }

    public static void main(String[] args) 
            throws InterruptedException {
        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> {
                say("world");
                return null;
            });
            say("hello");
        }
    }
}
  2. You need more than go say("world")

Executors.newVirtualThreadPerTaskExecutor() creates an ExecutorService. This is a thing which you can submit tasks to and it will run them "somehow".

Today most ExecutorServices are backed by some pool of threads. The purpose of the interface is to be able to write code without needing to know about the underlying strategy for maintaining that pool.

Virtual Threads are cheap, so you don't need to pool them. The interface still serves a use though. ExecutorServices will extend AutoCloseable, so when used with the "try-with-resources" syntax you can make a block of code where you wait until all tasks have completed before moving on.

If you wanted to do the same creating threads directly it would look like this.

public final class VirtualThreads {
    private VirtualThreads() {}

    static void say(String s) {
        try {
            for (int i = 0; i < 5; i++) {
                Thread.sleep(Duration.ofMillis(100));
                System.out.println(s);
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) 
            throws InterruptedException {
        var worldThread = Thread.startVirtualThread(
            () -> say("world")
        );
        
        say("hello");
        
        // Explicitly join to wait for the other thread.
        worldThread.join();
    }
}

Example 2. Channels

https://go.dev/tour/concurrency/2

package main

import "fmt"

func sum(s []int, c chan int) {
    sum := 0
    for _, v := range s {
        sum += v
    }
    c <- sum // send sum to c
}
    
func main() {
    s := []int{7, 2, 8, -9, 4, 0}

    c := make(chan int)
    go sum(s[:len(s)/2], c)
    go sum(s[len(s)/2:], c)
    x, y := <-c, <-c // receive from c

    fmt.Println(x, y, x+y)
}

Go has the concept of a "channel." This is a lightweight pipe along which values can be sent between "Communicating Sequential Processes".

Java does not have this concept in its standard library. There are similar constructs in libraries and it may come in the future, but for now no dice.

A somewhat close analogue is a BlockingQueue, so that is what I am going to use for the purposes of these examples.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;

public final class Queues {
    private Queues() {}

    static void sum(
        int[] s, 
        int start, 
        int end, 
        BlockingQueue<Integer> queue
    ) throws InterruptedException {
        int sum = 0;
        for (int i = start; i < end; i++) {
            sum += s[i];
        }
        queue.put(sum);
    }

    public static void main(String[] args) 
        throws InterruptedException {
        int[] s = { 7, 2, 8, -9, 4, 0 };
        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            var queue = new ArrayBlockingQueue<Integer>(1);
            executor.submit(() -> {
                sum(s, 0, s.length / 2, queue);
                return null;
            });
            executor.submit(() -> {
                sum(s, s.length / 2, s.length, queue);
                return null;
            });

            int x = queue.take();
            int y = queue.take();

            System.out.printf("%d %d %d\n", x, y, x + y);
        }
    }
}
Queues.main(new String[]{});

Instead of Go's syntax for making slices of arrays, I opted to instead pass the indexes that each sum call was expected to work on.

It is only safe to share the memory for the array like this because no other threads are changing its contents. If there were, we would have summoned Gorslax. This would be true in both Go and Java.

In both cases the way this works is each worker sends the results of its computation to a logical queue. Once we have read two values off the shared queue we implicitly know that the two tasks we started have finished.

For "one shot" use cases such as this, you could also use Java's CompletableFuture for the same purpose.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

public final class Queues {
    private Queues() {}

    static void sum(
        int[] s, 
        int start, 
        int end, 
        CompletableFuture<Integer> future
    ) {
        int sum = 0;
        for (int i = start; i < end; i++) {
            sum += s[i];
        }
        future.complete(sum);
    }

    public static void main(String[] args)
        throws InterruptedException, ExecutionException {
        int[] s = { 7, 2, 8, -9, 4, 0 };
        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            var futureOne = new CompletableFuture<Integer>();
            var futureTwo = new CompletableFuture<Integer>();
            
            executor.submit(() -> {
                sum(s, 0, s.length / 2, futureOne);
                return null;
            });
            executor.submit(() -> {
                sum(s, s.length / 2, s.length, futureTwo);
                return null;
            });

            int x = futureOne.get();
            int y = futureTwo.get();

            System.out.printf("%d %d %d\n", x, y, x + y);
        }
    }
}

This adds ExecutionException to the explicit list of things that can go wrong, but is a more direct api for a task that will run and produce one value as a result.

In fact, if we were to change sum to return its result directly then we could eliminate its awareness that it is being run asynchronously.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

public final class Queues {
    private Queues() {}

    static int sum(int[] s, int start, int end)  {
        int sum = 0;
        for (int i = start; i < end; i++) {
            sum += s[i];
        }
        return sum;
    }

    public static void main(String[] args) 
        throws InterruptedException, ExecutionException {
        int[] s = { 7, 2, 8, -9, 4, 0 };
        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            var futureOne = CompletableFuture
                    .supplyAsync(
                        () ->  sum(s, 0, s.length / 2),
                        executor
                    );
            var futureTwo = CompletableFuture
                    .supplyAsync(
                        () ->  sum(s, s.length / 2, s.length),
                        executor
                    );
                    
            int x = futureOne.get();
            int y = futureTwo.get();

            System.out.printf("%d %d %d\n", x, y, x + y);
        }
    }
}

And if we don't need any of the fancier capabilities of CompletableFuture, then the plain Future objects returned by submitting directly to the ExecutorService are also an option.

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

public final class Queues {
    private Queues() {}

    static int sum(int[] s, int start, int end)  {
        int sum = 0;
        for (int i = start; i < end; i++) {
            sum += s[i];
        }
        return sum;
    }

    public static void main(String[] args) 
        throws InterruptedException, ExecutionException {
        int[] s = { 7, 2, 8, -9, 4, 0 };
        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            var futureOne = executor.submit(
                () ->  sum(s, 0, s.length / 2)
            );
            var futureTwo = executor.submit(
                () ->  sum(s, s.length / 2, s.length)
            );

            int x = futureOne.get();
            int y = futureTwo.get();

            System.out.printf("%d %d %d\n", x, y, x + y);
        }
    }
}

Example 3. Buffered Channels

https://go.dev/tour/concurrency/3

package main

import "fmt"

func main() {
    ch := make(chan int, 2)
    ch <- 1
    ch <- 2
    fmt.Println(<-ch)
    fmt.Println(<-ch)
}

There isn't much to this one. Go's channels can be "buffered", meaning they can accept multiple values before they will be "full". If a channel is full then any thread that wants to put a value onto that channel will have to wait until another thread takes a value off.

The ArrayBlockingQueue class we've been using works the same way.

import java.util.concurrent.ArrayBlockingQueue;

public final class BufferedQueue {
    private BufferedQueue() {}

    public static void main(String[] args) 
        throws InterruptedException {
        var queue = new ArrayBlockingQueue<Integer>(2);
        queue.put(1);
        queue.put(2);
        System.out.println(queue.take());
        System.out.println(queue.take());
    }
}
BufferedQueue.main(new String[]{});

Example 4. Range and Close

https://go.dev/tour/concurrency/4

package main

import (
    "fmt"
)

func fibonacci(n int, c chan int) {
    x, y := 0, 1
    for i := 0; i < n; i++ {
        c <- x
        x, y = y, x+y
    }
    close(c)
}

func main() {
    c := make(chan int, 10)
    go fibonacci(cap(c), c)
    for i := range c {
        fmt.Println(i)
    }
}

Here is where the differences between a Java BlockingQueue and a Go chan start to manifest themselves.

There is no ability to "close" a BlockingQueue. One way around this is to send a special "sentinel" value over the queue to indicate that a reader should stop reading. This only works cleanly when we have a single reader though.

There is also no equivalent to the range operator. We need to write a normal while loop.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;

sealed interface TakeResult<T> {
    record GotValue<T>(T value) implements TakeResult<T> {}
    record NoValue<T>() implements TakeResult<T> {}
}

public final class Fibonacci {
    private Fibonacci() {}

    static void fibonacci(
        int n, 
        BlockingQueue<TakeResult<Integer>> queue
    ) throws InterruptedException {
        int x = 0;
        int y = 1;

        for (int i = 0; i < n; i++) {
            queue.put(new TakeResult.GotValue<>(x));
            int temp = x;
            x = y;
            y = temp + x;
        }
        queue.put(new TakeResult.NoValue<>());
    }

    public static void main(String[] args) 
        throws InterruptedException {
        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            var queue = 
                new ArrayBlockingQueue<TakeResult<Integer>>(10);
            executor.submit(() -> {
                fibonacci(queue.remainingCapacity(), queue);
                return null;
            });

            while (queue.take() instanceof 
                    TakeResult.GotValue<Integer> gotValue) {
                System.out.println(gotValue.value());
            }
        }
    }
}
Fibonacci.main(new String[]{});

This snippet makes use of sealed interfaces, a relatively recent addition to Java, for modeling getting either a legitimate value over the queue or a signal to stop consuming.

The other option for the same result would be to drop the generic types from the BlockingQueue and use a special sentinel instance of Object. Note that null itself can't be the "closed" signal, since the standard BlockingQueue implementations reject null elements.
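
As a minimal sketch of the sentinel idea (SentinelQueue and its CLOSED marker are names I made up for illustration):

import java.util.concurrent.ArrayBlockingQueue;

public final class SentinelQueue {
    private SentinelQueue() {}

    // A unique marker object. Readers compare against it by
    // reference identity, so no legitimate value can be
    // mistaken for the close signal.
    static final Object CLOSED = new Object();

    public static void main(String[] args) 
            throws InterruptedException {
        var queue = new ArrayBlockingQueue<Object>(10);

        Thread.startVirtualThread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    queue.put(i);
                }
                queue.put(CLOSED);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        });

        Object taken;
        while ((taken = queue.take()) != CLOSED) {
            System.out.println(taken);
        }
    }
}
SentinelQueue.main(new String[]{});

The obvious downside is that the queue's element type degrades to Object, so readers have to cast anything they take off of it.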

Example 5. Select

https://go.dev/tour/concurrency/5

package main

import "fmt"

func fibonacci(c, quit chan int) {
    x, y := 0, 1
    for {
        select {
        case c <- x:
            x, y = y, x+y
        case <-quit:
            fmt.Println("quit")
            return
        }
    }
}

func main() {
    c := make(chan int)
    quit := make(chan int)
    go func() {
        for i := 0; i < 10; i++ {
            fmt.Println(<-c)
        }
        quit <- 0
    }()
    fibonacci(c, quit)
}

There is also no equivalent to select for BlockingQueues. We have to implement that logic in a hand rolled loop.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;

public final class SelectQueues {
    private SelectQueues() {}

    static void fibonacci(BlockingQueue<Integer> queue,
                          BlockingQueue<Integer> quit) {
        int x = 0;
        int y = 1;

        while (true) {
            if (queue.offer(x)) {
                int temp = x;
                x = y;
                y = temp + x;
            }
            if (quit.poll() != null) {
                System.out.println("quit");
                break;
            }
        }
    }

    public static void main(String[] args) {
        var queue = new ArrayBlockingQueue<Integer>(1);
        var quit = new ArrayBlockingQueue<Integer>(1);

        try (var executor = 
                Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> {
                for (int i = 0; i < 10; i++) {
                    System.out.println(queue.take());
                }
                quit.put(0);
                return null;
            });

            fibonacci(queue, quit);
        }
    }
}
SelectQueues.main(new String[]{});

I'm unsure why the Go version uses a channel of integers as its quit mechanism. In Java it is more natural to use something like a shared AtomicBoolean as a signal for shutdown.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public final class SelectQueues {
    private SelectQueues() {}

    static void fibonacci(BlockingQueue<Integer> queue,
                          AtomicBoolean quit) {
        int x = 0;
        int y = 1;

        while (!quit.get()) {
            if (queue.offer(x)) {
                int temp = x;
                x = y;
                y = temp + x;
            }
        }

        System.out.println("quit");
    }

    public static void main(String[] args) throws InterruptedException {
        var queue = new ArrayBlockingQueue<Integer>(1);
        var quit = new AtomicBoolean(false);

        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> {
                for (int i = 0; i < 10; i++) {
                    System.out.println(queue.take());
                }
                quit.set(true);
                return null;
            });

            fibonacci(queue, quit);
        }
    }
}

If it were a situation with multiple "one shot" queues then CompletableFuture#anyOf and similar methods might suffice.
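
To give a flavor of that, here is a minimal sketch (the two racing tasks are made up for illustration):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

public final class AnyOf {
    private AnyOf() {}

    public static void main(String[] args)
            throws InterruptedException, ExecutionException {
        try (var executor =
                Executors.newVirtualThreadPerTaskExecutor()) {
            var fast = CompletableFuture.supplyAsync(
                () -> "fast", executor
            );
            var slow = CompletableFuture.supplyAsync(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
                return "slow";
            }, executor);

            // anyOf completes as soon as either future does,
            // which is about as close as the standard library
            // gets to a select over multiple one shot results.
            Object winner = CompletableFuture.anyOf(fast, slow).get();
            System.out.println(winner);
        }
    }
}
AnyOf.main(new String[]{});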

Example 6. Default Selection

https://go.dev/tour/concurrency/6

package main

import (
    "fmt"
    "time"
)

func main() {
    tick := time.Tick(100 * time.Millisecond)
    boom := time.After(500 * time.Millisecond)
    for {
        select {
        case <-tick:
            fmt.Println("tick.")
        case <-boom:
            fmt.Println("BOOM!")
            return
        default:
            fmt.Println("    .")
            time.Sleep(50 * time.Millisecond)
        }
    }
}

There isn't a novel transformation of this default case syntax, but it is worth noting how Go's time library directly returns its channels as the mechanism for handling delays.

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;

public final class GreenThreadDress {
    private GreenThreadDress() {}

    public static void main(String[] args) 
            throws InterruptedException {
        var executor = 
                Executors.newVirtualThreadPerTaskExecutor();
        try {
            var tick = new ArrayBlockingQueue<Instant>(1);
            var boom = new ArrayBlockingQueue<Instant>(1);

            executor.submit(() -> {
                while (true) {
                    Thread.sleep(Duration.ofMillis(100));
                    tick.put(Instant.now());
                }
            });

            executor.submit(() -> {
                Thread.sleep(500);
                boom.put(Instant.now());
                return null;
            });

            while (true) {
                if (tick.poll() != null) {
                    System.out.println("tick.");
                }
                else if (boom.poll() != null) {
                    System.out.println("BOOM!");
                    break;
                }
                else {
                    System.out.println("    .");
                    Thread.sleep(Duration.ofMillis(50));
                }
            }
        } finally {
            executor.shutdownNow();
            executor.close();
        }
    }
}
GreenThreadDress.main(new String[]{});

Here we rely on the behavior of ExecutorService#shutdownNow to interrupt the task pushing to the tick queue. Unlike with the built in Go time.Tick where the underlying goroutine is never cancelled and is a "leak."

Example 7. Equivalent Binary Trees

https://go.dev/tour/concurrency/7 https://go.dev/tour/concurrency/8

package main

import "golang.org/x/tour/tree"

// Walk walks the tree t sending all values
// from the tree to the channel ch.
func Walk(t *tree.Tree, ch chan int)

// Same determines whether the trees
// t1 and t2 contain the same values.
func Same(t1, t2 *tree.Tree) bool

func main() {
}

This one is a little bit different since it's not a straight example, but instead a challenge you are meant to complete.

A full solution can be found on this StackOverflow question.

package main

import "fmt"
import "golang.org/x/tour/tree"

// Walk walks the tree t sending all values
// from the tree to the channel ch.
func Walk(t *tree.Tree, ch chan int) {
    var walker func(t *tree.Tree)
    walker = func (t *tree.Tree) {
        if (t == nil) {
            return
        }
        walker(t.Left)
        ch <- t.Value
        walker(t.Right)
    }
    walker(t)
    close(ch)
}

// Same determines whether the trees
// t1 and t2 contain the same values.
func Same(t1, t2 *tree.Tree) bool {
    ch1, ch2 := make(chan int), make(chan int)

    go Walk(t1, ch1)
    go Walk(t2, ch2)

    for {
        v1,ok1 := <- ch1
        v2,ok2 := <- ch2

        if v1 != v2 || ok1 != ok2 {
            return false
        }

        if !ok1 {
            break
        }
    }

    return true
}

func main() {
    fmt.Println("1 and 1 same: ", Same(tree.New(1), tree.New(1)))
    fmt.Println("1 and 2 same: ", Same(tree.New(1), tree.New(2)))

}

Where the Tree type is defined separately here.

// Copyright 2011 The Go Authors.  All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package tree // import "golang.org/x/tour/tree"

import (
    "fmt"
    "math/rand"
)

// A Tree is a binary tree with integer values.
type Tree struct {
    Left  *Tree
    Value int
    Right *Tree
}

// New returns a new, random binary tree holding the values k, 2k, ..., 10k.
func New(k int) *Tree {
    var t *Tree
    for _, v := range rand.Perm(10) {
        t = insert(t, (1+v)*k)
    }
    return t
}

func insert(t *Tree, v int) *Tree {
    if t == nil {
        return &Tree{nil, v, nil}
    }
    if v < t.Value {
        t.Left = insert(t.Left, v)
    } else {
        t.Right = insert(t.Right, v)
    }
    return t
}

func (t *Tree) String() string {
    if t == nil {
        return "()"
    }
    s := ""
    if t.Left != nil {
        s += t.Left.String() + " "
    }
    s += fmt.Sprint(t.Value)
    if t.Right != nil {
        s += " " + t.Right.String()
    }
    return "(" + s + ")"
}

So before touching the concurrency bits we need to translate this Tree type.

import java.util.Collections;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public sealed interface Tree {
    Tree insert(int v);

    record NotEmpty(
            Tree left,
            int value,
            Tree right
    ) implements Tree {
        @Override
        public Tree insert(int v) {
            if (v < this.value) {
                return new NotEmpty(
                        this.left.insert(v),
                        this.value,
                        this.right
                );
            }
            else {
                return new NotEmpty(
                        this.left,
                        this.value,
                        this.right.insert(v)
                );
            }
        }

        @Override
        public String toString() {
            return "( " +
                this.left +
                this.value +
                this.right +
                " )";
        }
    }

    record Empty() implements Tree {
        @Override
        public Tree insert(int v) {
            return new NotEmpty(new Empty(), v, new Empty());
        }

        @Override
        public String toString() {
            return "";
        }
    }

    static Tree random(int k) {
        var vs = IntStream.range(0, 10)
                .boxed()
                .collect(Collectors.toList());
        Collections.shuffle(vs);
        
        Tree t = new Empty();
        for (int v : vs) {
            t = t.insert((1 + v) * k);
        }
        return t;
    }
}

A 1-to-1 translation of the Go wouldn't be fun Java, so I opted to translate it to an immutable sum type instead. This won't affect the concurrent part, other than giving a stronger conceptual guarantee that we can safely share the tree across multiple threads.

With this version the Go maps pretty straightforwardly to this.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;

sealed interface TakeResult<T> {
    record GotValue<T>(T value) implements TakeResult<T> {}
    record NoValue<T>() implements TakeResult<T> {}
}

public final class TreeWalker {
    private TreeWalker() {}

    private static void walkHelper(
            Tree tree,
            BlockingQueue<TakeResult<Integer>> queue
    ) throws InterruptedException {
        // Empty is the base case; only NotEmpty nodes have children
        if (!(tree instanceof Tree.NotEmpty notEmpty)) {
            return;
        }
        walkHelper(notEmpty.left(), queue);
        queue.put(new TakeResult.GotValue<>(
                notEmpty.value()
        ));
        walkHelper(notEmpty.right(), queue);
    }

    static void walk(
            Tree tree,
            BlockingQueue<TakeResult<Integer>> queue
    ) throws InterruptedException {
        walkHelper(tree, queue);
        queue.put(new TakeResult.NoValue<>());
    }

    static boolean same(Tree t1, Tree t2) 
            throws InterruptedException {
        var queue1 = new ArrayBlockingQueue<TakeResult<Integer>>(1);
        var queue2 = new ArrayBlockingQueue<TakeResult<Integer>>(1);

        var executor =
                Executors.newVirtualThreadPerTaskExecutor();
        try {
            executor.submit(() -> {
                walk(t1, queue1);
                return null;
            });

            executor.submit(() -> {
                walk(t2, queue2);
                return null;
            });

            while (true) {
                var result1 = queue1.take();
                var result2 = queue2.take();
                if (!result1.equals(result2)) {
                    return false;
                }

                if (result1 instanceof TakeResult.NoValue<Integer>) {
                    break;
                }
            }
            return true;
        } finally {
            executor.shutdownNow();
            executor.close();
        }
    }

    public static void main(String[] args) 
            throws InterruptedException {
        System.out.println(
            "1 and 1 same: " + 
            same(Tree.random(1), Tree.random(1))
        );
        System.out.println(
            "1 and 2 same: " + 
            same(Tree.random(1), Tree.random(2))
        );
    }
}
TreeWalker.main(new String[]{});

We use the same trick as before to emulate a closable queue with TakeResult. The comparison loop then just calls take on each queue in turn and compares the results.

The example Go solution had a recursive local closure for walk. While technically possible via some wizardry, it's more straightforward in Java to make a helper method.

There is also a reliance on the walk tasks responding correctly to shutdownNow. If they did not, executor.close() would hang and the scope wouldn't exit.

Example 8: sync.Mutex

https://go.dev/tour/concurrency/9

package main

import (
    "fmt"
    "sync"
    "time"
)

// SafeCounter is safe to use concurrently.
type SafeCounter struct {
    mu sync.Mutex
    v  map[string]int
}

// Inc increments the counter for the given key.
func (c *SafeCounter) Inc(key string) {
    c.mu.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    c.v[key]++
    c.mu.Unlock()
}

// Value returns the current value of the counter for the given key.
func (c *SafeCounter) Value(key string) int {
    c.mu.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    defer c.mu.Unlock()
    return c.v[key]
}

func main() {
    c := SafeCounter{v: make(map[string]int)}
    for i := 0; i < 1000; i++ {
        go c.Inc("somekey")
    }

    time.Sleep(time.Second)
    fmt.Println(c.Value("somekey"))
}

Java has a direct analogue to sync.Mutex in ReentrantLock. We can make this same program without much issue.

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

final class SafeCounter {
    private final Map<String, Integer> v;
    private final ReentrantLock lock;

    public SafeCounter() {
        this.v = new HashMap<>();
        this.lock = new ReentrantLock();
    }
    void inc(String key) {
        lock.lock();
        try {
            v.put(key, v.getOrDefault(key, 0) + 1);
        } finally {
            lock.unlock();
        }
    }

    int value(String key) {
        lock.lock();
        try {
            return v.getOrDefault(key, 0);
        } finally {
            lock.unlock();
        }
    }
}

public final class Mutex {
    private Mutex() {}

    public static void main(String[] args) 
            throws InterruptedException {
        var c = new SafeCounter();
        for (int i = 0; i < 1000; i++) {
            Thread.startVirtualThread(
                    () -> c.inc("somekey")
            );
        }

        Thread.sleep(Duration.ofSeconds(1));
        System.out.println(c.value("somekey"));
    }
}
Mutex.main(new String[]{});

The only thing of note is that unlike Go where you can defer some arbitrary action like releasing your hold on a lock, in Java the general mechanism for "cleanup that must happen" is using the finally clause of a try block.

For Java, ReentrantLock is special in that its locks can "escape" a lexical scope. You can lock in one method and unlock in a totally unrelated one.
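As a sketch of what that can look like (the Transaction class and its method names here are made up for illustration):

import java.util.concurrent.locks.ReentrantLock;

final class Transaction {
    private final ReentrantLock lock = new ReentrantLock();

    // The lock is acquired in one method...
    void begin() {
        lock.lock();
    }

    // ...and released in an entirely different one, possibly
    // several stack frames away from the call to begin().
    void commit() {
        lock.unlock();
    }
}

Nothing ties the lock() and unlock() calls to a single block, which is exactly what synchronized cannot do.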

If you don't need this ability then you can use the fact that every object with identity in Java can be used as a lock via synchronized.

final class SafeCounter {
    private final Map<String, Integer> v;

    public SafeCounter() {
        this.v = new HashMap<>();
    }
    void inc(String key) {
        synchronized (this) {
            v.put(key, v.getOrDefault(key, 0) + 1);
        }
    }

    int value(String key) {
        synchronized (this) {
            return v.getOrDefault(key, 0);
        }
    }
}

If the thing being synchronized on is just this and that synchronization lasts the entire scope of the method, we can just mark the method as synchronized for the same effect.

final class SafeCounter {
    private final Map<String, Integer> v;

    public SafeCounter() {
        this.v = new HashMap<>();
    }

    synchronized void inc(String key) {
        v.put(key, v.getOrDefault(key, 0) + 1);
    }

    synchronized int value(String key) {
        return v.getOrDefault(key, 0);
    }
}

Example 9: Web Crawler

https://go.dev/tour/concurrency/10

package main

import (
    "fmt"
)

type Fetcher interface {
    // Fetch returns the body of URL and
    // a slice of URLs found on that page.
    Fetch(url string) (body string, urls []string, err error)
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    // TODO: Fetch URLs in parallel.
    // TODO: Don't fetch the same URL twice.
    // This implementation doesn't do either:
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    for _, u := range urls {
        Crawl(u, depth-1, fetcher)
    }
    return
}

func main() {
    Crawl("https://golang.org/", 4, fetcher)
}

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
    body string
    urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
    if res, ok := f[url]; ok {
        return res.body, res.urls, nil
    }
    return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
    "https://golang.org/": &fakeResult{
        "The Go Programming Language",
        []string{
            "https://golang.org/pkg/",
            "https://golang.org/cmd/",
        },
    },
    "https://golang.org/pkg/": &fakeResult{
        "Packages",
        []string{
            "https://golang.org/",
            "https://golang.org/cmd/",
            "https://golang.org/pkg/fmt/",
            "https://golang.org/pkg/os/",
        },
    },
    "https://golang.org/pkg/fmt/": &fakeResult{
        "Package fmt",
        []string{
            "https://golang.org/",
            "https://golang.org/pkg/",
        },
    },
    "https://golang.org/pkg/os/": &fakeResult{
        "Package os",
        []string{
            "https://golang.org/",
            "https://golang.org/pkg/",
        },
    },
}

Another challenge problem. Before going to a reference solution, I am going to translate the synchronous example.

Go has a pattern of returning an error as a conditionally filled in extra return value. This isn't super idiomatic in Java, so instead I am going to model the error case as an Exception.

The Go version also returns both a body and an array of urls from a single call. For Java we can accomplish this by making an aggregate containing both values.

final class FetcherException extends Exception {
    FetcherException(String message) {
        super(message);
    }
}

interface Fetcher {
    record Result(String body, String[] urls) {
    }

    Result fetch(String url) throws FetcherException;
}

We can make a "fake" implementation of fetcher using the same technique as Go by backing it with an in-memory map.

import java.util.Map;

final class FakeFetcher implements Fetcher {
    private final Map<String, Result> results;

    public FakeFetcher(Map<String, Result> results) {
        this.results = results;
    }

    @Override
    public Result fetch(String url) throws FetcherException {
        var result = this.results.get(url);
        if (result == null) {
            throw new FetcherException("Not Found: " + url);
        } else {
            return result;
        }
    }

    public static Fetcher example() {
        return new FakeFetcher(Map.of(
                "https://golang.org/", new Fetcher.Result(
                        "The Go Programming Language",
                        new String[]{
                                "https://golang.org/pkg/",
                                "https://golang.org/cmd/"
                        }
                ),
                "https://golang.org/pkg/", new Fetcher.Result(
                        "Packages",
                        new String[]{
                                "https://golang.org/",
                                "https://golang.org/cmd/",
                                "https://golang.org/pkg/fmt/",
                                "https://golang.org/pkg/os/",
                        }
                ),
                "https://golang.org/pkg/fmt/", new Fetcher.Result(
                        "Package fmt",
                        new String[]{
                                "https://golang.org/",
                                "https://golang.org/pkg/",
                        }
                ),
                "https://golang.org/pkg/os/", new Fetcher.Result(
                        "Package os",
                        new String[]{
                                "https://golang.org/",
                                "https://golang.org/pkg/",
                        }
                )
        ));
    }
}

Then the synchronous fetcher is just a regular recursive function as it was in Go.

public final class WebCrawler {
    private WebCrawler() {
    }

    static void crawl(
        String url, 
        int depth, 
        Fetcher fetcher
    ) {
        if (depth <= 0) {
            return;
        }

        Fetcher.Result result;
        try {
            result = fetcher.fetch(url);
        } catch (FetcherException e) {
            System.out.println(e.getMessage());
            return;
        }

        var body = result.body();
        var urls = result.urls();

        System.out.printf(
            "Found: %s %s\n", 
            body, 
            Arrays.toString(urls)
        );

        for (var u : urls) {
            crawl(u, depth - 1, fetcher);
        }
    }

    public static void main(String[] args) {
        var fetcher = FakeFetcher.example();

        crawl("https://golang.org/", 4, fetcher);
    }
}
WebCrawler.main(new String[]{});

Now I am going to pull an answer to the exercise from this StackOverflow post.

// SafeUrlMap is safe to use concurrently.
type SafeUrlMap struct {
    v   map[string]string
    mux sync.Mutex
}

func (c *SafeUrlMap) Set(key string, body string) {
    c.mux.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    c.v[key] = body
    c.mux.Unlock()
}

// Value returns mapped value for the given key.
func (c *SafeUrlMap) Value(key string) (string, bool) {
    c.mux.Lock()
    // Lock so only one goroutine at a time can access the map c.v.
    defer c.mux.Unlock()
    val, ok := c.v[key]
    return val, ok
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, urlMap SafeUrlMap) {
    defer wg.Done()

    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }

    urlMap.Set(url, body)

    for _, u := range urls {
        if _, ok := urlMap.Value(u); !ok {
            wg.Add(1)
            go Crawl(u, depth-1, fetcher, urlMap)
        }
    }

    return
}

var wg sync.WaitGroup

func main() {
    urlMap := SafeUrlMap{v: make(map[string]string)}

    wg.Add(1)
    go Crawl("http://golang.org/", 4, fetcher, urlMap)
    wg.Wait()

    for url := range urlMap.v {
        body, _ := urlMap.Value(url)
        fmt.Printf("found: %s %q\n", url, body)
    }
}

This solution makes use of a sync.WaitGroup. There is no direct analogue in Java's standard library, but we can pretty easily make something with similar semantics.

I am taking the implementation from this StackOverflow question.

final class WaitGroup {
    private int jobs = 0;

    public synchronized void add(int i) {
        jobs += i;
    }

    public synchronized void done() {
        if (--jobs == 0) {
            notifyAll();
        }
    }

    public synchronized void await() 
            throws InterruptedException {
        while (jobs > 0) {
            wait();
        }
    }
}
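Before wiring it into the crawler, here is a minimal usage sketch. The work being done is a stand-in.

var waitGroup = new WaitGroup();
waitGroup.add(1);
Thread.startVirtualThread(() -> {
    try {
        // ... do some work ...
    } finally {
        waitGroup.done();
    }
});
waitGroup.await();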

The SafeUrlMap type is also fairly trivial to assemble, but instead of that I am going to pass around a normal HashSet and manually synchronize on it.

There are many other options, including using Collections#synchronizedSet or wrapping a ConcurrentHashMap, but I think this will be the easiest to follow.
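For reference, those two alternatives would look something like this.

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Every method takes a lock internally, but a check-then-act
// like ours still needs an explicit synchronized block around
// the contains and the add.
Set<String> synchronizedSeen =
        Collections.synchronizedSet(new HashSet<>());

// A concurrent Set view backed by a ConcurrentHashMap. Here
// add(u) is atomic and returns false if u was already there,
// so the check-then-act collapses into a single call.
Set<String> concurrentSeen = ConcurrentHashMap.newKeySet();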

public final class WebCrawler {
    private WebCrawler() {
    }

    static void crawlTask(
            String url,
            int depth,
            Fetcher fetcher,
            ExecutorService executor,
            Set<String> seen,
            WaitGroup waitGroup
    ) {
        try {
            if (depth <= 0) {
                return;
            }

            Fetcher.Result result;
            try {
                result = fetcher.fetch(url);
            } catch (FetcherException e) {
                System.out.println(e.getMessage());
                return;
            }

            var body = result.body();
            var urls = result.urls();

            System.out.printf(
                "Found: %s %s\n", 
                body, 
                Arrays.toString(urls)
            );

            for (var u : urls) {
                synchronized (seen) {
                    if (!seen.contains(u)) {
                        seen.add(u);
                        waitGroup.add(1);
                        executor.submit(() -> crawlTask(
                                u,
                                depth - 1,
                                fetcher,
                                executor,
                                seen,
                                waitGroup
                        ));
                    }
                }
            }
        } finally {
            waitGroup.done();
        }
    }

    static void crawl(String url, int depth, Fetcher fetcher)
            throws InterruptedException {
        try (var executor =
                     Executors.newVirtualThreadPerTaskExecutor()) {
            var waitGroup = new WaitGroup();
            waitGroup.add(1);

            executor.submit(() -> crawlTask(
                    url,
                    depth,
                    fetcher,
                    executor,
                    new HashSet<>(),
                    waitGroup
            ));

            waitGroup.await();
        }
    }

    public static void main(String[] args) 
            throws InterruptedException {
        var fetcher = FakeFetcher.example();
        crawl("https://golang.org/", 4, fetcher);
    }
}
WebCrawler.main(new String[]{});

We could stop there, but there is one api in an incubator module of the early access builds that can replace our home-rolled WaitGroup.

This is why I had --add-modules=jdk.incubator.concurrent in the jshell command up top.

A StructuredTaskScope.ShutdownOnFailure lets us fork an arbitrary number of tasks into the scope recursively and will only close after all of those tasks complete (or one of them fails, at which point the rest are cancelled). There is another implementation, StructuredTaskScope.ShutdownOnSuccess, that will finish after a single task succeeds.

This obviates the need to manually count up and down with a WaitGroup.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import jdk.incubator.concurrent.StructuredTaskScope;

public final class WebCrawler {
    private WebCrawler() {
    }

    static void crawlTask(
            String url,
            int depth,
            Fetcher fetcher,
            StructuredTaskScope.ShutdownOnFailure
                    structuredTaskScope,
            Set<String> seen
    ) {
        if (depth <= 0) {
            return;
        }

        Fetcher.Result result;
        try {
            result = fetcher.fetch(url);
        } catch (FetcherException e) {
            System.out.println(e.getMessage());
            return;
        }

        var body = result.body();
        var urls = result.urls();

        System.out.printf(
            "Found: %s %s\n", 
            body, 
            Arrays.toString(urls)
        );

        for (var u : urls) {
            synchronized (seen) {
                if (!seen.contains(u)) {
                    seen.add(u);
                    structuredTaskScope.fork(() -> {
                        crawlTask(
                                u,
                                depth - 1,
                                fetcher,
                                structuredTaskScope,
                                seen
                        );
                        return null;
                    });
                }
            }
        }

    }

    static void crawl(String url, int depth, Fetcher fetcher)
            throws InterruptedException {
        try (var structuredTaskScope =
                     new StructuredTaskScope.ShutdownOnFailure()) {

            structuredTaskScope.fork(() -> {
                crawlTask(
                        url,
                        depth,
                        fetcher,
                        structuredTaskScope,
                        new HashSet<>()
                );
                return null;
            });

            structuredTaskScope.join();
        }
    }

    public static void main(String[] args) 
            throws InterruptedException {
        var fetcher = FakeFetcher.example();
        crawl("https://golang.org/", 4, fetcher);
    }
}
WebCrawler.main(new String[]{});

This combines the role of the ExecutorService and the WaitGroup into one object, which at the least makes for one less point of coordination and slightly cleaner code.

The exact shape the StructuredTaskScope api will take is very much in flux so if you are reading this a year or so in the future this snippet might not work.
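For completeness, the StructuredTaskScope.ShutdownOnSuccess variant mentioned above looks roughly like this. fetchFrom and the mirror URLs are stand-ins for illustration.

import jdk.incubator.concurrent.StructuredTaskScope;

String fetchFromFastestMirror() throws Exception {
    try (var scope =
                 new StructuredTaskScope.ShutdownOnSuccess<String>()) {
        scope.fork(() -> fetchFrom("https://mirror-one.example.com"));
        scope.fork(() -> fetchFrom("https://mirror-two.example.com"));
        scope.join();
        // Whichever fork succeeded first; the other is cancelled.
        return scope.result();
    }
}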

Wrapping up

Hopefully this was informative. If not, that's fine too.

Leave questions, comments, and suggestions in the comments.


<- Index

Java Serialization is Fun

by: Ethan McCue

What is serialization

Serialization is the process of taking an in-memory representation of data and transforming it to a representation suitable for sending to another location.

Diagram showing the serialization flow

Deserialization is the reverse of that process. Code takes a structured representation of data from some location and transforms it to a representation in-memory.

Diagram showing the deserialization flow

Every programming language has a myriad of approaches for performing these tasks. These approaches vary greatly depending on the semantics of the language, the semantics of the output format, and the culture surrounding both.

What sets Java's serialization mechanism apart is that the semantics of the language map extremely closely to that of the output format.

To fully appreciate the implications of this, allow me to take you on a bit of a tour of some other data formats.

CSV

CSV, Comma Separated Values, is one of the most "basic" data formats out there.

Data is written one line at a time, with each value in a "row" separated by commas.

frankie,25,yes,"Jun 8, 2023"
casca,63,no,none

By convention sometimes the very first row is used to store a "label" of what each "column" means.

First Name,Number of Cats,Tax Fraud?,Upcoming Court Date
frankie,25,yes,"Jun 8, 2023"
casca,63,no,none

While labels can add contextual information, the actual "data model" that is directly encoded here is just rows of strings. Interpretation of these rows is dependent on a combination of convention and "out of band" information.

CSV is
- a list of Rows

A Row is
- a list of strings

CSV is popular in quite a few domains. It's easy to import and export to Spreadsheets, write out from sensors on an Arduino, and feed into Machine Learning libraries.

But its data model is not close to how most programs represent data. To go from a representation in memory to CSV is almost always going to be a "lossy" process. To go from CSV back to that same representation in memory requires knowledge about how to interpret the order of elements in a row, what each element means, etc.

import java.time.LocalDate;
import java.util.List;

record Person(
        // have to assume that the first element is the name
        String name,
        // have to assume that the second element is this
        int numberOfCats,
        // How should a boolean be encoded?
        boolean taxFraud,
        // What format is the date in?
        // What is done when no value is known?
        LocalDate upcomingCourtDate
) {
    static Person fromCsvRow(List<String> row) {
        // Code here could be autogenerated if you assume
        // conventions, but it probably won't be
         
        if (row.size() != 4) { ... }
         
        String name = row.get(0);
        
        int numberOfCats;
        try {
            numberOfCats = Integer.parseInt(row.get(1));
        } catch (NumberFormatException __) {
             ...
        }
         
        // ... and so on ...
         
        return new Person(name, numberOfCats, ...);
    }
    
    List<String> toCsvRow() {
        // ...
        return List.of(this.name, ...);
    }
}

JSON

"JavaScript Object Notation" is a format derived from the syntax of declaring object literals in JavaScript.

{ 
     "stockName": "IDK",
     "stockPrice": "100USD",
     "twitterComments": [
          {
               "retweets": 10,
               "text": "...",
          },
          {
               "retweets": 20,
               "text": "..."
          }
     ]
}

Compared to CSV it is way more expressive. Instead of just rows of strings the data model includes dedicated representations for booleans, numbers, lists, and more.

JSON is one of
- null
- a string
- a number
- a boolean
- a list of JSON
- a map of string to JSON

This makes it somewhat of a "lowest common denominator" data format. Most modern languages have support for these data types and the structure can represent nested data much more ergonomically than "flat" formats like CSV.

The translation from a model in memory to JSON is still "lossy" in quite a few common cases though.

record Recruiter(
        // Often enums will be translated to Strings
        TellsYouTheSalary tellsYouTheSalary,
        // Times might be put into a ISO-8601 format String 
        // or a Unix Time integer
        Instant postedFirstCringeStatus,
        // Sets aren't representable, so often
        // they will be encoded as lists
        Set<ReservationsAtDorsier> reservations,
        // Multiple possibilities with overlapping fields need a
        // convention for representing which is present
        LovedOne lovedOne
) {}

enum TellsYouTheSalary {
     UP_FRONT,
     IF_YOU_ASK,
     NEVER
}

sealed interface LovedOne {}
record Cat(String name) implements LovedOne {}
record Dog(String name) implements LovedOne {}
record NoOne() implements LovedOne {}

// Both of these would be valid representations 
// depending on your conventions
//
// { "tellsYouTheSalary": "UP_FRONT",
//   "postedFirstCringeStatus": 1234,
//   "reservations": [],
//   "lovedOne": {"type": "cat", "name": "fred" } }
//
// { "tells_you_the_salary": "up_front",
//   "posted_first_cringe_status": "2020-07-10 15:00:00.000",
//   "reservations": {"kind": "set", "contents": []},
//   "loved_one": {"kind": "cat", "name": "fred"} } 

EDN

"Extensible Data Notation" is a format that came out of the syntax of the Clojure programming language.

{ 
     :teethLeft       #{5 12 14 23}
     :countryOfOrigin "United States of America"
     :whelped         #inst "2006-04-12T00:00:00.000-00:00"
     :parents         #{#pokemon "Skitty" 
                        #pokemon "Wailord"}
     :moves           [:quick-attack :tail-whip]
}

More likely than not you have not heard of it. That's a shame because it's pretty cool.

Compared to JSON it has a larger base set of types and a defined mechanism for extending that set.

EDN is one of
- null
- a string
- an integer
- a vector of EDN
- a map of EDN to EDN
- a set of EDN
- a keyword
- a symbol
- an element with a tag and an EDN value

... and a few other base types ...

The key capability for the purposes of this discussion is that you are able to attach an arbitrary tag to any EDN value.

This serves the same purpose as the { "type": ..., "data": ... } pattern in JSON, but by virtue of being part of the format that encoding is not "positional".

As an example of what I mean, in JSON the way you know that a given field contains a moment in time is by knowing implicitly that the string under a specific name like "createdAt" will be formatted as a timestamp.

{ "createdAt": "2020-08-12T00:00:00.000-00:00" }

In EDN if you know how a given tag like #inst should be interpreted then you can automatically do that interpretation no matter where in the structure of the document it appears.

{ "createdAt" #inst"2020-08-12T00:00:00.000-00:00" }

This means that translation to and from EDN doesn't have to be lossy in the same way JSON serialization is. If you have a custom aggregate, you can define a tag for that aggregate and include whatever data is needed to reconstruct it.

package some.pack;

sealed interface Mascot {}
record Gecko(int age) implements Mascot {}
record Sailor(int age, boolean captain) implements Mascot {}

// This could be encoded as
// #some.pack.Gecko{:age 12}
// #some.pack.Sailor{:age 35 :captain true}

You can also have non-string keys {{:map "key"} "whatever value"}. Y'all are missing out.

Java's Serialization Format

"Java Serialization" is a mechanism by which any object in memory can be serialized to and deserialized from a sequence of bytes while preserving the same semantics that object had in memory.

For regular classes, it accomplishes this by recursively scraping the fields of the class and producing bytes as specified here. Then when the bytes are read back in, it reconstructs the object by doing the reverse.

For "special" classes (Strings, Enums, and Records) there are slightly different rules, but the effect is essentially the same.

This is exceedingly hard to properly communicate with words, so here is a quick walk-through.

You can follow along by pasting each snippet into JShell.

(If you have Java installed, run jshell on the command line)

Step 1. Make a Serializable class

Implement the Serializable marker interface and make sure every field of your class does as well or is a primitive.

import java.io.Serializable;

public class LabeledPosition implements Serializable {
    private String label;
    private int x;
    private int y;
    
    public LabeledPosition(String label, int x, int y) {
        this.label = label;
        this.x = x;
        this.y = y;
    }
    
    @Override
    public String toString() {
        return "LabeledPosition[label=" + this.label +
                ", x=" + this.x +
                ", y=" + this.y +
                "]";
    }
}

Step 2. Make an ObjectOutputStream

You can make this special class by wrapping any existing OutputStream. This is where the bytes of your serialized form will be written.

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

var byteArrayOutputStream = new ByteArrayOutputStream();
var objectOutputStream = new ObjectOutputStream(
        byteArrayOutputStream
);

Step 3. Write your object to the ObjectOutputStream

This is a binary format, so there isn't any fun visual aid, but you can inspect and see that indeed we have written some bytes.

objectOutputStream.writeObject(new LabeledPosition("bob", 9, 1));

byte[] bytes = byteArrayOutputStream.toByteArray();
System.out.println(Arrays.toString(bytes));
// [-84, -19, 0, ..., 98, 111, 98]

Step 4. Create an ObjectInputStream

This is very similar to how we wrote the object out. Wrap any existing InputStream.

import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;

var byteArrayInputStream = new ByteArrayInputStream(bytes);
var objectInputStream = new ObjectInputStream(byteArrayInputStream);

Step 5. Read in the object you wrote out

var labeledPosition = 
        (LabeledPosition) objectInputStream.readObject();

System.out.println(labeledPosition);
// LabeledPosition[label=bob, x=9, y=1]

Step 6. Make another Serializable class

Hold with me here, this gets good.

record TwoLists(
     List<Integer> listOne,
     List<Integer> listTwo
) implements Serializable {}

Step 7. Make a mutable object

So here we will make an instance of this TwoLists record where each List is the exact same list in memory.

This means that if we add to either listOne or listTwo both will be updated.

var theList = new ArrayList<>(List.of(1, 2, 3));
var twoLists = new TwoLists(theList, theList);

System.out.println(twoLists);
// TwoLists[listOne=[1, 2, 3], listTwo=[1, 2, 3]]

twoLists.listOne().add(4);
System.out.println(twoLists);
// TwoLists[listOne=[1, 2, 3, 4], listTwo=[1, 2, 3, 4]]

Step 8. Write that mutable object to an ObjectOutputStream

var byteArrayOutputStream = new ByteArrayOutputStream();
var objectOutputStream = new ObjectOutputStream(
        byteArrayOutputStream
);
objectOutputStream.writeObject(twoLists);
byte[] bytes = byteArrayOutputStream.toByteArray();

Step 9. Read that mutable object from an ObjectInputStream

var byteArrayInputStream = new ByteArrayInputStream(bytes);
var objectInputStream = new ObjectInputStream(byteArrayInputStream);

var roundTripped = (TwoLists) objectInputStream.readObject();

Step 10. Oh no

Oh yeah.

System.out.println(roundTripped);
// TwoLists[listOne=[1, 2, 3, 4], listTwo=[1, 2, 3, 4]]

System.out.println(roundTripped.listOne() == roundTripped.listTwo());
// true

roundTripped.listOne().add(5);
System.out.println(roundTripped);
// TwoLists[listOne=[1, 2, 3, 4, 5], listTwo=[1, 2, 3, 4, 5]]

If you have the same object two places in the "object graph" of something you are serializing, the fact that those two places hold the same object is preserved.

Because of this, you can even seamlessly serialize things like circular linked lists.

class CircularThing implements Serializable {
    CircularThing next;
}

// How would you write this in JSON?
var circular = new CircularThing();
circular.next = circular;
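Following the same write-then-read steps as before, you can check that the cycle survives a round trip.

var circularOut = new ByteArrayOutputStream();
new ObjectOutputStream(circularOut).writeObject(circular);

var circularIn = new ObjectInputStream(
        new ByteArrayInputStream(circularOut.toByteArray())
);
var roundTrippedCircular =
        (CircularThing) circularIn.readObject();

System.out.println(
    roundTrippedCircular.next == roundTrippedCircular
);
// true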

What is this good for?

  • Prototypes and Projects on a tight deadline.

Since you can save any arbitrary object and there is no extra code needed to make that just "work", Java Serialization can be a very useful crutch for getting code working quickly.
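As a sketch of that crutch in action, persisting any Serializable object takes only a couple of lines. gameState here is a stand-in for whatever object you built up.

import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

try (var out = new ObjectOutputStream(
        Files.newOutputStream(Path.of("save.bin")))) {
    out.writeObject(gameState);
}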

  • Saving arbitrary objects that were very expensive to create

In the Python world, a similar utility is often used to save the results of training ML models. It's easy to imagine that Java Serialization could see similar use if Data Science ever took off on the JVM in the same way.

  • Dynamically sending code to another machine

Spark uses this mechanism for distributing Java objects across different nodes.

What is this bad for?

  • Models that will change over time

While you can version serialized objects, doing so is non-obvious and error-prone. Making a class serializable, especially in a library, can therefore be a fairly large maintenance problem.
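For reference, the main versioning lever is pinning a serialVersionUID. Without one, the runtime computes a value from the shape of the class, so old serialized objects stop reading in the moment the class changes. Pinning it keeps the bytes flowing, but evolving the fields safely is still entirely on you.

import java.io.Serializable;

public class LabeledPosition implements Serializable {
    private static final long serialVersionUID = 1L;

    // ... fields and constructor as before ...
}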

  • Applications where an untrusted entity might be able to send data to your application

If you read serialized data that you did not write, that is a giant security hole. There is more nuance to it, but basically if you read untrusted serialized data then any hacker can get full access to your system. I'm not going to go into every way you can exploit serialization, but this talk should give you a basic idea.

This was a crucial part of the Log4Shell vulnerability.

  • Things that should be edited by humans like configuration files

Because serialized objects are stored in a binary format, they are impossible to read without special tooling and prohibitively hard to write by hand.

  • Sending data outside the Java/JVM world

While technically you could write a parser for the binary format in your language of choice and recover the information, you would likely be the first. If you need to share values with programs in other languages, falling back to a "lowest common denominator" like JSON is a better strategy.


Part of what made writing this so hard for me is that most people I've seen encounter serialization were shown it very early in their curriculum. It's hard to explain nuance around the object model and encapsulation when talking to someone who learned what classes are two weeks back, so I left most of that out.

Leave a comment below if anything was unclear, incorrect, or you would like to learn more.


<- Index

Java's options for options

by: Ethan McCue

Hypothetical 1

Imagine that you made a bit of code that outputs JSON.

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json
    ) {
        ...
    }
}

By default your output contains no extra whitespace, but you want to provide an option to the user to print that JSON with some indentation.

Without indentation

[{"name":"joe","age":35}]

With indentation

[
    {
        "name":"joe",
        "age":35
    }
]

Option 1. Don't support it

Any toggles you add to your api are toggles you might need to support now and forever more. Depending on the code you are writing and who its consumers are, it might make more sense to provide a more restricted api.

Option 2. Make another method

With only a single option you want configurable, you can just expose a method with a different name.

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonWithIndentation(
            Appendable out,
            Json json
    ) {
        ...
    }
}
writeJson(out, json);
writeJsonWithIndentation(out, json);

Option 3. Add a boolean argument

A single option is either on or off. True or false. That is often the domain of a boolean.

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json,
            boolean indent
    ) {
        if (indent) {
            ...
        }
        else {
            ...
        }
    }
}
writeJson(out, json, true);
writeJson(out, json, false);

Option 4. Add an enum argument

Booleans are great, but for understandability at the call-site you might want to provide an enum with two possible values instead.

public enum Indentation {
    INDENT,
    DO_NOT_INDENT
}

...

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json,
            Indentation indent
    ) {
        switch (indent) {
            case INDENT -> ...
            case DO_NOT_INDENT -> ...
        }
    }
}
writeJson(out, json, Indentation.INDENT);
writeJson(out, json, Indentation.DO_NOT_INDENT);

Hypothetical 2

Say now you get some feedback that while the indentation style is great for objects, it is sometimes not great for JSON with long arrays.

No indentation

[{"numbers":[1,2,3]}]

Indent Everything

[
  {
    "numbers": [
        1,
        2,
        3
    ]
  }
]

Indent Objects

[{
    "numbers": [1, 2, 3]
}]

Indent Arrays

[
  {"numbers": [
        1,
        2,
        3
  ]}
]

Option 5. Make methods for requested combinations

Your users just want a way to turn off indentation for arrays. Give it to them.

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentObjects(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentEverything(
            Appendable out,
            Json json
    ) {
        ...
    }
}
writeJson(out, json);
writeJsonIndentObjects(out, json);
writeJsonIndentEverything(out, json);

Option 6. Make methods for every combination

There are four logical settings that come out of two different flags, so you can certainly provide all four options as methods. Could save you time later.

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentObjects(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentArrays(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentEverything(
            Appendable out,
            Json json
    ) {
        ...
    }
}
writeJson(out, json);
writeJsonIndentObjects(out, json);
writeJsonIndentArrays(out, json);
writeJsonIndentEverything(out, json);

Option 7. Have two boolean arguments

Two logically independent things to configure, you can always take a boolean for each.

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json,
            boolean indentObjects,
            boolean indentArrays
    ) {
        if (indentObjects) {
            if (indentArrays) {
                ...
            }
            else {
                ...
            }
        }
        else {
            ...
        }
    }
}
writeJson(out, json, true, true);
writeJson(out, json, true, false);
writeJson(out, json, false, true);
writeJson(out, json, false, false);

Option 8. Have two enum arguments

Booleans cover every combination, but enums are still more explicit at the call-site.

public enum Indentation {
    INDENT,
    DO_NOT_INDENT
}

...

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json,
            Indentation indentObjects,
            Indentation indentArrays
    ) {
        switch (indentObjects) {
            case INDENT -> switch (indentArrays) {
                case INDENT -> ...
                case DO_NOT_INDENT -> ...
            }
            case DO_NOT_INDENT -> ...
        }
    }
}
writeJson(out, json, Indentation.INDENT, Indentation.INDENT);
writeJson(out, json, Indentation.INDENT, Indentation.DO_NOT_INDENT);
writeJson(out, json, Indentation.DO_NOT_INDENT, Indentation.INDENT);
writeJson(
        out, 
        json, 
        Indentation.DO_NOT_INDENT, 
        Indentation.DO_NOT_INDENT
);

Option 9. Take options as bit flags

It's an old school solution and maybe a bit too clever, but you are feeling old school and clever.

public final class Indentation {
    public static final int NO_INDENTATION = 0b00;
    public static final int INDENT_OBJECTS = 0b01;
    public static final int INDENT_ARRAYS = 0b10;
    
    private Indentation() {}
}

...

public final class JsonWriter {
    private JsonWriter() {}
    
    public static void writeJson(
            Appendable out,
            Json json,
            int indentation
    ) {
        if ((indentation & Indentation.INDENT_OBJECTS) != 0) {
            if ((indentation & Indentation.INDENT_ARRAYS) != 0) {
                ...
            }
            else {
                ...
            }
        }
        else {
            ...
        }
    }
}
writeJson(
        out,
        json,
        Indentation.INDENT_OBJECTS | Indentation.INDENT_ARRAYS
);
writeJson(out, json, Indentation.INDENT_OBJECTS);
writeJson(out, json, Indentation.INDENT_ARRAYS);
writeJson(out, json, Indentation.NO_INDENTATION);

Option 10. Take an EnumSet

Rather than waste a parameter on each flag, explicitly take the set of behaviors they want to enable.

public enum Indent {
    OBJECTS,
    ARRAYS
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            EnumSet<Indent> indent
    ) {
        if (indent.contains(Indent.OBJECTS)) {
            if (indent.contains(Indent.ARRAYS)) {
                ...
            }
            else {
                ...
            }
        }
        else {
            ...
        }
    }
}
writeJson(out, json, EnumSet.of(Indent.OBJECTS, Indent.ARRAYS));
writeJson(out, json, EnumSet.of(Indent.OBJECTS));
writeJson(out, json, EnumSet.of(Indent.ARRAYS));
writeJson(out, json, EnumSet.noneOf(Indent.class));

Option 11. Take a transparent config object

Similar to just taking two booleans, but putting them in an object means you can refer to a set of options as a concrete "thing". This could help you keep the most common usages terse.

record Options(boolean indentObjects, boolean indentArrays) {
    public static final Options INDENT_EVERYTHING =
            new Options(true, true);
    public static final Options NO_INDENT =
            new Options(false, false);
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            Options options
    ) {
        if (options.indentObjects()) {
            if (options.indentArrays()) {
                ...
            }
            else {
                ...
            }
        }
        else {
            ...
        }
    }
}
writeJson(out, json, Options.INDENT_EVERYTHING);
writeJson(out, json, new Options(true, false));
writeJson(out, json, new Options(false, true));
writeJson(out, json, Options.NO_INDENT);

Option 12. Take an opaque config object

Maybe you want to give your api some extra wiggle room to grow. Maybe you just like how the usage of an opaque object made from a builder looks.

With this approach you can choose to internally represent things as booleans, enums, an enum set, bit flags, or whatever other evil lies within the hearts of mankind.

public final class Options {
    private final boolean indentObjects;
    private final boolean indentArrays;

    private Options(Builder builder) {
        this.indentArrays = builder.indentArrays;
        this.indentObjects = builder.indentObjects;
    }

    public boolean indentArrays() {
        return this.indentArrays;
    }

    public boolean indentObjects() {
        return this.indentObjects;
    }

    public static Options standard() {
        return builder().build();
    }
    
    public static Builder builder() {
        return new Builder();
    }

    public static final class Builder {
        private boolean indentObjects;
        private boolean indentArrays;

        private Builder() {
            this.indentObjects = false;
            this.indentArrays = false;
        }
        
        public Builder indentObjects() {
            this.indentObjects = true;
            return this;
        }
        
        public Builder indentArrays() {
            this.indentArrays = true;
            return this;
        }
        
        public Options build() {
            return new Options(this);
        }
    }
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            Options options
    ) {
        if (options.indentObjects()) {
            if (options.indentArrays()) {
                ...
            }
            else {
                ...
            }
        }
        else {
            ...
        }
    }
}
writeJson(
    out, 
    json, 
    Options.builder()
        .indentObjects()
        .indentArrays()
        .build()
);
writeJson(
    out, 
    json, 
    Options.builder()
        .indentObjects()
        .build()
);
writeJson(
    out, 
    json, 
    Options.builder()
        .indentArrays()
        .build()
);
writeJson(out, json, Options.standard());

Hypothetical 3.

🚑 Weewoo Weewoo 🚑

It's legal! Your apis are great and all, but when you send data to external clients we would really like to include an explicit statement of copyright. That copyright message might change depending on your contract with the client, and we also shouldn't send it internally.

Good luck, legal out.

[
    {
        "name": "joe",
        "age": 35
    }
]
{ 
    "copyright": "(c) 2022 Inc.",
    "data": [
        {
            "name":"joe",
            "age":35
        }
    ]
}

Option 13. Add 4 more methods to hit the new combinations

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentObjects(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentArrays(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonIndentEverything(
            Appendable out,
            Json json
    ) {
        ...
    }

    public static void writeJsonWithCopyright(
            Appendable out,
            Json json,
            String copyright
    ) {
        ...
    }

    public static void writeJsonIndentObjectsWithCopyright(
            Appendable out,
            Json json,
            String copyright
    ) {
        ...
    }

    public static void writeJsonIndentArraysWithCopyright(
            Appendable out,
            Json json,
            String copyright
    ) {
        ...
    }

    public static void writeJsonIndentEverythingWithCopyright(
            Appendable out,
            Json json,
            String copyright
    ) {
        ...
    }
}
writeJson(out, json);
writeJsonIndentObjects(out, json);
writeJsonIndentArrays(out, json);
writeJsonIndentEverything(out, json);
writeJsonWithCopyright(out, json, "(c) 2022");
writeJsonIndentObjectsWithCopyright(out, json, "(c) 2022");
writeJsonIndentArraysWithCopyright(out, json, "(c) 2022");
writeJsonIndentEverythingWithCopyright(out, json, "(c) 2022");

Option 14. Add a single new method

If your boolean-like options were already consolidated into a single argument, you can get away with adding just one new method to the list.

This will look different depending on whether you used booleans, enums, an EnumSet, or bit flags.

public enum Indent {
    OBJECTS,
    ARRAYS
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            EnumSet<Indent> indent
    ) {
        ...
    }

    public static void writeJson(
            Appendable out,
            Json json,
            EnumSet<Indent> indent,
            String copyright
    ) {
        ...
    }
}
writeJson(out, json, EnumSet.of(Indent.OBJECTS, Indent.ARRAYS));
writeJson(out, json, EnumSet.of(Indent.OBJECTS));
writeJson(out, json, EnumSet.of(Indent.ARRAYS));
writeJson(out, json, EnumSet.noneOf(Indent.class));
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.OBJECTS, Indent.ARRAYS),
    "(c) 2022"
);
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.OBJECTS),
    "(c) 2022"
);
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.ARRAYS),
    "(c) 2022"
);
writeJson(
    out, 
    json, 
    EnumSet.noneOf(Indent.class), 
    "(c) 2022"
);

Option 15. Add another argument and accept null

If you don't want to add yet another overload, you can always just allow users to pass null.

public enum Indent {
    OBJECTS,
    ARRAYS
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            EnumSet<Indent> indent,
            String copyright
    ) {
        if (indent.contains(Indent.OBJECTS)) {
            if (indent.contains(Indent.ARRAYS)) {
                if (copyright == null) {
                    ...
                }
                else {
                    ... 
                }

            }
            else {
                ...
            }
        }
        else {
            ...
        }
    }
}
writeJson(out, json, EnumSet.of(Indent.OBJECTS, Indent.ARRAYS), null);
writeJson(out, json, EnumSet.of(Indent.OBJECTS), null);
writeJson(out, json, EnumSet.of(Indent.ARRAYS), null);
writeJson(out, json, EnumSet.noneOf(Indent.class), null);
writeJson(
        out,
        json,
        EnumSet.of(Indent.OBJECTS, Indent.ARRAYS), 
        "(c) 2022"
);
writeJson(out, json, EnumSet.of(Indent.OBJECTS), "(c) 2022");
writeJson(out, json, EnumSet.of(Indent.ARRAYS), "(c) 2022");
writeJson(out, json, EnumSet.noneOf(Indent.class), "(c) 2022");

Option 16. Add another argument and make it be an Optional

It's time to paint a bike shed. You can do this with java.util.Optional or your own sealed type (a sketch of the latter comes after the example).

public enum Indent {
    OBJECTS,
    ARRAYS
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            EnumSet<Indent> indent,
            Optional<String> copyright
    ) {
        if (indent.contains(Indent.OBJECTS)) {
            if (indent.contains(Indent.ARRAYS)) {
                if (copyright.isPresent()) {
                    ... copyright.orElseThrow() ...
                } else {
                    ...
                }
            } else {
                ...
            }
        } else {
            ...
        }
    }
}
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.OBJECTS, Indent.ARRAYS), 
    Optional.empty()
);
writeJson(
    out,
    json, 
    EnumSet.of(Indent.OBJECTS),
    Optional.empty()
);
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.ARRAYS),
    Optional.empty()
);
writeJson(
    out, 
    json, 
    EnumSet.noneOf(Indent.class), 
    Optional.empty()
);
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.OBJECTS, Indent.ARRAYS), 
    Optional.of("(c) 2022")
);
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.OBJECTS), 
    Optional.of("(c) 2022")
);
writeJson(
    out, 
    json, 
    EnumSet.of(Indent.ARRAYS), 
    Optional.of("(c) 2022")
);
writeJson(
    out, 
    json, 
    EnumSet.noneOf(Indent.class), 
    Optional.of("(c) 2022")
);
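And if Optional isn't the color you want the shed painted, a minimal sketch of the "your own sealed type" route could look like this.

sealed interface Copyright {
    record Some(String text) implements Copyright {}
    record None() implements Copyright {}
}

...

switch (copyright) {
    case Copyright.Some some -> ... some.text() ...
    case Copyright.None none -> ...
}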

Option 17. Add a nullable property to a transparent config object

record Options(
        boolean indentObjects, 
        boolean indentArrays,
        String copyright) {
    public static final Options INDENT_EVERYTHING =
            new Options(true, true, null);
    public static final Options NO_INDENT =
            new Options(false, false, null);
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            Options options
    ) {
        if (options.indentObjects()) {
            if (options.indentArrays()) {
                if (options.copyright() == null) {
                    ...
                }
                else {
                    ...   
                }
            }
            else {
                ... 
            }
        }
        else { 
            ... 
        }
    }
}

Option 18. Add an Optional property to a transparent config object

Same as above, but with the optionality made explicit. One catch: if you are using records as your transparent data carriers, the constructor has to take the Optional directly.

record Options(
        boolean indentObjects,
        boolean indentArrays,
        Optional<String> copyright
) {
    public static final Options INDENT_EVERYTHING =
            new Options(true, true, Optional.empty());
    public static final Options NO_INDENT =
            new Options(false, false, Optional.empty());
}

Option 19. Add another property to an opaque config object

public final class Options {
    private final boolean indentObjects;
    private final boolean indentArrays;
    private final String copyright;

    private Options(Builder builder) {
        this.indentArrays = builder.indentArrays;
        this.indentObjects = builder.indentObjects;
        this.copyright = builder.copyright;
    }

    public boolean indentArrays() {
        return this.indentArrays;
    }

    public boolean indentObjects() {
        return this.indentObjects;
    }

    public Optional<String> copyright() {
        return Optional.ofNullable(this.copyright);
    }

    public static Options standard() {
        return builder().build();
    }

    public static Builder builder() {
        return new Builder();
    }

    public static final class Builder {
        private boolean indentObjects;
        private boolean indentArrays;
        private String copyright;

        private Builder() {
            this.indentObjects = false;
            this.indentArrays = false;
            this.copyright = null;
        }

        public Builder indentObjects() {
            this.indentObjects = true;
            return this;
        }

        public Builder indentArrays() {
            this.indentArrays = true;
            return this;
        }

        public Builder copyright(String copyright) {
            this.copyright = copyright;
            return this;
        }
        
        public Options build() {
            return new Options(this);
        }
    }
}
writeJson(
    out, 
    json, 
    Options.builder()
        .indentObjects()
        .indentArrays()
        .build()
);
writeJson(
    out, 
    json, 
    Options.builder()
        .indentObjects()
        .build()
);
writeJson(
    out, 
    json, 
    Options.builder()
        .indentArrays()
        .build()
);
writeJson(out, json, Options.standard());
writeJson(
    out, 
    json, 
    Options.builder()
        .indentObjects()
        .indentArrays()
        .copyright("(c) 2022")
        .build()
);
writeJson(
    out, 
    json, 
    Options.builder()
        .indentObjects()
        .copyright("(c) 2022")
        .build()
);
writeJson(
    out, 
    json, 
    Options.builder()
        .indentArrays()
        .copyright("(c) 2022")
        .build()
);
writeJson(
    out, 
    json, 
    Options.builder()
        .copyright("(c) 2022")
        .build()
);

Hypothetical 4.

It would be a lot more efficient if you started sending your JSON in a binary format like MessagePack. It has the same data model as JSON, so it should work out.

Also, when sending in that binary format there is a choice between "Little Endian" and "Big Endian".

Problem is, there really isn't a meaning to indentation in a binary format or to endianness in a text one.

Option 20. Make separate methods

For a split as fundamental as this, it might make sense to start to make an entirely separate API for the new JSON-like format.

public enum Indent {
    OBJECTS,
    ARRAYS
}

...
        
enum Endianness {
    BIG_ENDIAN,
    LITTLE_ENDIAN
}
  
...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            EnumSet<Indent> indent,
            Optional<String> copyright
    ) {
        ...
    }
    
    public static void writeMessagePack(
            Appendable out,
            Json json,
            Endianness endianness,
            Optional<String> copyright
    ) {
        ...
    }
}
writeJson(
        out, 
        json,
        EnumSet.of(Indent.OBJECTS),
        Optional.of("(c) 2022")
);
writeMessagePack(
        out,
        json,
        Endianness.BIG_ENDIAN,
        Optional.of("(c) 2022")
);

Option 21. Make an interface and use dispatch

You were surprised you didn't think of this first. Dynamic dispatch is some classic Java stylings.

public interface JsonWriter {
    void write(Appendable out, Json json);
}

...

public final class TextJsonWriter implements JsonWriter {
    private final boolean indentObjects;
    private final boolean indentArrays;
    private final String copyright;
    
    public TextJsonWriter(
            boolean indentObjects,
            boolean indentArrays,
            String copyright
    ) {
        this.indentObjects = indentObjects;
        this.indentArrays = indentArrays;
        this.copyright = copyright;
    }
    
    @Override
    public void write(Appendable out, Json json) {
        ...
    }
}

...

enum Endianness {
    BIG_ENDIAN,
    LITTLE_ENDIAN
}

...

public final class BinaryJsonWriter implements JsonWriter {
    private final Endianness endianness;
    private final String copyright;

    public BinaryJsonWriter(
            Endianness endianness,
            String copyright
    ) {
        this.endianness = endianness;
        this.copyright = copyright;
    }
    
    @Override
    public void write(Appendable out, Json json) {
        ...
    }
}
new BinaryJsonWriter(
        Endianness.BIG_ENDIAN,
        "(c) 2022"
).write(out, json);

new TextJsonWriter(
        true,
        false,
        "(c) 2022"
).write(out, json);

Option 22. Take everything as an object and figure it out at runtime.

You need to choose whether you silently ignore bad combinations of options and which behaviors get preference, but there is a simplicity to just throwing it all into a record or opaque object and figuring it out from there.

enum Endianness {
    BIG_ENDIAN,
    LITTLE_ENDIAN
}

record Options(
        Boolean indentObjects, 
        Boolean indentArrays,
        String copyright,
        boolean useBinary,
        Endianness endianness
) {
    public static final Options INDENT_EVERYTHING =
            new Options(
                    true, 
                    true, 
                    null, 
                    false, 
                    null
            );
    public static final Options NO_INDENT =
            new Options(
                    false, 
                    false, 
                    null, 
                    false, 
                    null
            );
    public static final Options BINARY_LE = 
            new Options(
                    null,
                    null, 
                    null, 
                    true, 
                    Endianness.LITTLE_ENDIAN
            );
}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            Options options
    ) {
        if (options.useBinary() &&
                (options.indentArrays() != null 
                        || options.indentObjects() != null)) {
            // ignore or throw
            ...
        }
        else if (!options.useBinary() &&
                    options.endianness() != null) {
            ...
        }
        else {
            ...
        }
    }
}
writeJson(
        out, 
        json, 
        Options.INDENT_EVERYTHING
);
writeJson(
        out,
        json,
        Options.BINARY_LE
);
writeJson(
        out,
        json,
        new Options(
            null,
            null, 
            null,
            true, 
            Endianness.BIG_ENDIAN
        )
);

Option 23. Model valid choices in the type hierarchy.

With a bit of restructuring, you can actually make an Options object that will correctly handle having that disjoint set of options.

Maybe not what you would choose with 100 settings or more complicated legality restrictions, but for this case it all seems to work out.

enum Endianness {
    BIG_ENDIAN,
    LITTLE_ENDIAN
}

sealed interface Options permits BinaryOptions, TextOptions {
    Optional<String> copyright();
}

record TextOptions(
        @Override Optional<String> copyright,
        boolean indentObjects,
        boolean indentArrays
) implements Options {}

record BinaryOptions(
        @Override Optional<String> copyright,
        Endianness endianness
) implements Options {}

...

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            Options options
    ) {
        switch (options) {
            case TextOptions textOptions -> {
                ...
            }
            case BinaryOptions binaryOptions ->
                switch (binaryOptions.endianness()) {
                    case BIG_ENDIAN -> ...
                    case LITTLE_ENDIAN -> ...
                }
        }
    }
}
writeJson(
        out, 
        json, 
        new TextOptions(Optional.of("(c) 2022"), true, false)
);
writeJson(
        out,
        json,
        new BinaryOptions(Optional.empty(), Endianness.BIG_ENDIAN)
);

Option 24. Give up on typing it, just pass a map

This was always an option. It works just as well here as it does in a dynamic language; it's just a tad more verbose and unsafe.

public final class JsonWriter {
    private JsonWriter() {}

    public static void writeJson(
            Appendable out,
            Json json,
            Map<String, Object> options
    ) {
        var copyright = options.get("copyright");
        if (copyright == null) {
            ...
            var endianness = options.get("binary");
            ...
        }
        else if (copyright instanceof String copyrightString) {
            ...
        }
        else {
            throw new IllegalArgumentException(...);
        }
    }
}
writeJson(
        out, 
        json,
        Map.of(
            "indentObjects", true,
            "copyright", "(c) 2022"
        )
);
writeJson(
        out, 
        json,
        Map.of(
            "binary", true,
            "endianness", Endianness.BIG_ENDIAN
        )
);

Hypothetical 5.

You've taken the mouse to the movies. People don't need all the configuration options you've provided and don't like using the API that has them. They want a simpler API.

Maybe you should have gone with option 1.


Exercise for the reader. Make a spreadsheet of all these options versus all the criteria you use to evaluate software and fill in the grid with smiley faces, frowny faces, and meh faces. Feel free to fill in some options I missed, like reading from environment variables, system properties, or more inheritance schemes.


<- Index

Why Java didn't add flow typing

by: Ethan McCue

Java's method priority rules are part of the reason why pattern matching works the way it does.

Say you had code that you wrote in the past 30 years that looks like this

public class Example {

    public static void main(String[] args) {
        Number i = 3;
        if (i instanceof Integer) {
            doTask(i);
        }
    }

    public static void doTask(Integer i){
        System.out.println("integer");
    }

    public static void doTask(Number i){
        System.out.println("number");
    }
}

This will output number even though i is an Integer because the stated type of i is Number.

In other languages like Kotlin, if you check whether i is an Integer, then within the block of that if the compiler will consider it an Integer.

Since Java has legacy code that might have done something like this, you have to explicitly name a new variable that will be considered an Integer after the check.

Number number = 3;
if (number instanceof Integer integer) {
    doTask(integer); // will select the integer overload
    doTask(number); // will select the number overload
    // integer and number both refer to the same object
    // so this will output true
    System.out.println(integer == number);
}

While this might feel like a nuisance, structuring the language feature like this enables the more generic "pattern matching" syntax (coming in the next few years) that you can read about here: https://openjdk.java.net/jeps/405


<- Index

The switch that only I like

by: Ethan McCue

One thing that I think is really cool, but no one else cares about, is that if you make a sealed interface which only has one implementor (say, if you are enriching a class via generated code), then there is actually a way for you to make a "type safe cast" between the two things.

What I mean by this is: consider a normal cast

Integer i = 3;
Object o = i;
String s = (String) o; // boom, ClassCastException

Normal casts are allowed by the language to fail. If you did not properly track the types of things and you try to cast to a type you are not compatible with, your code will still compile, but it will crash at runtime.

Even if you write an interface that has a single implementor, the language can't guarantee that later on there won't be another one added.

interface Thing {
    void exist();
}

class OnlyThingImpl implements Thing {
    int specialThing = 8;

    @Override
    public void exist() {}
}

...

// This code works perfectly well
Thing thing = ...;
OnlyThingImpl thingImpl = (OnlyThingImpl) thing;
System.out.println(thingImpl.specialThing);

...

// but if an intern adds another implementation, Java won't warn you about
// the old usage and it will crash at runtime if you happen to call that code
// with the wrong implementation.
class OtherThingImpl implements Thing {
    @Override
    public void exist() {
        System.out.println("party");
    }
}

Even making the interface sealed doesn't solve this, since Java won't force you to handle other possibilities in your casts if another case is added to the sealed class

sealed interface Thing permits OnlyThingImpl {
    void exist();
}

...

// This still has the same problem - you might forget about this code
OnlyThingImpl thingImpl = (OnlyThingImpl) thing;

What can solve this is a "safe cast" done via a pattern matching switch.

OnlyThingImpl thingImpl = switch (thing) {
    case OnlyThingImpl __ -> __;
};
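
And the payoff: if someone later adds a second implementation to the permits clause, the switch above stops being exhaustive and the compiler makes you revisit it. A sketch of what you would then be forced to write:

// once the interface becomes
// sealed interface Thing permits OnlyThingImpl, OtherThingImpl {...}
// the switch no longer compiles until you decide what to do here
OnlyThingImpl thingImpl = switch (thing) {
    case OnlyThingImpl impl -> impl;
    case OtherThingImpl other -> throw new IllegalStateException(
            "this code only works with OnlyThingImpl"
    );
};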

<- Index

Why you should care about Sealed Types

by: Ethan McCue

As of Java 17 (with --enable-preview) you can combine two features - pattern matching and sealed types - to represent and deal with cases where two things have different sets of data.

Say you want to represent three different kinds of items you can check out from a library.

A book, which has a title and author

record Book(String title, String author) {}

A CD, which has a runtime and genre

record CD(double runtime, String genre) {}

And a VHS tape, which contains precious memories

record VHSTape(List<Memory> preciousMemories) {}

What you can do to have a method which might return any one of these cases, or to have a list of any item in the library, is make a common interface that all three of them share

interface LibraryItem {}

record Book(String title, String author) implements LibraryItem {}

record CD(double runtime, String genre) implements LibraryItem {}

record VHSTape(List<Memory> preciousMemories) implements LibraryItem {}

The problem is that if you have a LibraryItem object, there isn't much you can actually do with it, since the three cases of library items have different kinds of data and thus you can't access everything from the interface

LibraryItem item = ...;
item.title(); // nope
item.runtime(); // also nope
item.preciousMemories(); // you'll never get them back

To remedy this, you can make the interface sealed. This guarantees to the language that Book, CD, and VHSTape are the only classes which implement LibraryItem

sealed interface LibraryItem permits Book, CD, VHSTape {}

Now if you have an instance of library item, you can use it with a "pattern switch expression" to safely recover the actual type of the item

LibraryItem item = ...;
switch (item) {
    case Book book ->
         System.out.println(book.title());
    case CD cd ->
         System.out.println(cd.genre());
    case VHSTape tape ->
         System.out.println(tape.preciousMemories());
}

<- Index

Code generation with annotation processors

by: Ethan McCue

Java does not allow annotation processors to affect the source or bytecode of classes you have written, only generate new classes.

In Java 17, there is a clever way around this restriction though.

Let's say you want to make an annotation processor that adds a toJson method to a class based on some automated set of rules.

@AutoToJson
public record BasicThing(String color, String size) {}

What you can do is generate an interface with a predictable name

interface BasicThingToJson {}

Make it "sealed", so that only the class you want to can implement it

sealed interface BasicThingToJson permits BasicThing {}

And then inside the interface you can add a default method, in which it is safe to assume that the only possible class implementing the interface is the one you want.

sealed interface BasicThingToJson permits BasicThing {
    default JSON toJson() {
        // totally safe
        var self = (BasicThing) this;
        var jsonObject = new JsonObject();
        jsonObject.set("color", self.color());
        jsonObject.set("size", self.size());
        return jsonObject;
    }
}

And then all your user has to do is

  1. Annotate their class
  2. Implement the interface that will be generated (it's kinda circular, but it works out)
@AutoToJson
public record BasicThing(String color, String size) implements BasicThingToJson {}

And then boom, their class has been "enriched" in whatever way you want.

var basicThing = new BasicThing("red", "small");
var json = basicThing.toJson();

<- Index

Basics of Annotation Processors

by: Ethan McCue

If you register an implementation of javax.annotation.processing.Processor with the ServiceLoader mechanism then you can make what is called an "Annotation Processor".

Annotations are just metadata that can be put onto classes, fields, etc.

You can declare your own annotation like this

package some.pack;

@Target({ElementType.TYPE})
@Retention(RetentionPolicy.SOURCE)
public @interface YourAnnotation {
}

In this example the annotation can only be used on "types" like classes or interfaces, and the metadata is only retained in the source code.

So we can use that example annotation to mark a class of ours

@YourAnnotation
class SomeClass {}

And any annotation processors that the ServiceLoader can find at compile time can do stuff like generate brand new code, or add new compile time checks.

@SupportedAnnotationTypes("some.pack.MagicBean")
@SupportedSourceVersion(SourceVersion.RELEASE_17)
public final class AnnotationProcessor extends AbstractProcessor {
    @Override
    public boolean process(
            Set<? extends TypeElement> annotations,
            RoundEnvironment roundEnv
    ) {
        var filer = this.processingEnv.getFiler();
        var elements = roundEnv.getElementsAnnotatedWith(YourAnnotation.class);
        try {
            var file = filer.createSourceFile("brand.generated.Code");
            try (var writer = file.openWriter()) {
                // there are libraries like javapoet which make doing this easier
                writer.append("""
                package brand.generated;

                class Code {
                    public static final int NUMBER_OF_INSTANCES = %s;
                }
                """.formatted(elements.size()));
            }
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
        return true;
    }
}

In this case we generate a new class which has the number of annotated classes as a constant.


<- Index

The Service Loader Mechanism

by: Ethan McCue

If you have an interface or abstract class defined in some jar

package whatever.project;

public interface DoesThing {
   void doThing();
}

And in another jar you have one or more implementations of that interface which have a zero-argument constructor

package something.other;

public final class DoesThingImpl implements DoesThing {
    @Override
    public void doThing() {
        System.out.println("I implemented this in a certain way");
    }
}

as well as a file in that jar, named after the interface's fully qualified name, under META-INF/services

META-INF/services/whatever.project.DoesThing

which has a line that has the name of the implementing class

something.other.DoesThingImpl

Then you can obtain an implementation of that interface via the service loader mechanism

var loader = ServiceLoader.load(DoesThing.class);
for (var thingDoer : loader) {
    thingDoer.doThing();
}

This is some really core magic and is how most of the projects that want you to just add dependencies to get functionality - like slf4j, jdbc, twelvemonkeys, etc - do their thing.
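
As an aside, if your jars are modules you can declare the same registration in module-info.java instead of a META-INF/services file. A sketch, assuming module names that match the package names:

module something.other {
    requires whatever.project;

    provides whatever.project.DoesThing
            with something.other.DoesThingImpl;
}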


<- Index

Factories in FP

Question from Muhammad Hamza Chippa

How would you replace Factory design patterns in functional programming?

Before someone else says it - with a function

interface ThingFactory {
    Thing makeThing();
}

...

void work(ThingFactory factory) {
    var thing = factory.makeThing();
    thing.whatever();
}

(defn work [factory]
  (let [thing (factory)]
    (.whatever thing)))

You can also use the exact pattern as is (where the producer method gets a special name) with traits/typeclasses/protocols, depending on your FP language.


<- Index

Aliasing core functions in Clojure

Question from rmxm

Funny question, how do you cover for "get" word being a function name, say you have namespace 'computer, surely it would be intuitive to have "get" function obtaining "computer". Just wondering how do you overcome this linguistically 🙂 or do you just place get at the bottom

You are allowed to have a function named get - you just need to put (:refer-clojure :exclude [get]) in your namespace declaration to avoid warnings.

And then you refer to the core get as clojure.core/get within that file.

so

(ns some.computer
  (:refer-clojure :exclude [get]))

(defn get [o]
  (clojure.core/get o :thing))

or

(ns some.computer
  (:require [clojure.core :as core])
  (:refer-clojure :exclude [get]))

(defn get [o]
  (core/get o :thing))

sure, sure its not a technical question more of, do you use a different function name, or do you choose another word etc.

I do this with update in some contexts.

No I just use get if I want to - but you have to be more specific about the use case for me to give a better name.

You can usually call it get-thing or thing

If you mention both get and update in this case I will assume this is ok practice

its fine, depends on your domain

sure, sure, thanks 🙂

I could also do get* or something like that for "crud" stuff - get*, update*, delete*, create* seems ok I think

if the intended usage of the namespace is :require [a.b.thing :as thing], thing/func and the ns is small and focused (enough that you won't forget you shadowed a core fn) then don’t worry about it

thanks, going the get* route, this less bother down the road i think


<- Index

How can you write this productSum algorithm written in JS in Java?

Question from u/warrior242

I want to write this algorithm from JS to Java and I am having some problem doing it because Java does not support parameter defaults and does not do dynamic inputs. I know this must be somehow possible and even better written in a typed language but I dont know how. Please help Java Gods.

The algorithm is supposed to go through the array, and add it up and multiply each part by the depth that its at.

So something like [1, 2]

would be something like

(1 * 1) + (2 * 1).

Something at depth [1, 2, [3, 4] ]

would be:

(1 * 1) + (2 * 1). + (( 3 + 4) *2)

As far as I understand anyways

Link for JS version of algorithm:

class ProductSum{
    constructor() {

    }

    solution(array, multiplier = 1) {
        let sum = 0;
        
        for (let i = 0; i < array.length; i++) {
            if (array[i] instanceof Array) {
                sum += this.solution (array[i], multiplier + 1)
            }

            else {
                sum = sum + array[i];
            }
        }
        return sum * multiplier;
    }
}

const myArray = [5, 2, [7, -1], 3, [6, [-13, 8], 4]];

let mySolution = new ProductSum();
let result = mySolution.solution(myArray);

console.log(result);

So for Java, first you need to represent this "type" of data in some way. You have a List of either single numbers or other Lists containing the same.

You can represent this like so

sealed interface NestedNumberList {
    record SingleNumber(int x) 
            implements NestedNumberList {}
    record MultipleNumbers(List<NestedNumberList> numbers) 
            implements NestedNumberList {}
}

Which you should read as a NestedNumberList is one of SingleNumber or MultipleNumbers.

which makes your example data look like this

[5, 2, [7, -1], 3, [6, [-13, 8], 4]]
var myArray = new MultipleNumbers(List.of(
    new SingleNumber(5),
    new SingleNumber(2),
    new MultipleNumbers(List.of(
        new SingleNumber(7),
        new SingleNumber(-1)
    )),
    new SingleNumber(3),
    new MultipleNumbers(List.of(
        new SingleNumber(6),
        new MultipleNumbers(List.of(
            new SingleNumber(-13),
            new SingleNumber(8)
        )),
        new SingleNumber(4)
    ))
));

Which yes, is way more verbose, but let's not dwell on that. This is a bit of a pathological case.

class ProductSum {
    int solution(MultipleNumbers array, int multiplier) {
        int sum = 0;
        for (var num : array.numbers()) {
            switch (num) {
              case SingleNumber single -> {
                  sum = sum + single.x();
              }
              case MultipleNumbers multiple -> {
                  sum += this.solution(multiple, multiplier + 1);
              }
            }
        }
        return sum * multiplier;
    }

    int solution(MultipleNumbers array) {
        return this.solution(array, 1);
    }
}
System.out.println(new ProductSum().solution(myArray));

So that's roughly conceptually equivalent (for Java 17 with preview features). For the default multiplier value, in this case we emulate that with method overloading.


<- Index

Smuggling Checked Exceptions with Sealed Interfaces

by: Ethan McCue

Vanilla Code

import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

record User(String name) {}

public class VanillaCode {
    public static Optional<User> lookupUser(Connection db, int id) throws SQLException  {
        var statement = db.prepareStatement("SELECT name FROM USER where id = ?");
        statement.setInt(1, id);
        var resultSet = statement.executeQuery();
        if (resultSet.next()) {
            return Optional.of(new User(resultSet.getString(1)));
        }
        else {
            return Optional.empty();
        }
    }

    public static List<Optional<User>> lookupMultipleUsers(Connection db, List<Integer> ids) throws SQLException {
        List<Optional<User>> users = new ArrayList<>();
        for (int id : ids) {
            users.add(lookupUser(db, id));
        }
        return users;
    }

    public static void exampleUsage(Connection db) {
        try {
            lookupUser(db, 123).ifPresentOrElse(
                    user -> System.out.println("FOUND USER: " + user),
                    () -> System.out.println("NO SUCH USER")
            );
        } catch (SQLException sqlException) {
            System.out.println("ERROR RUNNING QUERY: " + sqlException);
        }
    }
}

Sealed Types

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

record User(String name) {}

sealed interface UserLookupResult {
    record FoundUser(User user) implements UserLookupResult {}
    record NoSuchUser() implements UserLookupResult {}
    record ErrorRunningQuery(SQLException sqlException) implements UserLookupResult {}
}

public class SealedTypes {
    public static UserLookupResult lookupUser(Connection db, int id) {
        try {
            var statement = db.prepareStatement("SELECT name FROM USER where id = ?");
            statement.setInt(1, id);
            var resultSet = statement.executeQuery();
            if (resultSet.next()) {
                return new UserLookupResult.FoundUser(new User(resultSet.getString(1)));
            }
            else {
                return new UserLookupResult.NoSuchUser();
            }
        }
        catch (SQLException e) {
            return new UserLookupResult.ErrorRunningQuery(e);
        }
    }

    public static List<UserLookupResult> lookupMultipleUsers(Connection db, List<Integer> ids) {
        return ids
                .stream()
                .map(id -> lookupUser(db, id))
                .toList();
    }

    public static void exampleUsage(Connection db) {
        switch (lookupUser(db, 123)) {
            case UserLookupResult.FoundUser foundUser ->
                System.out.println("FOUND USER: " + foundUser.user());
            case UserLookupResult.NoSuchUser __ ->
                System.out.println("NO SUCH USER");
            case UserLookupResult.ErrorRunningQuery errorRunningQuery ->
                System.out.println("ERROR RUNNING QUERY: " + errorRunningQuery.sqlException());
        }
    }
}

Sealed interfaces let you properly represent "sum types". This thing is either "A" or "B".

In this case we have a Stream<Integer> that we want to turn into a Stream<User>, but our method that takes Integer -> User throws a SQLException.

Your options before sealed interfaces were to

  1. Make it a Stream<User> by re-throwing any SQLException as a RuntimeException. As other comments have mentioned, this can be an issue depending on your type of stream. Also it requires that you fail the whole procedure if getting any one User fails
  2. Make it a Stream<Object> by returning any SQLException as a value, then cast it back at the end with instanceof checks.
  3. Make it a Stream<Try<User>> like with vavr. This works if you don't care about the exception and just care that it failed in some way, but you won't be able to call methods particular to SQLException like getSQLState without casting. Also you lose documentation of why something can fail.
  4. Do this same technique, but without a sealed interface. Doing this without needing to have default -> ... error ... branches on all your switches would require using the visitor pattern.

The new option is what you see: you can properly represent a function from Integer -> User | SQLException by wrapping each of the possible results in a sealed hierarchy.

sealed interface UserLookupResult {
    record FoundUser(User user) implements UserLookupResult {}
    record ErrorRunningQuery(SQLException sqlException) implements UserLookupResult {}
}

So it becomes Stream<Integer> -> Stream<UserLookupResult>, which is effectively Stream<FoundUser | ErrorRunningQuery>. This gives you the most flexibility in how to interpret the result of the Stream without any casting or assumptions in usage.

Also if you had another possibility like UserIsBanned, you can add it to the sealed hierarchy and all your switches would force you to handle it.


<- Index

Static Dependency Injection with Intersection Types

by: Ethan McCue

One of the patterns I am personally partial to that I haven't really seen get traction or attention in Java is to do DI manually.

Imagine a hypothetical web framework.

interface Handler<Context> {
    Response handle(Context context, Request request);
}

...

final class Router<Context> {
    ...

    Router(Context context) {
        ...
    }

    void addHandler(String route, Handler<Context> handler) {
        ...
    }
}

We can make a router that carries through some context to the different handlers

List<String> ips = Collections.synchronizedList(new ArrayList<>());
Router<List<String>> router = new Router<>(ips);
router.addHandler("/hello", (ips, req) -> {
    ips.add(req.ip());
    return Response.of(ips.toString());
});

And then all our route handlers will get the context. Then the trick is to make an interface for each stateful "thing" a handler might want access to, like a database connection or a redis connection.

interface HasDB {
    DB db();
}

interface HasRedis {
    Redis redis();
}

And similarly for anything that you might have that is "derivative" of those root stateful components

interface UserService {
    User findById(int id);
}

final class UserServiceImpl implements UserService {
    ...

    UserServiceImpl(DB db) {
        ...
    }

    @Override
    public User findById(int id) {
        ...
    }
}

interface HasUserService {
    UserService userService();
}

And make the "Context" be a type that implements all of those interfaces

record System(DB db, Redis redis) implements HasDB, HasRedis, HasUserService {
    @Override
    public UserService userService() {
        return new UserServiceImpl(db);
    }
}

Then a handler just needs to "declare its dependencies" by saying which stateful components it wants to use. For example, if we have a handler that just wants to look up a user and write things into redis

public static <Components extends HasRedis & HasUserService> Response handleRequest(
    Components components, Request request
) {
   var redis = components.redis();
   var userService = components.userService();

   ...
}

...

System system = new System(...);
Router<System> router = new Router<>(system);
router.addHandler("/hello", Handlers::handleRequest);

And by virtue of System being HasDB, HasRedis and a HasUserService it will fulfill HasRedis & HasUserService.

Replace "route handler" with whatever other entry points your app has and boom, dependency injection without reflection or magics.

There are downsides - System might get fairly large depending on your preferences, it doesn't solve the problem of starting everything in the right order, and there is a decent amount of boilerplate - but I just wish more people knew about this "System pattern."


<- Index

Handling numbers too big for base data types

Question from Skip#4185

How do you handle numbers that are too big to fit into the largest available data types?

Would you define a struct containing multiple large data types and somehow split up the number?

I'm not trying to do something like that, but i'm just curious if anyone knows what the right course of action would be in such a case

End of day you should use a bigint library, but you can also "build" it yourself if you are so inclined

#include <stdio.h>
#include <stdlib.h>

enum bigint_kind {
    _64_BIT,
    _MORE_THAN_64_BIT
};

union bigint_value {
    long u64;
    struct {
        long* blocks_of_64_bit;
        size_t length;
    } bigger_than_u64;
};

struct bigint {
    enum bigint_kind _kind;
    union bigint_value _value;
};

struct bigint* bigint_from_long(long value) {
    struct bigint* i = calloc(1, sizeof(struct bigint));
    i->_kind = _64_BIT;
    union bigint_value value_;
    value_.u64 = value;
    i->_value = value_;
    return i;
}

void bigint_free(struct bigint* i) {
    if (i->_kind == _MORE_THAN_64_BIT) {
        free(i->_value.bigger_than_u64.blocks_of_64_bit);
    }
    free(i);
}

If you are using rote C it can be a bit of a nightmare to track memory for darn numbers, since any of them can now contain a heap-allocated array. In C you don't have "move semantics" like C++, so you end up doing unneeded heap allocations to track stuff, and there is also some pointer indirection here, which is annoying.

TL;DR: use a library for it.

In Python, Java, etc. it should "just work": Python's numerics auto-promote and Java has BigInteger. But the only languages on your profile are C and Python, so if you are asking the question...
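
For example, in Java:

import java.math.BigInteger;

var a = new BigInteger("123456789012345678901234567890");
var b = BigInteger.TWO.pow(200); // well past what a long can hold
System.out.println(a.add(b));
System.out.println(a.multiply(b));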

Exercise for the reader

struct bigint* bigint_add(struct bigint* a, struct bigint* b) {
    if (a->_kind == _64_BIT) {
        if (b->_kind == _64_BIT) {
            
        }
        else {
            
        }
    }
    else {
        
    }
}

<- Index

Basic C++ value class

by: Ethan McCue

A class in C++ is the same thing as a struct; the class keyword just makes the members private by default instead of public by default as with struct.

Here is a complete example of a simple "money" class

#pragma once

#include <cstdint>
#include <functional>
#include <ostream>
#include <optional>

namespace shopping {
    class Money final {
    private:
        constexpr explicit Money(std::uint32_t cents) noexcept: cents(cents) {};
        const std::uint32_t cents;

    public:
        constexpr static Money fromCents(std::uint32_t cents) noexcept {
            return Money(cents);
        };

        constexpr bool operator==(const Money& other) const noexcept {
            return this->cents == other.cents;
        };

        [[nodiscard]] constexpr std::uint32_t getCents() const noexcept {
            return this->cents;
        };

        friend std::ostream& operator<<(std::ostream &os, const Money &money) {
            os << "Money{cents=" << money.cents << "}";
            return os;
        };

        constexpr Money operator+(const Money& other) const noexcept {
            return Money::fromCents(this->getCents() + other.getCents());
        };

        constexpr std::optional<Money> operator-(const Money& other) const noexcept {
            if (other.getCents() > this->getCents()) {
                return std::nullopt;
            }
            else {
                return Money::fromCents(this->getCents() - other.getCents());
            }
        };
    };
}

namespace std {
    template <> struct hash<shopping::Money> {
        size_t operator()(const shopping::Money& money) const noexcept {
            return hash<uint32_t>{}(money.getCents());
        }
    };
}

This is the same as this Rust

#[derive(Debug, Eq, PartialEq, Hash)]
struct Money {
    cents: u32,
}

Now maybe stop learning C++.


<- Index

How to build C++ from a git repo

Question from csstudentbruh#5797

How do I make a makefile in order to compile a .cc file that uses libraries from a git repo

Okay so there are basically 2 questions there

  1. what are the commands to run to compile the git repo and include it in your code
  2. how to use a makefile to do it

In general using a makefile is just copy pasting the commands you would have run by hand, so let's focus on question 1.

https://github.com/hzeller/rpi-rgb-led-matrix

Looking at the library, it looks like they have a makefile which can build it.

So in broad strokes: build that library with something like make all, then compile your code which references the header files in the library, then link all the code together.

C++ is a nightmare, I'm truly sorry. It has the worst "build story" of any modern language.


<- Index

Type Classes in Elm

by: Ethan McCue

Conversation with somebody#0002

elm doesn't even have typeclasses apparently

Yeah, but you can do the pattern the same way scala does, just manually

so how does show work in elm

type alias ShowOps a =
    { show : a -> String }


show : ShowOps a -> a -> String
show ops a =
    ops.show a


stringShowOps : ShowOps String
stringShowOps =
    { show = identity }


intShowOps : ShowOps Int
intShowOps =
    { show = String.fromInt }


listShowOps : ShowOps a -> ShowOps (List a)
listShowOps elementShowOps =
    { show = \l -> "[ " ++ String.join ", " (List.map (show elementShowOps) l) ++ " ]" }


x : String
x =
    show (listShowOps intShowOps) [ 1, 2, 3 ]

Like this.

It's literally the same as Scala except there are no implicit parameters - at which point you start to realize that maybe it's not that special a pattern.

well

obviously.

it's the implicit parameters that is the special part

sure, but the point is if i want to take a parameter that is Show, I just need access to the functions

well... yes that's kinda how vtables work too?

Yep, it's all connected. You want "dynamic dispatch" in a strongly typed system with no interface polymorphism? Make a vtable, pass it around.


<- Index

Celsius to Fahrenheit in Java

Question from Sahil#8151

i want to convert celsius to farenhiet and vice versa i have got the conversion part down but i want to be able to do with only the letter so that if someone types F at the end it converst to C and if someone types C at the end of any number it converst to farenheit without asking the user to what they want it converted to

public final class Temperature {
    private final double c;

    private Temperature(double c) {
        this.c = c;
    }

    public static Temperature fromCelsius(double c) {
        return new Temperature(c);
    }

    public static Temperature fromFahrenheit(double f) {
        return new Temperature((f - 32) * (5.0 / 9.0));
    }

    public double celsius() {
        return this.c;
    }

    public double fahrenheit() {
        return (this.c * (9.0 / 5.0)) + 32;
    }
}

Obviously you know how to do the math, but one technique is to always store one internal representation and do conversions as needed, directly at the boundaries.
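
For the letter-dispatch part of the question, a minimal sketch - this parse method is mine, not part of the class above:

public static Temperature parse(String input) {
    char unit = Character.toUpperCase(input.charAt(input.length() - 1));
    double value = Double.parseDouble(input.substring(0, input.length() - 1));
    return switch (unit) {
        case 'C' -> Temperature.fromCelsius(value);
        case 'F' -> Temperature.fromFahrenheit(value);
        default -> throw new IllegalArgumentException("unknown unit: " + unit);
    };
}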


<- Index

How to wait for the value from onSuccess

Question from Stefan.lnd#2296

How do I make sure that java waits for the value from the onSucces method: https://pastebin.com/TR6ea2QZ ? (I cut out the irrelevant stuff)

public float getMoney(Player player){
      Float[] money = new Float[1];

      try {
          //SOME CODE FOR DB
          asyncMySQL.getMoneyAsnyc(statement, new Callback<Float>() {
             
              @Override
              public Float onSucces(Float money) {
                  //This value should be returned!!!
                  money[0] = money;
              }

              @Override
              public void onFailure(Throwable cause) {

              }
          });

      } catch (SQLException e) {
          e.printStackTrace();
      }
      return money[0];
  }

I want to call out that, long term, it's going to be a better tech investment to use blocking SQL because of Loom, but yes, you can return a Future.

public class FutureExample {
    interface ExampleAsync {
        void onSuccess(float money);
        void onFailure(Throwable throwable);
    }
    
    private void someThing(ExampleAsync exampleAsync) {}
    
    public Future<Float> getMoney() {
        final var future = new CompletableFuture<Float>();
        try {
            someThing(new ExampleAsync() {
                @Override
                public void onSuccess(float money) {
                    future.complete(money);
                }

                @Override
                public void onFailure(Throwable throwable) {
                    future.completeExceptionally(throwable);
                }
            });
        } catch (Exception e) {
            future.completeExceptionally(e);
        }
        return future;
    }
}
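
And then the caller can block on the Future whenever it actually needs the value:

// get() blocks until onSuccess or onFailure fires; it throws
// checked InterruptedException/ExecutionException
Float money = new FutureExample().getMoney().get();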

<- Index

How to remove items using a stream

Question from Kevin#1322

When you iterate over a map.keySet(), map.values() or map.entrySet() and you call the iterator.remove() method. How does it work that the key-value of the map is removed?

It depends on the exact collection. Sometimes remove won't be implemented, so for most purposes a stream and a filter would be "cleaner" and less prone to running into potentially unsupported methods.

i see

how do you remove items while using a stream?

Like this

List<Integer> xs = List.of(1, 2, 3, 4);
List<Integer> evens = xs.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());

And for a map you would use Collectors.toMap if you need a map at the end. So "filter" is the answer.
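
So for a map it would look something like this:

Map<String, Integer> scores = Map.of("a", 1, "b", 2, "c", 3);
Map<String, Integer> evens = scores.entrySet()
        .stream()
        .filter(entry -> entry.getValue() % 2 == 0)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));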

I see instead of removing you just create a new collection

Yeah, because a bunch of iterators don't support remove (like ArrayList's would, but not what you get from List.of(..)). If you are programming to the List or Map interface instead of a particular implementation, it's safer to do a filter.


<- Index

What does this regex mean...??

Question from sobiter#2949:

What does this regex mean...??

(?!.*\1)

Anything that isn't group 1??

It means you should go outside

hug a stranger

smell a flower


<- Index

How do I send a message to everyone

Question from GummyJon#4984:

Hey! I'm currently trying to make a custom communication client between me and my friends, and I'm coming across an issue. I have a server running which creates a different thread per person, but I want it to run a block of code along all threads, so everybody gets the same message. How can this be done?

For context, this is the code itself per thread

import java.net.*;
import java.io.*;

public class Handler extends Thread{

  private Socket socket = null;

  public Handler(Socket socket){
      super("Handler");
      this.socket = socket;
  }

  public void run() {
      try (
               PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
               BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))
       ) {
           String inputLine;
           while ((inputLine = in.readLine()) != null) {
               System.out.println(inputLine);
               out.println(inputLine);
           }
       } catch (
               IOException e) {
           System.out.println("Exception caught");
       }
   }
}

Have each thread have a method of communication. In practice this can mean that each thread has its own queue it reads off of, and the "producer" writes the same message to the queue of each "consumer".

Here is a decent rundown of your options, but ArrayBlockingQueue is probably good enough, so:

record Message(String contents) {}

import java.net.*;
import java.io.*;
import java.util.concurrent.ArrayBlockingQueue;

public class Handler extends Thread{
    private Socket socket = null;
    private final ArrayBlockingQueue<Message> queue;
    
    public Handler(ArrayBlockingQueue<Message> queue, Socket socket){
        super("Handler");
        this.queue = queue;
        this.socket = socket;
    }
    
    public void run() {
        while (true) {
            Message msg;
            try {
                msg = this.queue.take();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
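            // write msg out to this handler's socket here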
        }
    }
}

And then push things into the queue from another thread. Though please don't extend Thread, just implement Runnable.

record Message(String contents) {}

import java.net.*;
import java.io.*;
import java.util.concurrent.ArrayBlockingQueue;

public class Handler implements Runnable {
    private Socket socket = null;
    private final ArrayBlockingQueue<Message> queue;
    
    public Handler(ArrayBlockingQueue<Message> queue, Socket socket){
        super("Handler");
        this.queue = queue;
        this.socket = socket;
    }
    
    @Override
    public void run() {
        while (true) {
            Message msg;
            try {
                msg = this.queue.take();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
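            // write msg out to this handler's socket here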
        }
    }
}
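
The producer side isn't shown above, but it is just writing the same message to every handler's queue. A sketch - Broadcaster is a made-up name:

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

public class Broadcaster {
    private final List<ArrayBlockingQueue<Message>> queues;

    public Broadcaster(List<ArrayBlockingQueue<Message>> queues) {
        this.queues = queues;
    }

    public void broadcast(Message message) throws InterruptedException {
        // every consumer gets the same message on its own queue
        for (var queue : queues) {
            queue.put(message);
        }
    }
}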

<- Index

How to do no more than 3 deletes a month

Question from ab_al#8947:

I have a method that will check number of times user delete their tasks. Requirement includes no more than 3 deletes in a month. Can someone recommend a library to achieve this task?

You need to record this information somewhere.

The "simple" solution is to record the deletions in a database and block future ones if they are up to 3 in the last month.

So the db stores the record of deletes and their times, and then it's up to you to determine what a "month" is.
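
A sketch of that check with plain JDBC - the DELETIONS table and "a month means the last 30 days" are assumptions, not requirements:

public static boolean canDelete(Connection db, int userId) throws SQLException {
    var statement = db.prepareStatement(
            "SELECT COUNT(*) FROM DELETIONS WHERE user_id = ? AND deleted_at > ?"
    );
    statement.setInt(1, userId);
    statement.setTimestamp(2, Timestamp.from(Instant.now().minus(30, ChronoUnit.DAYS)));
    var resultSet = statement.executeQuery();
    resultSet.next();
    return resultSet.getInt(1) < 3;
}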


<- Index

How to read a properties file

Question from Kiryo#9472

i need to make that my app shows the specific value in xml file should i use scanner? or there is another way i need to extract

# Rate Exp Sp
RateXp = 4.
RateSp = 4.

the value of for example this^

actually not the xml but .properities

Get your .properties file as an input stream, then

final var properties = new Properties();
try (InputStream is = ...) {
    properties.load(is);
}

Easy peasy, then just

properties.getProperty("RateXp");

Full example:

public record Config(double rateXp, double rateSp) {
    public static Config loadFromClasspath() {
        final var properties = new Properties();
        try (final var inputStream = Config.class.getClassLoader().getResourceAsStream("config.properties")) {
            properties.load(inputStream);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Config(
            Double.parseDouble(properties.getProperty("RateXp")),
            Double.parseDouble(properties.getProperty("RateSp"))
        );
    }
}

<- Index

How do I stop a thread

Question from Arbee#3030:

How do i stop a thread?

And how do I stop my bot then? lol

You can interrupt the thread, or cancel the task - and then it is up to your code to check the thread's interrupted flag (Thread.isInterrupted()).

Some built-in methods which throw InterruptedException will do that check for you (like those on HttpClient), but short of shutting down you cannot force a thread to stop.

Im kinda lost rn... how do I stop my bot without stopping the thread it is running in?

well - you can communicate to the bot's thread

And how do I do that?

Sharing a reference to an AtomicBoolean would do the trick.

So if your bot has an AtomicBoolean it can check occasionally whether the value is true. If it is, it can clean itself up - and an AtomicBoolean is safe to change from another thread.
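
If you control the bot's main loop, that is shaped something like this:

var shouldStop = new AtomicBoolean(false);

var botThread = new Thread(() -> {
    while (!shouldStop.get()) {
        // do one unit of bot work, then check the flag again
    }
    // clean up here
});
botThread.start();

// later, from any other thread
shouldStop.set(true);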

But it might be difficult depending on how your bot framework is built


<- Index

How to have a HashMap with different types as values

Question from FellowTomato#4643:

public static HashMap<String, Integer> seats = new HashMap<>(){{
       put("total_rows", 9);
       put("total_columns", 9);
       put("available_seats", new HashMap<String, Integer>(){{
           put("row", 1);
           put("column", 1);   }});
  }};

How should I initialize HashMap to allow several datatypes as values?

I want something like this in result:

{
  "total_rows":5,
  "total_columns":6,
  "available_seats":[
     {
        "row":1,
        "column":1
     },
...];

Turns out, I can simply type Object...

You usually want to make objects for parsing json, not Map<Object,Object>

Sorry, I'm really new to Java. Can you provide an example?

Okay so for what you have there

      {
         "row":1,
         "column":1
      },

when we put this into Java we want to map it to a class like so

public record SeatPosition(int row, int column) {}

and then for the whole structure

public record Theater(int totalRows, int totalColumns, List<SeatPosition> availableSeats) {}

Ah, you mean make a special class for a second HashMap?

Only use a hash map if every key maps to the same thing and you don't know the set of keys ahead of time.

If you know the set of things "total_rows", "available_seats" - that is your clue you should be representing things as classes.

records are a good default for this kind of thing. They are just a shorthand way of making a class that contains data and has methods for accessing it.
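
So building the example data from the question looks like this:

var theater = new Theater(
    5,
    6,
    List.of(new SeatPosition(1, 1))
);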


<- Index

How do I extract the time from a Java Date object

Question from Meeks#7478:

how can i extract the time (i.e HH:mm:ss) from a java date object?

Take your date

Date d = new Date();

turn it into an Instant

Date d = new Date();
Instant instant = d.toInstant();

make a date time formatter

Date d = new Date();
Instant instant = d.toInstant();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss");

interpret your instant in a time zone

Date d = new Date();
Instant instant = d.toInstant();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss");
ZonedDateTime time = instant.atZone(ZoneId.of("EST"));

then format that with the formatter

Date d = new Date();
Instant instant = d.toInstant();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss");
ZonedDateTime time = instant.atZone(ZoneId.of("EST"));
String dateString = formatter.format(time);

and then just don't have a Date anymore

Instant instant = Instant.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss");
ZonedDateTime time = instant.atZone(ZoneId.of("EST"));
String dateString = formatter.format(time);

will the time still be considered a date object

cause my sql column is a date time type

Is it a java.sql.Date or java.sql.Timestamp?

https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp.html#toInstant--

https://docs.oracle.com/javase/8/docs/api/java/util/Date.html#toInstant--

Both have a toInstant method which is how you can enter the realm of sane date types.

Both also have a from(Instant) (java.sql.Date extends java.util.Date).

so when im saving to db an instant will be able to be inserted in?

Depends on your driver, but most likely when you insert you will need to convert Date.from(instant).

i see

lemme try

thanks man


<- Index

Are Functions called Methods in Java

Question from Em.#0694

Hi

Are functions called methods in Java?

Yes, but the caveat is that you don't have true "free standing" functions.

so in python

def f(x, y):
    return x + y

This is a function; it is its own thing. In Java, "methods" have to "belong" to something: either instances of a class or the class itself.

Oh

yeah

that makes sense

class Thing {
    static int f(int x, int y) {
        return x + y;
    }
}

So this f function in Python, translated to Java, has to "belong" to some class as a "static method".


<- Index

What would define FP in Java?

Question from Fast Q#2816

What would define FP in java? I usually hear it thrown around when talking about streams/ functional interfaces

Let me write out some examples.

public final class BankAccount {
    private int balance;
    
    public BankAccount() {
        this.balance = 0;
    }
    
    public void deposit(int amt) {
        if (amt < 0) {
            throw new IllegalArgumentException("amt must be non-negative");
        }
        else {
            this.balance += amt;
        }
    }

    public int withdraw(int amt) {
        if (amt < 0) {
            throw new IllegalArgumentException("amt must be non-negative");
        }
        else {
            if (this.balance < amt) {
                int balance = this.balance;
                this.balance = 0;
                return balance;
            }
            else {
                this.balance -= amt;
                return amt;
            }
        }
    }
    
    public int getBalance() {
        return this.balance;
    }
}

Okay, real basic class here I just whipped up. Take a moment to read it and show me what an example usage might be.

(this isn't FP, this is normal)

(i'm also sure there is a bug, but ignore it)

So an example like:

BankAccount myTaxHaven = new BankAccount();
myTaxHaven.deposit(500);
myTaxHaven.withdraw(250);

?

sure.

Mutable bank account - you can take money in and out.

Now here is the challenge. We have a new requirement. We want to know the state of every bank account at every time.

So somewhere in the code there is a map of user id to a map of timestamp to bank account. (does that parse?)

transaction timestamp?

yeah

yep I'm tracking

But in order to support that use case we can't change bank accounts directly like that.

Or rather, one way to support it is to make the bank account immutable.

Wait, why can't we change bank accounts directly?

lets say we had this

Map<Instant, BankAccount> bankAccountAtTime = new HashMap<>();
BankAccount myTaxHaven = new BankAccount();
bankAccountAtTime.put(Instant.now(), myTaxHaven);
myTaxHaven.deposit(400); // oh no, our history is messed up

Why is our history messed up?

so the fact that we mutate our bank account after depositing? That makes me question the original claim that this is a solution

I mean yeah we're mapping to the bank accounts, not a specific balance of the bank account in that time

My framing here is a bit messed up. I guess I'm saying if they were immutable then this kind of solution would work.

If they were immutable, deposit wouldn't even do the same thing though

Thats correct.

So how would we re-write the code such that we support all the same use cases but our contract doesn't have any mutating methods.

Keep a list of <Instant, Balance> or <Instant, Transaction>(which contains before/after information) Or a map, but some data structure

I mean just the BankAccount class.

deposit/withdraw could return a Balance or Transaction object, but then I think our paradigm doesn't even make sense

public final class BankAccount {
    private final int balance;

    public BankAccount() {
        this(0);
    }

    private BankAccount(int balance) {
        this.balance = balance;
    }

    public BankAccount deposit(int amt) {
        if (amt < 0) {
            throw new IllegalArgumentException("amt must be non-negative");
        }
        else {
            return new BankAccount(this.balance + amt);
        }
    }

    record WithdrawalResult(BankAccount account, int withdrawn) {}

    public WithdrawalResult withdraw(int amt) {
        if (amt < 0) {
            throw new IllegalArgumentException("amt must be non-negative");
        }
        else {
            if (this.balance < amt) {
                return new WithdrawalResult(new BankAccount(0), this.balance);
            }
            else {
                return new WithdrawalResult(new BankAccount(this.balance - amt), amt);
            }
        }
    }

    public int getBalance() {
        return this.balance;
    }
}

A bank account is a full history - always. There is the current state of it and past states of it, but they are all equally real "accounts".
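
And now the history map from before actually works, since nothing can change an account you have already stored:

Map<Instant, BankAccount> bankAccountAtTime = new HashMap<>();
BankAccount myTaxHaven = new BankAccount();
bankAccountAtTime.put(Instant.now(), myTaxHaven);

// deposit returns a new account; the one already in the map is untouched
BankAccount afterDeposit = myTaxHaven.deposit(400);
bankAccountAtTime.put(Instant.now(), afterDeposit);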

right, that's a fine way of viewing it yeah

It's like numbers. Imagine if this worked

Integer i = 5;
i.subtractOne(); // i is now 4

the value of 5 should be independent from the identity you assign that value. If that isn't the case - for numbers - it feels super weird

Is the main point that mutability is inherently dangerous?

Kinda. There are a lot of downsides to it - that it's a lot harder to multithread is a big one.

And you can always get it back if you need to.

Yeah this makes sense I definitely believe in keeping an immutable history for anything important


<- Index

How does this cause deadlock

Question from blindspot23#4418

public class Deadlock {
    static class Friend {
        private final String name;
        public Friend(String name) {
            this.name = name;
        }
        public String getName() {
            return this.name;
        }
        public synchronized void bow(Friend bower) {
            System.out.format("%s: %s"
                + "  has bowed to me!%n", 
                this.name, bower.getName());
            bower.bowBack(this);
        }
        public synchronized void bowBack(Friend bower) {
            System.out.format("%s: %s"
                + " has bowed back to me!%n",
                this.name, bower.getName());
        }
    }

    public static void main(String[] args) {
        final Friend alphonse =
            new Friend("Alphonse");
        final Friend gaston =
            new Friend("Gaston");
        new Thread(new Runnable() {
            public void run() { alphonse.bow(gaston); }
        }).start();
        new Thread(new Runnable() {
            public void run() { gaston.bow(alphonse); }
        }).start();
    }
}

This is the example given for deadlock in java docs. I presume that inside the bow method, the last line causes the deadlock. But how? We access the bowBack method using the object and not the thread. Then why do we get a deadlock.

Those synchronized blocks basically make this

Thread 1:
  - lock gaston
  - lock alphonse
  - unlock alphonse
  - unlock gaston
Thread 2:
  - lock alphonse
  - lock gaston
  - unlock gaston
  - unlock alphonse

now we can write those out

1. lock gaston
1. lock alphonse
1. unlock alphonse
1. unlock gaston
2. lock alphonse
2. lock gaston
2. unlock gaston
2. unlock alphonse

and intermesh them like so

actually - exercise for you, how can those get ordered in a way that causes deadlock

Because bowBack method invocation inside bow method tries to get the lock of gaston, while a thread that tries to get the lock on gaston is already blocked and queued?

1. lock gaston
2. lock alphonse
1. lock alphonse
2. lock gaston
1. unlock alphonse
1. unlock gaston
2. unlock gaston
2. unlock alphonse

Basically this. I think your words are right.

Alright then, Thank you for the help.


<- Index

Array basics

Question from junk#1089

the question ask for:

user input a number, n then save into array. after that, display the numbers out according to sequence

An array is a fixed size container of elements

int[] numbers = new int[10];

so here I made an array of 10 integers. They will all start at 0, and I can set

numbers[5] = 123;

any number in the array and get

int i = numbers[2];

any number in the array.

But if I try to get a number outside of the bounds of the array (in this case 0-9)

int j = numbers[10];

it will crash. So if a user inputs a number and you want to save that number into an array you need to

  1. Know what "place" in the array you are at
  2. Know whether you can expect a certain number of elements because an array, like I said, is fixed size
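
Putting those together, a minimal sketch of the exercise:

import java.util.Scanner;

public class Numbers {
    public static void main(String[] args) {
        var scanner = new Scanner(System.in);
        int n = scanner.nextInt();   // how many numbers to expect
        int[] numbers = new int[n];  // fixed size, so we need n up front
        for (int place = 0; place < n; place++) {
            numbers[place] = scanner.nextInt();
        }
        for (int number : numbers) {
            System.out.println(number);
        }
    }
}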

thank you so much


<- Index

Why is my List code crashing

Question from Los Feliz#2763

is there a way to convert List<String> to ArrayList<Object>, and vice versa?

here's the full picture:

I have a String arr[].

I convert it into List<String> so I can show its content on Android app.

But then for performing filter for search purposes, I require an ArrayList of objects.

So I wanna convert my List<String> into an ArrayList<Object>, because my adapter is set to accept ArrayList as its params.

this is how i converted.

String arr[];
List&lt;String&gt; list1;

list1 = Arrays.asList(getResources().getStringArray(R.array.string-array's name));

thought i know what i'm doing until i saw the app crash with what i have now.

Unknown bits set in runtime_flags: 0x8000
E/libc: Access denied finding property "ro.serialno"
E/AndroidRuntime: FATAL EXCEPTION: main

at java.util.AbstractList.remove(AbstractList.java:161)
at java.util.AbstractList$Itr.remove(AbstractList.java:374)
at java.util.AbstractList.removeRange(AbstractList.java:571)
at java.util.AbstractList.clear(AbstractList.java:234)
at com.example.cspeaks.MyAdapter$2.publishResults(MyAdapter.java:105)
at android.widget.Filter$ResultsHandler.handleMessage(Filter.java:284)
at android.os.Handler.dispatchMessage(Handler.java:107)
at android.os.Looper.loop(Looper.java:214)
at android.app.ActivityThread.main(ActivityThread.java:7356)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:492)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:930)

ohhh. wrap your Arrays.asList(...) with new ArrayList<>(Arrays.asList(...)) that might help. IDK though that message is horrific.

OMGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG @emccue YOU ARE THE MAN! JESUS CHRIST! THANK YOUUUUUUUUUUUUUUUUUUUUU

Okay don't leave yet. I want you to understand why that helped.

List is an interface that has methods that read from a list (like .get) and methods that alter a list (like .remove).

An implementation of that interface like ArrayList supports all the methods. You can get elements, add elements, and remove elements.

But there are a bunch of methods in the JDK that return lists that are unmodifiable. Arrays.asList is one of those, because the purpose of it is to be a list "view" onto an array.

ooooooo. So in other words, when we have Arrays.asList, if we want to make changes to the array, we must change it to arraylist

That doesn't parse for me, Arrays are their own things.

So if you have a String[], that is a fixed size collection. It does not implement the List interface.

Arrays.asList(..) is a way you can take a String[] and get a List<String>, but the List you get doesn't support adding or removing elements.

If you want to add or remove elements from that list you need to copy its contents into a list that supports that.
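So concretely, a sketch of the fix (the resource id here is a stand-in for whatever yours is called):

String[] arr = getResources().getStringArray(R.array.myStrings); // hypothetical id
List<String> list1 = new ArrayList<>(Arrays.asList(arr)); // a mutable copy you can clear/remove from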

OMG. i learned this hard way. Almost 20 hours.

ArrayList is the most common one and a good default. List.of(...) and .toList() on a Stream also give you unmodifiable lists - though those ones don't support setting elements in place either.

Its annoying, but you just need to be aware of it. If you are in doubt, copy it into an ArrayList and you know you can do everything.

yeah. lesson learned. from now on I'm copying every instance of List<> into an ArrayList

That's one approach.

In other contexts that can be called "defensive copying" - it's a generally good technique if you are working with mutable structures, though potentially overkill if you are the one making and working with all the instances.


<- Index

Why do we throw exceptions?

Question from blindspot23#4418

Why do we throw an exception when try and catch can handle it?

The use case is for "exceptional conditions". Say you write a library that asks Google for search results. That can always fail because it goes over the network, and you use try/catch to handle it, but sometimes either

  1. you can't handle it and you need to crash
  2. you don't want to handle it and prefer to crash
  3. it makes sense to "bubble" the exception up multiple layers

Making it more complicated - there are actually 2 "kinds" of exceptions in Java, checked and unchecked ones.

Checked exceptions like IOException you always have to handle either by doing some behavior or rethrowing.

Unchecked exceptions like RuntimeException the language doesn't make you handle, so generally those are used for situations where you don't think people will want to try to recover.

It's all also really abstract until you try writing some code that uses or works with them. (and also really frustrating because not even the standard library does exceptions "right" always)
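For a taste, a minimal sketch of both kinds (the method names here are just for illustration):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class Examples {
    // Checked: every caller must catch this or declare that they rethrow it.
    static String read(Path file) throws IOException {
        return Files.readString(file);
    }

    // Unchecked: signals programmer error, so nobody is forced to handle it.
    static int validated(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("expected a non-negative number: " + n);
        }
        return n;
    }
}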


<- Index

How do I break correctly

Question from Quadzilla#9639

public class Pyramid {

    public static void main(String[] args) {
        
        int s = 50;
        while (true) {
            s += 50;
            if (s > 50) {
                System.out.println(s);
            } else {
                s = 200;
                break;
            }
            while (s >= 200) {
                break;
            }
        }
    }
}

How can I add in a break to prevent the numbers from going to 20,000 in the terminal and beyond. Am I applying break correctly? I adapted the code from a java forum that printed a pyramid & thought today is the day to apply the break statement. s was changed by me to integer rather than string.

So first

        int s = 50;
        while (true) {
            s += 50;
            if (s > 50) {
                System.out.println(s);
            } else {
                s = 200;
                break;
            }
            while (s >= 200) {
                break;
            }
        }

This else block here where you set s to 200 will never run.

s starts out at 50 and you immediately add 50 to it, so it's always above 50 and you never set it to anything lower.

        int s = 50;
        while (true) {
            s += 50;
            if (s > 50) {
                System.out.println(s);
            } 

            while (s >= 200) {
                break;
            }
        }

Now for that last while block with the break.

break is a "terminal" action. You will exit your loop if you do it. And because you nest that other loop you only break out of the while (s >= 200) loop, not the while (true) loop.

So what you want isn't a while, it's an if.

        int s = 50;
        while (true) {
            s += 50;
            if (s > 50) {
                System.out.println(s);
            } 

            if (s >= 200) {
                break;
            }
        }

<- Index

How do I inject dependencies into Enums

Question from Chem#9771

In javax, if you use @Inject into a constructor, but you have instances where you can't pass those parameters, what do you do?

@ApplicationScoped
public class JavascriptLanguageExecutor extends LanguageExecutor {
   @Inject
   JavascriptLanguageExecutor(EngineInstance engine) {
       super(Language.JAVASCRIPT, engine.instance);
   }
}

public enum Language {
   JAVASCRIPT("js");

   public LanguageExecutor getExecutor() {
       return switch (this) {
           // THIS NEEDS "EngineInstance" BUT I DON'T HAVE ABILITY TO "@Inject" INTO ENUMS
           case JAVASCRIPT -> new JavascriptLanguageExecutor();
       };
   }
}

So here this EngineInstance is available via injection in the app but you can't use this inside of enums I think, and so I can't actually instantiate the class there.

Could I move it out of the constructor and do a field injection

In that case, you need to pass an EngineInstance.

Sounds dumb, but yeah. Ignore all the javax requirements for a second, since they aren't super relevant.

public class JavascriptLanguageExecutor extends LanguageExecutor {
    JavascriptLanguageExecutor(EngineInstance engine) {
        super(Language.JAVASCRIPT, engine.instance);
    }
}

What you want is, effectively, a map of Language -> Executor. So just make one.

var executors = new EnumMap<Language, LanguageExecutor>(...);

And then pass it around where you need it.
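As a sketch - engineInstance here stands in for whatever you had injected:

var executors = new EnumMap<Language, LanguageExecutor>(Language.class);
executors.put(Language.JAVASCRIPT, new JavascriptLanguageExecutor(engineInstance));

var executor = executors.get(Language.JAVASCRIPT);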

There are more options but generally speaking Java is structured in a way where passing the thing is the path of least resistance. Unless you made your LanguageExecutors work with the service provider stuff, but that's somewhat niche (for application code).

I'm not gonna pretend you can't also have a static map and fill it up with instances - you can. But then you need to be careful about when it is hydrated vs. uninitialized, make sure to synchronize if you need to, etc.


<- Index

How to execute code from a client side app

Is it possible to write complex code (like instantiating an object and using the stream API) in a string, then parse it with something like the Spring Expression Language? Basically what I'm trying to do is write code on the client side in string format, then find a way to parse and execute it on the backend side. From a business use case, what I want to do is write some code that outputs something and simply store it in a column field named "result".

Someone replied to me before and said it's doable but I had to abandon the chat

If you abandon Java (as the expression language) you can do this with something you can sandbox in the JVM - possibly Nashorn, or my preference, SCI.

String expressionStr = """
(let [customer-1 (Customer. "Mike" 9)
      customer-2 (Customer. "Ted" 20)
      customers [customer-1 customer-2]]
  (->> customers
       (filter (fn [customer] (= "Mike" (.getName customer))))
       (first)))
""";

Then evaluate that in SCI.

https://github.com/borkdude/sci

IFn eval = Clojure.var("sci.core", "eval-string");
Customer customer = (Customer) eval.invoke(expressionStr);

and to give it sandboxed access to your classes, probably do this

IFn symbol = Clojure.var("clojure.core", "symbol");
IFn keyword = Clojure.var("clojure.core", "keyword");
IFn eval = Clojure.var("sci.core", "eval-string");
Customer customer = (Customer) eval.invoke(
    expressionStr, 
    Map.of(keyword.invoke("classes"), Map.of(symbol.invoke("Customer"), Customer.class))
);

Full example if you have clojure and sci included in your project

import clojure.java.api.Clojure;
import clojure.lang.IFn;
import java.util.Map;

public final class SciTest {
    private SciTest() {}

    private static final IFn REQUIRE;
    private static final IFn SYMBOL;
    private static final IFn KEYWORD;

    private static final IFn EVAL_STRING;

    static {
        REQUIRE = Clojure.var("clojure.core", "require");
        SYMBOL = Clojure.var("clojure.core", "symbol");
        KEYWORD = Clojure.var("clojure.core", "keyword");

        REQUIRE.invoke(Clojure.read("sci.core"));
        EVAL_STRING = Clojure.var("sci.core", "eval-string");
    }

    public record Thing(int x) {}

    public static void main(String[] args) {
        System.out.println(EVAL_STRING.invoke(
                """
                (Thing. 10)
                """,
                Map.of(
                        KEYWORD.invoke("classes"),
                        Map.of(
                            SYMBOL.invoke("Thing"), Thing.class
                        )
                )
        ));
    }
}

Will take a look, thanks!

Curious what the actual requirement here is though. Since this isn't exactly super duper safe to do even with the sandboxing. They can still OOM your machine or whatever.

Requirement isn't fully clear but basically when creating a new entity record, there's a "result" field where the end user wants to write some code or formula based on other fields of the entity, ex: result = id + title + getUniqueRandNubr()

Yeah, this might be a better fit for a little DSL. You can make a tiny lang with ANTLR pretty easily. SCI is more general purpose - you can basically write any code.


<- Index

Do you write incremental operators in loops and conditions?

Question from keldranase#4427

The question is more about coding style for production grade code. Do you write incremental operators in loops and conditions? Consider two pieces of a merge algorithm. Which version is better?

            if (list.get(left) < list.get(right)) {
                result.set(writePtr, list.get(left));
                ++left;        // separated increments
                ++writePtr;
            } else {
                result.set(writePtr, list.get(right));
                ++right;
                ++writePtr;
            }

           if (list.get(left) < list.get(right)) {
               result.set(writePtr++, list.get(left++));    // increments inside expressions
           } else {
               result.set(writePtr++, list.get(right++));
           }

Another example is something like this:

while (someCounter-- > 0) {
  // do something
}

Or more verbose, like this:

while (someCounter > 0) {
  // do something
  --someCounter;
}

Most increments are effectively just iterators. So if you are doing anything other than for (int i = 0; i < container.size(); i++) then the best code quality would come from working with the iterator/stream abstractions.

However the first one is the one I would go with, with the caveat that I would use left++ instead of ++left.

No one remembers operator precedence order except C programmers, so separating mutation from assignment/passing of values is best. Everyone usually writes thing++ and, outside of a context where you are passing a value at the same time you are changing it, the difference doesn't matter.


<- Index

How do I fix my MS-Paint clone

Question from CONNOR#1257

Hello, I am trying to make an application similar to ms paint.

When I add super.paint(g) to my paint method my panel appears, however, when I start drawing the lines keep disappearing. when I remove the super.paint(g) the panel is no longer visible but the drawing works perfectly fine.

Your issue is this:

If you call super.paint(g) it will clear the screen. It then expects you to draw every point.

If you don't call super.paint(g) then the panel isn't redrawn, but whatever paints you have from previous render cycles are left.

The solution is to remember what has been drawn so far and just draw it on the screen again every time you re-render.

The easiest way to do this might be to have a 2d array of java.awt.Color.

When you draw update that array, and when you re-render just write every color to the screen.
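A minimal sketch of that idea - the canvas size and the paintComponent hook are assumptions, not your exact code:

import java.awt.Color;
import java.awt.Graphics;
import javax.swing.JPanel;

class CanvasPanel extends JPanel {
    // remembers every drawn pixel; null means "not drawn yet"
    private final Color[][] pixels = new Color[400][400];

    void drawAt(int x, int y, Color color) {
        pixels[x][y] = color;
        repaint(); // schedule a re-render
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g); // clears the panel
        for (int x = 0; x < pixels.length; x++) {
            for (int y = 0; y < pixels[x].length; y++) {
                if (pixels[x][y] != null) {
                    g.setColor(pixels[x][y]);
                    g.fillRect(x, y, 1, 1); // redraw each remembered point
                }
            }
        }
    }
}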


<- Index

How does this stack implementation work

Question from whoopdoop#7311

Im looking at this stack implementation and

public class ArrayStackOfStrings implements Iterable<String> {
    private String[] items;  // holds the items
    private int n;           // number of items in stack

    public ArrayStackOfStrings(int capacity) {
        items = new String[capacity];
    }

    public void push(String item) {
        items[n++] = item;
    }

    public String pop() {
        return items[--n];
    }

If I push an item using this push method, we are changing the items list, but how are we changing the variable n to update the new number of items? same with pop()

n++ does the mutation

wait so n is being changed implicitly?

yes.

n++ means return the current value of n and also increment it.

--n means decrement it and then return the new value.

It's clearer to write it like this

public class ArrayStackOfStrings implements Iterable<String> {
    private String[] items;  // holds the items
    private int n;           // number of items in stack

    public ArrayStackOfStrings(int capacity) {
        items = new String[capacity];
    }

    public void push(String item) {
        items[n] = item;
        n++;
    }

    public String pop() {
        n--;
        return items[n];
    }

or

public class ArrayStackOfStrings implements Iterable<String> {
    private String[] items;  // holds the items
    private int n;           // number of items in stack

    public ArrayStackOfStrings(int capacity) {
        this.items = new String[capacity];
    }

    public void push(String item) {
        this.items[this.n] = item;
        this.n++;
    }

    public String pop() {
        this.n--;
        return this.items[this.n];
    }

so you see more clearly the order of things without remembering the difference between n++ and ++n

ohhh that makes so much sense

theres no diff between n-- and --n?

There is

int x = 5;
int y = x++;

// x is 6, y is 5
int x = 5;
int y = ++x;

// x is 6, y is 6

But just don't get clever with it, do it on its own line


<- Index

What is the difference between arguments and parameters

Question from Aviral#7054

Hey guys. Can anyone please take 5 mins of their time and just explain the difference between arguments and parameters?

Some people say it's the same and I'm not really convinced...

It's like magma and lava.

Same thing, different name based on context: the name in a method's declaration is the parameter, the value you pass at the call site is the argument. Close enough that only total nerds will correct you if you use them interchangeably


<- Index

Are Hashtables not used anymore in Java?

Question from asianmalaysian vietnamese#1514

Are hashtables like not used anymore in java? Is its synchronicity for key value pairs not a beneficial thing?

Oh, if someone asks where I heard that hashtables are not being used anymore - it was in a different group on a different social media platform

They have been superseded.

People should now (generally) be writing code that targets the Map interface. If you want a synchronized map you can use Collections.synchronizedMap and if you want a good concurrent map there is ConcurrentHashMap.
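For instance:

Map<String, Integer> synced = Collections.synchronizedMap(new HashMap<>());
Map<String, Integer> concurrent = new ConcurrentHashMap<>();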

That being said Hashtable isn't broken or anything. You can still use it if you want a synchronized map. It just comes from before the Java collections framework so the "default" we want to show people is HashMap and then how to synchronize whatever map they chose. So strictly speaking Hashtable is redundant.

got it, thanks!

The same is true for Vector vs ArrayList. Also:

Hashtable implements Map and extends Dictionary

Dictionary is this weird abstract class that is basically an interface, and some of its methods like keys() are redundant with keySet() on Map and also return Enumeration.

which is like Iterator but without the for-each syntax support.

So yeah, just generally more crufty.


<- Index

Do I need new exceptions for every invalid field

Question from huh#0893

are exception classes used per class field or can one exception class handle every field?

If my class has fields for

String name, int ID, double salary, double hours

If I need exceptions for handling empty name "", negative number ID, negative salary, negative hours, do I need a new exception class for every field?

by exception class I mean something like this

public class InvalidPayRate extends Exception
{
    public InvalidPayRate(double p)
    {
        super("Hourly pay rate may not be negative or greater than 25: " + p);
    }
}

The reason you don't want a specific InvalidPayRate exception in this case is that it is an unrecoverable scenario.

An invalid pay rate will almost always mean programmer error.

public class InvalidPayRate extends Exception
{
    public InvalidPayRate(double p)
    {
        super("Hourly pay rate may nor be negative or greater than 25: " + p);
    }
}

if you were to throw this exception from somewhere you would need to declare that you throw it and then the caller would need to handle it.

So at the very least you want this to extend RuntimeException.

public class InvalidPayRate extends RuntimeException
{
    public InvalidPayRate(double p)
    {
        super("Hourly pay rate may nor be negative or greater than 25: " + p);
    }
}

Since it's a programmer error you don't expect to catch.

At that point you need to weigh the cost of doing this for each field in a class against the benefit, which is just having the stack trace say "InvalidPayRate" - something you can already put into the message.
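A sketch of the lighter-weight alternative - the setter here is hypothetical; the point is that the message carries all the detail:

public void setPayRate(double payRate) {
    if (payRate < 0 || payRate > 25) {
        throw new IllegalArgumentException(
            "Hourly pay rate may not be negative or greater than 25: " + payRate);
    }
    this.payRate = payRate;
}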


<- Index

Enums are a shorthand

by: Ethan McCue

public enum StopLight {
    RED, GREEN, YELLOW;
}

Enums are a shorthand for this pattern

public final class StopLight {
    public static final StopLight RED = new StopLight();
    public static final StopLight GREEN = new StopLight();
    public static final StopLight YELLOW = new StopLight();

    private StopLight() {}
}

They get more support from being a language feature, just like records, but this is the core of it.

So when you say make a singleton like this

public class PlayerManager {
    private static final PlayerManager INSTANCE = new PlayerManager();
    public static PlayerManager getInstance() {
        return INSTANCE;
    }
  
    private PlayerManager() {}
}

all I see is this

public enum PlayerManager {
    INSTANCE;

    public static PlayerManager getInstance() {
        return INSTANCE;
    }
}

<- Index

How do I make Yoté in JavaFX

Question from Bruno Machado#9013

Hi everyone, I want to develop the Yoté game in java fx with scenebuilder and sockets, but I'm having trouble figuring out how to make each client's interface change every moment. How can I do this kind of communication with the server in javafx? thank you all

Have a dedicated thread poll the socket for the current state on an interval and communicate that to your UI thread.

I don't know the JavaFX specifics but the general concept is to constantly poll on a thread (or have the server push) and then update your model of the game accordingly.
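A rough sketch of that shape - Platform.runLater is JavaFX's standard way to hop back onto the UI thread; GameState, fetchState, and updateUi are stand-ins for your socket read and UI update:

var poller = new Thread(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        GameState state = fetchState();            // stand-in: read from the socket
        Platform.runLater(() -> updateUi(state));  // hand the result to the UI thread
        try {
            Thread.sleep(500);                     // poll interval
        } catch (InterruptedException e) {
            return;
        }
    }
});
poller.setDaemon(true);
poller.start();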


<- Index

Why is my python so ugly

Question from Leslie#7406

FRR = map(lambda element: element[1], read_csv)
FAR = map(lambda element: element[2], read_csv)

why is this so ugly

Use operator.itemgetter(1), that's what it is for. Also a list comprehension is usually nicer.

make an example out of my code pls

FRR = map(operator.itemgetter(1), read_csv)

But then also

FRR = [ element[1] for element in read_csv ]

But then also

FRR, FAR = zip(*[ (element[1], element[2]) for element in read_csv ])

idk I've never seen a zip(*[ before

* is a splat. It unrolls an iterable into arguments.


<- Index

How to handle passwords

by: Ethan McCue

DO NOT

Store passwords in plaintext. You should never have a record of your users' passwords in a form you can read.

DO NOT

Encrypt passwords

https://nakedsecurity.sophos.com/2013/11/04/anatomy-of-a-password-disaster-adobes-giant-sized-cryptographic-blunder/

DO NOT

Use a general purpose hash function like SHA or MD5

https://security.stackexchange.com/questions/90064/how-secure-are-sha256-salt-hashes-for-password-storage

DO NOT

"roll your own" implementation of cryptographic functions. You will get it wrong.

https://security.stackexchange.com/questions/18197/why-shouldnt-we-roll-our-own

DO

Use a well tested implementation of PBKDF2, bcrypt, or scrypt to hash and salt passwords

https://spring.io/projects/spring-security https://mvnrepository.com/artifact/org.springframework.security/spring-security-core

import org.springframework.security.crypto.password.PasswordEncoder;
import org.springframework.security.crypto.factory.PasswordEncoderFactories;

public final class PasswordUtils {
    private PasswordUtils() {}

    private static final PasswordEncoder PASSWORD_ENCODER = 
        // At time of writing the default implementation uses BCrypt
        PasswordEncoderFactories.createDelegatingPasswordEncoder();

    // Store the result of this in your database
    public static String encode(String password) {
        return PASSWORD_ENCODER.encode(password);
    }

    // And use this to check if a user gave you the right password later
    public static boolean matches(String password, String encodedPassword) {
        return PASSWORD_ENCODER.matches(password, encodedPassword);
    }
}
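Usage looks like this:

String encoded = PasswordUtils.encode("hunter2");        // store this in your database
boolean ok = PasswordUtils.matches("hunter2", encoded);  // true when they log back in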

<- Index

How do I get this part of a String

Question from RaiderRoss#6666

hey

ik this might sound stupid but i need help with some string manipulation

int pos = e.getMessage().getContentRaw().lastIndexOf(" ");

So this is my code, my String would look like this

~ban @RaiderRoss This is the whole reason including spaces
                 -----------------------------------------

how would i get the underlined part as a string

so like the second index basically

Once you have an index, you can trim the string from that point forward using substring.

so

String messageRaw = e.getMessage().getContentRaw();
String reason = messageRaw.substring(messageRaw.lastIndexOf(" "));

But in your case the reason itself contains spaces, so lastIndexOf would only get you the last word. Instead, we can split on spaces.

String messageRaw = e.getMessage().getContentRaw();
String[] words = messageRaw.split(" ");

then join, skipping the first two

how do I join

skipping the first two

String messageRaw = "a b c ";
String[] words = messageRaw.split(" ");
String reason = Arrays.stream(words)
        .skip(2)
        .collect(Collectors.joining(" "));

btw our teacher never taught us any of this I got to learn it all on my own so ty for the help

This is one way. You can also do it manually or instead just find the index of the 2nd space and take a substring after that.

https://stackoverflow.com/questions/19035893/finding-second-occurrence-of-a-substring-in-a-string-in-java/35155037

String messageRaw = "a b c ";
String reason = messageRaw.substring(messageRaw.indexOf(" ", messageRaw.indexOf(" ") + 1));

and you can also use regex if you feel brave

ok i got it now ty sm for the help


<- Index

How to print data with proper spaces

Question from ericmp#6201

hi, im tring to print this but with proper spaces, to make it more visual:

System.out.printf("%-25s %-25s %-20s %-21s %-21s %-24s %-24s %-25s %-25s %-20s %-20s %-20s %-25s %n", "{ nom: " + nom , "tipus: " + tipusS , "número: " + numeroS , "valor: " + valorS , "pedres: " + pedresS , "tirada: " + tiradaS , "agafada: " + agafadaS , "triomf: " + triomfS , "última: " + ultimaS , "tirada per: " + tiradaPerS , "recollida per: " + recollidaPerS , "tirada primer: " + tiradaPrimerS , " }");

but as u see, is not really good, like, sometimes there is more spaces, and sometimes less, and finally, there isnt

how could i do it better?

(i tried to put 20s to each one, but looks worse)

So this is actually something that really old programs needed to do a lot. And honestly most bank statements, etc.

But it's weirdly not super supported in modern languages. If you don't care about how long it gets we can make a method that prints a table.

public static void outputAsTable(List<Map<String, String>> stuff) {

}

Start with this - assume we have a list of maps you want to print out. We can scan through each map and find the longest value for each key.

public static void outputAsTable(List<Map<String, String>> stuff) {
    Map<String, Integer> longestValues = new HashMap<>();
    for (var map : stuff) {
        for (var entry : map.entrySet()) {
            if (longestValues.getOrDefault(entry.getKey(), 0) < entry.getValue().length()) {
                longestValues.put(entry.getKey(), entry.getValue().length());
            }
        }
    }
    ...
}

Then once you have all those you can do math to figure out how many spaces to add.
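To sketch that math out - this pads every value to the longest one seen in its column (iterating longestValues for every row keeps the columns in a consistent order; a LinkedHashMap would make that order stable):

public static void outputAsTable(List<Map<String, String>> stuff) {
    Map<String, Integer> longestValues = new HashMap<>();
    for (var map : stuff) {
        for (var entry : map.entrySet()) {
            if (longestValues.getOrDefault(entry.getKey(), 0) < entry.getValue().length()) {
                longestValues.put(entry.getKey(), entry.getValue().length());
            }
        }
    }

    for (var map : stuff) {
        var row = new StringBuilder();
        for (var entry : longestValues.entrySet()) {
            String value = map.getOrDefault(entry.getKey(), "");
            // pad to the longest value in this column, plus one space of breathing room
            row.append(String.format("%-" + (entry.getValue() + 1) + "s", value));
        }
        System.out.println(row);
    }
}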

I think this will be hard, I've never worked with these maps, and these variables are not stored inside any array, they are just the attributes of a class, so I print them to check how it's all working

You just need to convert your objects to a list of maps, unfortunately. If you want it all properly spaced you need to print them all at once, not individually - since, if you think about it, how much padding any one element needs can be determined by something you print 10 items later.

Another option is to print vertically

field_1:   ...
field_abc: ...
ww:        ...
-------------------
field_1:   ...
field_abc: ...
ww:        ...

<- Index

Is it possible to use SCSS in a Java project

Question from Senhor#3353

hi

i have one question: is it possible to use SCSS in a Java project? I am currently generating PDF from HTML pages; i craft these pages with html and pure css, and i was wondering if there's a way to use SCSS instead

Sure, but you are going to need to do some plumbing.

The SCSS compiler lives in the JS ecosystem, so there really isn't a way to include it as part of a Java build system.

You will need to install it with npm separately, either globally or in a project in the directory where you need to run your jar.

Then you need to invoke scss to generate CSS files.

You can do this through ProcessBuilder, so it is definitely doable - just not frictionless.
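A rough sketch of that plumbing, assuming the sass CLI is installed and on the PATH (the file names are just for illustration):

Process process = new ProcessBuilder("sass", "styles.scss", "styles.css")
        .inheritIO() // show sass's output and errors in our console
        .start();
if (process.waitFor() != 0) {
    throw new RuntimeException("sass compilation failed");
}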


<- Index

Why is the else not printing

Question from akira💖🌻#6298

hello i have a small small problem

package HelloJaxa;
import java.util.*;
public class Main {
    public static void main(String[] args) {
        Scanner scan=new Scanner(System.in);
        System.out.print("Enter 10 Grades--> ");
        int Number=0;
        int counter90=0;
        int counterless=0;
        if (Number>=0 && Number<=100) {
            for (int i = 0; i < 10; i++) {
                Number = scan.nextInt();
                if (Number >= 90)
                    counter90++;
                if (Number > 60 && Number < 90)
                    counterless++;
            }
            System.out.println("Grades that equal 90 or more-->"+counter90);
            System.out.println("Grades that are between 60 and 90--> "+ counterless);
        }
        else
            System.out.println("Error!");
    }}

why is the else not printing when i input a number thats <0 or >100

First, put the {} around the else. Sometimes that can be the whole issue. Try not to omit the {} even though it's technically allowed.

And you will never reach the else since you check if the number is between 0 and 100 before entering the loop where you ask for input.

You never return to that check again so it only runs once and sees 0.

what do i do now ?

Move your check inside your for loop
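Concretely, something in this direction - a sketch that keeps your counters and validates each grade right after reading it:

for (int i = 0; i < 10; i++) {
    int number = scan.nextInt();
    if (number >= 0 && number <= 100) {
        if (number >= 90) {
            counter90++;
        } else if (number > 60) {
            counterless++;
        }
    } else {
        System.out.println("Error!");
    }
}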


<- Index

Simple AtomicReference Wrapper

by: Ethan McCue

Simple wrapper around an atomic reference to provide a basic functional interface for CAS operations.

Useful if you have some immutable state you want to share across multiple threads and mutate safely

package dev.mccue.async;

import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/**
 * Simple wrapper over an AtomicReference to provide an API for doing compare and swap operations.
 *
 * Modeled after the atom primitive in clojure.
 * @param <T> The type of data stored in the atom. This is assumed to be an immutable object.
 */
public final class Atom<T> {
    private final AtomicReference<T> ref;

    private Atom(T data) {
        this.ref = new AtomicReference<>(data);
    }

    /**
     * Creates an atom wrapping the given data.
     * @param data The data to be stored in the atom.
     * @param <T> The type of data stored in the atom. This is assumed to be an immutable object.
     * @return An atom containing the given data.
     */
    public static <T> Atom<T> of(T data) {
        return new Atom<>(data);
    }

    /**
     * Swaps the current value in the atom for the value returned by the function.
     * @param f The function to apply to the current value. It is expected that this
     *          will be a "pure" function and thus may be run multiple times.
     * @return The value in the Atom after the function is applied.
     */
    public T swap(Function<? super T, ? extends T> f) {
        while (true) {
            final var start = ref.get();
            final var res = f.apply(start);
            if (this.ref.compareAndSet(start, res)) {
                return res;
            }
        }
    }

    /**
     * Pair of the new value swapped into an atom and some value that was
     * derived in the course of calculating that new value.
     * @param <T> Type of the new value.
     * @param <R> Type of the derived value.
     */
    public record SwapResult<T, R>(T newValue, R derivedValue) {}

    /**
     * Performs a swap that carries over some context from the computation to the caller.
     *
     * For example, a basic usage would be to return whether a value was inserted into a map.
     *
     * <pre>{@code
     * sealed interface PlayerJoinResult permits AlreadyInGame, Success {}
     * record AlreadyInGame() implements PlayerJoinResult{}
     * record Success(String playerId) implements PlayerJoinResult {}
     * // ...
     * final var playerId = UUID.randomUUID().toString();
     * final var gameAtom = Atom.of(Map.empty());
     * final var swapResult = gameAtom.complexSwap(game -> {
     *    if (game.containsKey(playerId)) {
     *        return new SwapResult<>(game, new AlreadyInGame());
     *    }
     *    else {
     *        return new SwapResult<>(game.put(playerId, new Object()), new Success(playerId));
     *    }
     *    }
     * });
     *
     * if (swapResult.derivedValue() instanceof AlreadyInGame) {
     *     return "Oh no!";
     * }
     * else {
     *     return "hooray";
     * }
     * }</pre>
     *
     * @param f The function to apply to the current value. It is expected that this
     *          will be a "pure" function and thus may be run multiple times.
     * @param <R> The type of the context attached to the final result.
     * @return A pair of the new value put into the atom and the derived value from the
     * computation of that new value.
     */
    public <R> SwapResult<T, R> complexSwap(
            Function<? super T, SwapResult<? extends T, ? extends R>> f
    ) {
        while (true) {
            final var start = ref.get();
            final var res = f.apply(start);
            if (this.ref.compareAndSet(start, res.newValue())) {
                return new SwapResult<>(res.newValue(), res.derivedValue());
            }
        }
    }

    /**
     * Resets the value in the atom to the given value.
     * @param data The new value to be stored in the atom.
     * @return The new value stored in the atom.
     */
    public T reset(T data) {
        this.ref.set(data);
        return data;
    }

    /**
     * @return The atom's current value.
     */
    public T get() {
        return this.ref.get();
    }

    @Override
    public String toString() {
        return "Atom[value=" + this.get() + "]";
    }
}
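Usage ends up being one line per operation:

var counter = Atom.of(0);
counter.swap(n -> n + 1);  // runs the CAS loop until it wins; returns 1
counter.get();             // 1
counter.reset(0);          // back to 0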

<- Index

How is abstraction achieved through abstract classes and interfaces

Question from Data20#5839

How is abstraction achieved through abstract classes and interfaces?

I may have a misunderstanding somewhere, so I'll elaborate on where I'm at with the concept.

From my understanding: Abstraction is a concept of OOP that focuses on hiding implementation detail and showing only what is necessary - the functional detail.

This is the example I see when I go with this definition:

public class Base {

    public int area(int length, int width){
        return length * width;
    }
}

public class Main {
   public static void main(String[] args) {
        Base r1 = new Base();

        System.out.println("The area of this rectangle is " + r1.area(5,6) + ".");
    }
}

When I called the area method in Base, I just wanted the area of a rectangle. I didn't need to know the actual formula for it, just the result of it.

Having said that, I feel that I'm off somewhere and am confused with how abstraction is achieved with Abstract classes and Interfaces.

Okay so first - abstraction is not a concept of OOP.

Your area method does abstract the actual mechanism of computation. i.e. you could rewrite it like this

public int area(int length, int width){
    if (width < 0) {
        return area(length, -1 * width) * -1;
    }
    else if (width == 0) {
        return 0;
    }
    else {
        return length + area(length, width - 1);
    }
}

and its contract to the rest of the code would remain the same.

What interfaces give you is a mechanism for polymorphism. For instance if I wrote code like this

public enum IntOrder {
    LessThan, EqualTo, GreaterThan;
}

public final class IntComparator {
    public IntOrder compare(int a, int b) {
        if (a < b) {
            return IntOrder.LessThan;
        }
        else if (a == b) {
            return IntOrder.EqualTo;
        }
        else {
            return IntOrder.GreaterThan;
        }
    }
}

I've successfully abstracted the process of comparing two integers and if I wanted to write a sort function

public static int[] sort(int[] xs) {
    // somewhere i can call new IntComparator().compare(...); and use that for ordering
}

I could use that abstracted comparison behavior.

But if I wanted the sort function to "be abstracted" over how it compares these integers I could use an interface

public interface IntComparator {
    IntOrder compare(int a, int b);
}

...

public final class NormalIntComparator implements IntComparator {
    public IntOrder compare(int a, int b) {
        if (a < b) {
            return IntOrder.LessThan;
        }
        else if (a == b) {
            return IntOrder.EqualTo;
        }
        else {
            return IntOrder.GreaterThan;
        }
    }
}

...

public static int[] sort(int[] xs, IntComparator comparator) {
    // somewhere i can call comparator.compare(...); and use that for ordering
}

And this would let me write many different implementations of comparing ints - for instance, one that compares them in reverse order - and treat them all "the same" in some other abstraction.

So polymorphism has utility in building abstractions and interfaces give you polymorphism.

Abstract classes give you polymorphism and code sharing, but their primary utility is code sharing.

You can still make abstractions and get polymorphism in non OOP languages your mechanisms for doing so will just be different.

I'm taking in what all you said. From the readings I've done, I've always thought that Abstraction was a part of OOP.

In your defense, a large part of why OOP and specifically Java is so omnipresent is heavy marketing and hype. A lot of what you read probably never stops to point out nuance since it's all somewhat derived from marketing.


<- Index

How to connect opencv in Python to Java Swing

Question from ☮𝙬𝙖𝙟𝙚𝙚𝙝𝙖☮#5024

hey people. so, i did my backend of the project in opencv-python and developed the front end using swing java. now i want to connect them. can somebody guide me how to do this?

This is, conceptually, what an API is for.

Not necessarily an "api served over http" but just the general thrust of "these are two independent processes that will communicate via some mechanism".

More details about how you want these things to communicate would be helpful - are they going to be run on the same machine, etc.

Yes I want them to run on the same machine

I searched a little bit and found some method

Using execute shell command

But I'm not receiving the output I want

Maybe I'm doing something wrong idk

From the Java side you can launch the python app using ProcessBuilder and then communicate that way.

https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/lang/ProcessBuilder.html
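A minimal sketch of that step 1 - launching the script and reading what it prints ("backend.py" is a stand-in for your opencv-python script):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class LaunchBackend {
    public static void main(String[] args) throws Exception {
        Process process = new ProcessBuilder("python", "backend.py")
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        try (var reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            reader.lines().forEach(System.out::println); // whatever python prints
        }
        process.waitFor();
    }
}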

Ok I will try this thanks for the link.

That's just step 1, but yeah start there.


<- Index

How to make infinite character combinations

Question from Bulver#9256

Hey, please forgive my bad english. I am trying to get an infinite amount of char combinations which are stored in an array. I cant figure out how i can implement this in one method. Does anyone have ideas?

public static void one(char[] arr){
       for(int i = 0; i<95; i++){
           String a = ""+arr[i];
           if(hash2(a)%m==0){
               System.out.println(a);
           }
       }
   }

   public static void two(char[] arr){
       for(int i = 0; i<95; i++){
           for(int j = 0;j<95;j++){
               String a = ""+arr[i]+arr[j];
               if(hash2(a)%m==0){
                   System.out.println(a);
               }
           }
       }
   }

any ideas on how to infinitely stack for loops?

I don't know how many digits i need

I'd like to have one method which runs till i stop it and checks every combination

Do you understand my problem?

public static void three(char[] arr){
        for(int i = 0; i<95; i++){
            for(int j = 0;j<95;j++){
                for(int k = 0; k<95;k++){
                    String a = ""+arr[i]+arr[j]+arr[k];
                    if(hash2(a)%m==0){
                       System.out.println(a);
                    }
                }
            }
        }
    }

There must be a better way to check for more digits

One way - Instead of having all the digits be "on the stack"

String a = ""+arr[i]+arr[j]+arr[k];

store each loop's state in an object

public final class CharIterator implements Iterator<String> {
    private int i;

    public boolean hasNext() {
        return i < 95;
    }

    public String next() {
        char c = (char) i;
        i++;
        return Character.toString(c);
    }
}

and then all you need is a way to

  1. make an Iterable from that (should be fairly simple)
  2. chain two iterables into one larger iterable

basically make this

public final class IterableChain<T> implements Iterable<T> {
    public <A, B> IterableChain(Iterable<A> i1, Iterable<B> i2, BiFunction<A, B, T> combine) { }

    // ...
}

and then all you'll need to do is

Iterable<String> iterable = new CharIterable();
for (int i = 0; i < 10; i++) {
    iterable = new IterableChain<>(new CharIterable(), iterable, (a, b) -> a + b);
}

for (String s : iterable) {
    ...
}

does that make sense?

Yes, thanks alot!!


<- Index

What are Servlets

Question from Gergő#5263

What does Servlet mean?

And DispatchServlet

It's kinda historical.

It's a term for a small, pluggable bit of what would be a larger server.

The only aspect you should need to care about is that it is the interface by which you can attach to Jetty and handle http requests.

thanks!, and what is jetty? 🧐

It's a Java web server.

You would have Jetty handle connecting to whatever ports and parsing http requests and your code in the "servlet" would decide how to respond to those requests.
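For shape's sake, a minimal sketch of a servlet, assuming the javax.servlet API (jakarta.servlet in newer versions):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Jetty parses the HTTP request; we just decide the response
        response.setContentType("text/plain");
        response.getWriter().println("Hello from a servlet");
    }
}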


<- Index

Basic Cleaner Example

by: Ethan McCue

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.lang.ref.Cleaner;

public final class HasResource implements AutoCloseable {
    // This has a Thread backing it so you want to share your cleaner across the whole lib
    private static final Cleaner CLEANER = Cleaner.create();

    private final FileOutputStream outputStream;
    private final Cleaner.Cleanable cleanable;
    
    // You don't want your cleaner task to be a lambda so you don't accidentally capture a
    // ref. to the wrapping object.
    private static final class CleanerTask implements Runnable {
        private final FileOutputStream outputStream;

        private CleanerTask(FileOutputStream outputStream) {
            this.outputStream = outputStream;
        }

        @Override
        public void run() {
            try {
                this.outputStream.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    public HasResource() {
        try {
            this.outputStream = new FileOutputStream("ex.csv");
        } catch (FileNotFoundException e) {
            throw new RuntimeException(e);
        }
        this.cleanable = CLEANER.register(this, new CleanerTask(this.outputStream));
    }

    @Override
    public void close() {
        // This has "at most once" semantics so if they close it the cleaner won't run your cleanup logic a second time
        this.cleanable.clean();
    }
}
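Usage is ordinary try-with-resources; the Cleaner is just the safety net for callers who forget:

try (var resource = new HasResource()) {
    // use the resource; close() runs deterministically at the end of this block
}
// without try-with-resources, the Cleaner closes the stream
// sometime after the HasResource becomes unreachable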

<- Index

The Problem with Annotation Processors

by: Matthias Ngeo

For reasons unknown, broaching the subject of annotation processors seems to elicit some primordial fear in developers. People tend to associate annotation processing with borderline witchcraft and sorcery performable only by the most adept of basement wizards. It doesn't have to be that way. Annotation processing doesn't have to be the big scary monster hiding under the bed.

Image taken from https://sourcesofinsight.com/monsters-under-the-bed/

No doubt, problems with annotation processing do exist, but so do solutions to those problems. One problem that stands out in particular is the difficulty of unit testing annotation processors - a problem that Elementary, a suite of JUnit 5 extensions, solves.

What’s this Annotation Processing Thingamajig?

For the uninitiated, an annotation processor is similar to a compiler plug-in. Like its namesake, it can be called by the compiler to process annotations, e.g. @Nullable, during compilation. Said processing covers an extremely broad and vague expanse: everything from simple value validation to a full-blown pluggable type system like the checker-framework, from a simple @Builder annotation to full-blown dependency injection via code generation like Dagger.

Post Java 9, it resides inside the java.compiler module. Inside an annotation processor lies the fabled domain of Elements and TypeMirrors, Abstract Syntax Tree (AST) representations of the Java language and counterparts to the reflection framework found in Javaland. Elements represent syntactical constructs such as methods, arrays, etc., while TypeMirrors represent, well, types, such as reference types (classes) and primitives - but we digress.

Why So Difficult?

So what makes testing annotation processing so difficult? In our opinion, everything about the annotation processing environment. We’re not claiming that the environment is some evil grotesque being, it’s actually surprisingly well-designed. The problem lies squarely with the unavailability of the environment outside the compiler. Without its environment, testing an annotation processor is a lost cause.

A good drinking game is taking a shot for each method call in an annotation processor that requires an annotation processing environment.

import com.karuslabs.utilitary.Logger;
import com.karuslabs.utilitary.type.TypeMirrors;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.ProcessingEnvironment;
import javax.annotation.processing.RoundEnvironment;
import javax.lang.model.element.TypeElement;
import javax.lang.model.element.VariableElement;
import javax.lang.model.util.Elements;

class StringFieldLint extends AbstractProcessor {
    Elements elements;
    TypeMirrors types;
    Logger logger;    
    @Override
    public void init(ProcessingEnvironment environment) {
        super.init(environment);
        elements = environment.getElementUtils(); // (1)
        types = new TypeMirrors(elements, environment.getTypeUtils()); // (2)
        logger = new Logger(environment.getMessager()); // (3)
    }
    
    @Override
    public boolean process(Set<? extends TypeElement> set, RoundEnvironment round) {
        var elements = round.getElementsAnnotatedWith(Case.class); // (4)
        for (var element : elements) {
            if (!(element instanceof VariableElement)) {
                logger.error(element, "Element is not a variable"); // (5)
                continue;
            }
            
            var variable = (VariableElement) element;
            if (!types.isSameType(variable.asType(), types.type(String.class))) { // (6) (7) (8)
                logger.error(element, "Element is not a string"); // (9)
                continue;
            }
        }
        return false;
    }
}

Pretty much everything requires an annotation processing environment as illustrated above.

At this junction, we have four solutions to overcome this pickle:

  • Don’t bother with unit testing
  • Wait for something, anything to happen
  • Mock/re-implement the annotation processing environment
  • Smuggle the annotation processing environment out of the compiler

To keep a long story short, we ended up becoming smugglers.

Smuggler’s Discovery

While trawling the web, we discovered Google’s compile-testing project, a hidden gem buried beneath the swathes of GitHub projects. Through some clever hacks, the project managed to provide an annotation processing environment for unit tests albeit a little lackluster and limited. Exploring the project, it became obvious that it wasn’t the panacea that we had hoped. The project suffered from a few limitations that we weren’t able to stomach:

  • Supports only JUnit 4. The annotation processing environment is only available through a JUnit rule, something that is no longer supported in JUnit 5. We have been using JUnit 5 for the longest time and don’t intend to downgrade anytime soon.
  • The utilities for working with the annotation processing environment are limited. They work, but they could be significantly more ergonomic.
  • Inability to traverse the Elements and TypeMirrors of compiled files in a test. This is essential to allow compiled files to be used as test cases.
  • Scope limitation of the annotation processing environment. The annotation processing environment is limited to the scope of a test method. This is inconvenient as initialization of test state cannot be shared between multiple tests. Furthermore, the design lends itself to unexpected behaviour:

class SomeTest {
  @Rule CompilationRule rule = new CompilationRule();
  Types types = rule.getTypes(); // Throws an exception; the rule is only active inside a test method
  
  @Test
  void test() {
    ...
  }
}

This isn't to say that the project is bad, just that our objectives are different. In fact, some parts of Elementary are based on compile-testing. As its name implies, compile-testing focuses on testing the compilation of code, not annotation processing. That's not our objective. Our objective is to simplify unit testing annotation processors. Thus, after a healthy dose of "Hold my beer" and Not Invented Here Syndrome, the Elementary project was conceived.

Elementary, My Dear Watson

With compile-testing as a foundation, we embarked on a quest to bring Elementary to life. Starting with a clean slate blessed us with the freedom to make decisions that would otherwise incite an angry mob with pitchforks and torches:

  • Support only Java 11 & above. The module system in Java 9 introduced some breaking changes to the jdk.compiler module and ClassLoaders. We don't want to deal with that.
  • Support only JUnit 5. We do not want to support a JUnit 4 equivalent that we do not use.

Our experience working on the Chimera code generation tool told us that tests for annotation processors fell into the classic black-box and white-box testing categories. For small and/or simple annotation processors, it was more efficient to invoke the annotation processor inside a compiler against sample Java source files. As the complexity and size of an annotation processor increases, running it against sample files yields diminishing returns - it becomes far less tedious to isolate and test the individual logical components. Two different categories with two completely different sets of requirements.

Box of Fun Things

Black-box testing annotation processors can be fun. It doesn't have to be a myriad of set-up, tear-down and configuration. Not according to JavacExtension, at least. For each test, JavacExtension compiles a suite of test cases with the given annotation processor(s). The results of the compilation are then funneled to the test method for subsequent assertions. All configuration is handled via annotations with no additional set-up or tear-down required.

They say seeing is believing so let’s get on with the seeing.

Our imaginary annotation processor is fairly straightforward. All it does is check whether an element that is annotated with @Case is also a string field. If an element isn't a string or variable, an error message is printed. Since it's that straightforward, just black-box testing our annotation processor is enough.

@SupportedAnnotationTypes({"*"})
class ImaginaryProcessor extends AnnotationProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> set, RoundEnvironment round) {
        var elements = round.getElementsAnnotatedWith(Case.class);
        for (var element : elements) {
            if (element instanceof VariableElement) {
                var variable = (VariableElement) element;
                if (!types.isSameType(variable.asType(), types.type(String.class))) {
                    logger.error(element, "Element is not a string");
                }
            } else {
                logger.error(element, "Element is not a variable");
            }
        }
        return false;
    }
}

Testing our imaginary annotation processor isn’t too difficult either. All we need to do is to sprinkle a few annotations on the test class, create some test cases, check the compilation results, and Voila! We’re done.

import com.karuslabs.elementary.Results;
import com.karuslabs.elementary.junit.JavacExtension;
import com.karuslabs.elementary.junit.annotations.Case;
import com.karuslabs.elementary.junit.annotations.Classpath;
import com.karuslabs.elementary.junit.annotations.Options;
import com.karuslabs.elementary.junit.annotations.Processors;

@ExtendWith(JavacExtension.class)
@Options("-Werror")
@Processors({ImaginaryProcessor.class})
@Classpath("my.package.ValidCase")
class ImaginaryTest {
    @Test
    void process_string_field(Results results) {
        assertEquals(0, results.find().errors().count());
    }
    
    @Test
    @Classpath("my.package.InvalidCase")
    void process_int_field(Results results) {
        assertEquals(1, results.find().errors().contains("Element is not a string").count());
    }
}

Let’s break down the code snippet.

  • By annotating the test class with @Options, we can specify the compiler flags used when compiling the test cases. In this snippet, -Werror indicates that all warnings will be treated as errors.
  • To specify which annotation processor(s) is to be invoked with the compiler, we can annotate the test class with @Processors. No prizes for correctly guessing which annotation processor in this snippet.
  • Test cases can be included for compilation by annotating the test class with either @Classpath or @Inline. Java source files on the classpath can be included using @Classpath, while strings inside @Inline are transformed into an inline source file for compilation. In this snippet, both ValidCase and InvalidCase are included for compilation.
  • An annotation’s scope is tied to its target’s scope. If a test class is annotated, the annotation will be applied for all test methods in that class. On the same note, an annotation on a test method will only be applied on said method.
  • Results represent the results of a compilation. We can specify Results as a parameter of test methods to obtain the compilation results. In this snippet, process_string_field(...) will receive the results for ValidCase while process_int_field(...) will receive the results for both ValidCase and InvalidCase.

Pandora’s Box

This is where things become really interesting. White-box testing isn't as simple as invoking an annotation processor, since the possibilities of what a test is trying to prove are unlimited. In a black-box test, we need only prove that the compilation results of a known annotation processor against a fixed number of files match certain criteria. On the contrary, in a white-box test, we do not know why, what and how a component is being tested. The best we can do is make the annotation processing environment accessible inside the test class.

“It can’t be that difficult to allow class scoped annotation processing environments, compile-testing already does that.”

We, too, initially felt the same way - and boy, were we wrong. While compile-testing does provide an annotation processing environment, it is limited to the scope of a test method. Not being able to access said environment outside of methods means repetitive and verbose initialization code, which blows. Sadly, we couldn't just tweak compile-testing's trick either, as it was found to be incompatible with our objective.

The secret sauce behind compile-testing is actually pretty straightforward. Each test method is intercepted by a JUnit rule and wrapped in an annotation processor that invokes the method during processing. The test is subsequently executed inside a compiler that the JUnit rule invokes. Unfortunately, in this technique, an annotation processing environment is available only while a test method is executing. It isn't possible to tweak the technique to intercept the creation of a test instance and inject the test instance inside an annotation processor either, due to the constraints of the JUnit lifecycle.

A great deal of time spent at the drawing board later, we succeeded in creating the ToolsExtension. This extension exploits the fact that a test instance only needs access to an annotation processing environment - tests don't need to be executed inside an annotation processor. Once we established that, our trick was to run a compiler with a blocking annotation processor on a daemon thread before each test instance was created. With compilation suspended inside the processor, the environment is made accessible to the test instance on the main thread. Only after all tests have been executed does compilation resume.

Here's a poorly drawn MS Paint diagram illustrating the entire process

Let’s pretend that as a result of the imaginary processor we described in Box of Fun Things having grown in scope and size, it was refactored into multiple components, one of which checks if an element is a string variable like the original annotation processor.

class Lint {
    
    final TypeMirrors types;
    final TypeMirror expectedType;
    
    Lint(TypeMirrors types) {
        this.types = types;
        this.expectedType = types.type(String.class);
    }
    
    public boolean lint(Element element) {
        if (!(element instanceof VariableElement)) {
            return false;
        }
        
        var variable = (VariableElement) element;
        return types.isSameType(expectedType, variable.asType());
    }
    
}

Using the ToolsExtension to test the annotation processor yields the following code snippet:

import com.karuslabs.elementary.junit.Cases;
import com.karuslabs.elementary.junit.Tools;
import com.karuslabs.elementary.junit.ToolsExtension;
import com.karuslabs.elementary.junit.annotations.Inline;
import com.karuslabs.utilitary.type.TypeMirrors;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

@ExtendWith(ToolsExtension.class)
@Inline(name = "Samples", source = {
"import com.karuslabs.elementary.junit.annotations.Case;",
"",
"class Samples {",
"  @Case(\"first\") String first;",
"  @Case String second() { return \"\";}",
"}"})
class ToolsExtensionExampleTest {
    
    Lint lint = new Lint(Tools.typeMirrors());
    
    @Test
    void lint_string_variable(Cases cases) {
        var first = cases.one("first");
        assertTrue(lint.lint(first));
    }
    
    @Test
    void lint_method_that_returns_string(Cases cases) {
        var second = cases.get(1);
        assertFalse(lint.lint(second));
    }
    
}

Let’s break down the code snippet:

  • By annotating the class with @Inline we can specify an inline Java source file which ToolsExtension includes for compilation.
  • The annotation processing environment can be accessed via either the Tools class or dependency injection into the test class's constructor or test methods. In this case, we access the current TypeMirrors using the static method on Tools.
  • An in-depth explanation for both @Case and Cases will be provided in the following section. For now, it's just the mechanism used to find elements in compiled files.

The Case for Cases

With the completion of ToolsExtension, we succeeded in our quest to smuggle an annotation processing environment out of the compiler. Yet one final piece of the puzzle still remains. How do we create those elements to test our code against? The jdk.compiler module doesn't provide a way to create elements. While mocking an Element is possible, it is far from developer-friendly. Not only is the initialization verbose, unwieldy and convoluted, it is also difficult to guarantee that the mocked element's behaviour matches its actual counterpart. We can't look to compile-testing for guidance either, since it doesn't provide anything like that.
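For a taste of the verbosity, here's roughly what mocking even a trivial "String field" element looks like with a library like Mockito. This is just an illustration, and the stubbing shown is nowhere near exhaustive.

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import javax.lang.model.element.ElementKind;
import javax.lang.model.element.VariableElement;
import javax.lang.model.type.DeclaredType;
import javax.lang.model.type.TypeKind;

class MockedElements {
    static VariableElement stringField() {
        // Stub a TypeMirror that pretends to be java.lang.String...
        DeclaredType stringType = mock(DeclaredType.class);
        when(stringType.getKind()).thenReturn(TypeKind.DECLARED);

        // ...then the element itself.
        VariableElement field = mock(VariableElement.class);
        when(field.getKind()).thenReturn(ElementKind.FIELD);
        when(field.asType()).thenReturn(stringType);

        // Still unstubbed: modifiers, annotations, the enclosing element,
        // and the Types utility methods that real checks depend on.
        return field;
    }
}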

After much headache, we managed to find the missing piece. Let’s have the compiler transform our test cases written in idiomatic Java into elements for us. That way, we avoid the mess surrounding the initialization of elements and the resultant code is far easier to understand. To achieve that, we required some way to fetch elements from the compiler. After further refinement of the concept, we eventually developed the Cases class and corresponding @Case annotation.

Returning to our code snippet from Pandora’s Box, let’s analyze it in greater detail.

import com.karuslabs.elementary.junit.Cases;
import com.karuslabs.elementary.junit.Tools;
import com.karuslabs.elementary.junit.ToolsExtension;
import com.karuslabs.elementary.junit.annotations.Inline;
import com.karuslabs.utilitary.type.TypeMirrors;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

@ExtendWith(ToolsExtension.class)
@Inline(name = "Samples", source = {
"import com.karuslabs.elementary.junit.annotations.Case;",
"",
"class Samples {",
"  @Case(\"first\") String first;",
"  @Case String second() { return \"\";}",
"}"})
class ToolsExtensionExampleTest {
    
    Lint lint = new Lint(Tools.typeMirrors());
    
    @Test
    void lint_string_variable(Cases cases) {
        var first = cases.one("first");
        assertTrue(lint.lint(first));
    }
    
    @Test
    void lint_method_that_returns_string(Cases cases) {
        var second = cases.get(1);
        assertFalse(lint.lint(second));
    }
    
}
  • By annotating a test case with @Case inside a Java source file, we can fetch its corresponding element from Cases. A @Case may also contain a label to simplify retrieval.
  • Through Cases, we can fetch elements by either the label or index of the case. We can obtain an instance of Cases via Tools.cases or like in this code snippet, through dependency injection.

Idea Graveyard

As mentioned at the beginning of this article, we explored a few other avenues which eventually led to dead-ends. We thought them interesting enough to discuss in the following sections. Most of them ended up getting shelved due to impracticality and unacceptable trade-offs.

It goes without saying that not testing annotation processors is a terrible choice. Just because testing them is difficult doesn’t give us the liberty of skipping it. The problems will only worsen over time if we choose to take the easy route out. Furthermore, most annotation processors do code generation and static type analysis, both of which are extremely difficult to troubleshoot.

“Good things come to those who wait. But better things come to those who work for it.”

Had JEP 119: javax.lang.model Implementation Backed by Core Reflection been shipped with JDK 8, I highly doubt Elementary would have even been conceived. It would have solved the issue of accessing an annotation processing environment outside of a compiler by providing a standard implementation. Sadly, it was shelved, and future efforts seem to have stalled. A wait-and-see approach to unit testing annotation processors is thus unfeasible, as there isn’t anything to wait on.

A problem more difficult than testing annotation processors is trying to mock or re-implement the annotation processing environment. Since elements represent an AST for the Java language, we would need to be intimate with the language specification to guarantee that the behaviour of mocked or re-implemented elements does not deviate from the original. This honestly makes testing annotation processors seem like a Disney fairy-tale; we wouldn't want to touch it even with a ten-foot pole. A few re-implementations do exist, but they appear to have been abandoned for years. In the end, the troubles outweighed the benefits, which led us to abandon this avenue.

Final Thoughts

We’ve reached the end of our journey to simplify the testing of annotation processors. Looking back, it has been an absolute blast working on the project. How widely this project will be adopted remains to be seen. But if anything, I hope that this article has encouraged you to start playing around with annotation processors.

In summary, Elementary introduces:

  • The JavacExtension for black-box testing and testing of simple annotation processors.
  • The ToolsExtension, which provides a class-scoped annotation processing environment to test classes extended with it.
  • Utilities for fetching elements from the compiler into the test class.

That said, this is only the beginning of yet another journey. A journey that I am hopeful will bring many new features and improvements to Elementary in the time to come. Until next time, happy coding!


This article was originally published on Medium. Shameless advertising: this article is based on Elementary, https://github.com/Pante/elementary.


<- Index

Should array variable names be plural

Question from Abdul#4709

Just wondering for arrays is it considered a better practice to keep the var names plural?

Or I guess it depends on language?

For me, when I name a collection of things, it usually ends up that a plural describes it better, but it all depends on what you are naming.

I wouldn't make a collection of cats and call it cat. That's just confusing. But if you are keeping some messages in a queue messageQueue works as a name since having the word queue implies both the direction data flows and the idea of possible plurality.


<- Index

Should a game use mutable state

Question from decentDrei#7560

Would you say that a basic game, like a canvas game should use mutable state?

2d physics. Standard side-scroll

Pretty solidly going to say not to worry about it. Games were made forever with a global mutable area for gamestate.

https://youtu.be/aKLntZcp27M

This talk has some good pointers on game dev. It's part of a much larger topic but the speaker is good. And there are gradients to avoiding mutation. Don't avoid it like the plague, just keep it in mind.


<- Index

JS vs Java - dynamic typing

by: Ethan McCue

Conversation with somebody#0002

So as far as JS and Java are concerned the syntaxes of the two languages are very similar but the semantics between the two vary wildly even discounting browser weirdness.

but for simple things it's very similar. small algorithms and things like that

For simple things all languages are similar

okay, so here's a bet. I will write a bit of javascript code. very small, very simple.

You will try to write some mock Java that does the same thing.

I bet that you will find it harder to do

function upperCaseName(entity) {
    return {...entity, name: entity.name.toUpperCase() }
}

const dog = { name: "Fido", favorite_toy: "Squeeks" }
const person = { name: "Bob", majored_in: "Physics", age: 30 }

upperCaseName(dog) //  { name: "FIDO", favorite_toy: "Squeeks" }
upperCaseName(person) // { name: "BOB", majored_in: "Physics" , age: 30 }

Objects are a JS-specific thing, but... that should be doable in Java. won't look nice, won't behave well, but it'll be doable

oh so the thing that most libraries in JS use as their primary data representation is JS specific?

yes

in Java you'd use a POD for them

in JS you don't want that overhead, in Java it's kinda unavoidable

POD?

plain old data

You mean a class with getters and setters right? That won't work here.

Dynamic languages have a whole set of design patterns unique to their inherent flexibility, in the same way Functional Languages like Haskell have a whole set of design patterns unique to their inherent restrictions.

Javascript is dynamic and weakly typed and Java is static and strongly typed. The second bit there is very important because it clues you in to the underlying semantics of the language.

yeah, of course.

also kinda working in Java:

import java.util.Map;
import java.util.HashMap;

public class Main {
  public static void main(String[] args) {
      System.out.println(upperCaseName(new HashMap<String, Object>() {{ put("name", "Fido"); put("favorite_toy", "Squeeks"); }}));
  }

  static Map<String, Object> upperCaseName(Map<String, Object> map) {
      if (map.containsKey("name") && map.get("name") instanceof String) {
          map.put("name", ((String) map.get("name")).toUpperCase());
      }
      return map;
  }
}

In Javascript, saying class has an insanely different meaning than in Java.

In java when you say class you are declaring a template for a concrete object to be created later. You are saying that your object will have these slots, that your object will have these methods that will work on said slots of data, and you define how the object will be constructed.

But in Javascript you aren't doing that. In Javascript you are actually creating an object.

That object is the "prototype" for new objects to be created from and you are saying "hey just copy this object"

that's the prototype, not the class

There is no such thing as the class. Not in any form that isn't syntax sugar.

The main similarity between JS and Java is the syntax, which was done on purpose. Hence the misleading name of JavaScript.

The whole point was to look like Java at first glance, but at its core JS is more a badly implemented lisp than anything else.

Now I am going to do the dog example in idiomatic java give me a few minutes

import java.util.Objects;

interface HasName<T extends HasName<T>> {
    String getName();
    T withName(String name);
}

class Dog implements HasName<Dog> {
    private final String name;
    private final String favoriteToy;

    Dog(String name, String favoriteToy) {
        Objects.requireNonNull(name, "Name should not be null");
        Objects.requireNonNull(favoriteToy, "Favorite Toy should not be null");
        this.name = name;
        this.favoriteToy = favoriteToy;
    }

    @Override
    public String getName() {
        return this.name;
    }

    @Override
    public Dog withName(String name) {
        return new Dog(name, this.favoriteToy);
    }

    public String getFavoriteToy() {
        return this.favoriteToy;
    }

    public Dog withFavoriteToy(String favoriteToy) {
        return new Dog(this.name, favoriteToy);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Dog dog = (Dog) o;
        return Objects.equals(name, dog.name) &&
                Objects.equals(favoriteToy, dog.favoriteToy);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, favoriteToy);
    }

    @Override
    public String toString() {
        return "Dog{" +
                "name='" + name + '\'' +
                ", favoriteToy='" + favoriteToy + '\'' +
                '}';
    }
}

class Person implements HasName<Person> {
    private final String name;
    private final String majoredIn;
    private final int age;

    Person(String name, String majoredIn, int age) {
        Objects.requireNonNull(name, "Name should not be null");
        Objects.requireNonNull(majoredIn, "Majored In should not be null");
        this.name = name;
        this.majoredIn = majoredIn;
        this.age = age;
    }

    @Override
    public String getName() {
        return this.name;
    }

    @Override
    public Person withName(String name) {
        return new Person(name, this.majoredIn, this.age);
    }

    public String getMajoredIn() {
        return this.majoredIn;
    }

    public Person withMajoredIn(String majoredIn) {
        return new Person(this.name, majoredIn, this.age);
    }

    public int getAge() {
        return this.age;
    }

    public Person withAge(int age) {
        return new Person(this.name, this.majoredIn, age);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Person person = (Person) o;
        return age == person.age &&
                Objects.equals(name, person.name) &&
                Objects.equals(majoredIn, person.majoredIn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, majoredIn, age);
    }

    @Override
    public String toString() {
        return "Person{" +
                "name='" + name + '\'' +
                ", majoredIn='" + majoredIn + '\'' +
                ", age=" + age +
                '}';
    }
}


public class Main {
    private static <T extends HasName<T>> T upperCaseName(HasName<T> entity) {
        return entity.withName(entity.getName().toUpperCase());
    }

    public static void main(String[] args) {
        Dog fido = new Dog("Fido", "Squeeks");
        Person bob = new Person("Bob", "Physics", 30);
        System.out.println("Before:");
        System.out.println(fido);
        System.out.println(bob);

        Dog fidoUpper = upperCaseName(fido);
        Person bobUpper = upperCaseName(bob);

        System.out.println("After:");
        System.out.println(fidoUpper);
        System.out.println(bobUpper);
    }
}
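For what it's worth, on modern Java (16 and up) records collapse most of that ceremony. A sketch of the same design:

// equals, hashCode, toString, and the accessors all come for free.
interface HasName<T extends HasName<T>> {
    String name();
    T withName(String name);
}

record Dog(String name, String favoriteToy) implements HasName<Dog> {
    public Dog withName(String name) {
        return new Dog(name, favoriteToy);
    }
}

record Person(String name, String majoredIn, int age) implements HasName<Person> {
    public Person withName(String name) {
        return new Person(name, majoredIn, age);
    }
}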

<- Index

How to do abstractions in JS

Question from ForkyFork#7118

javascript does not have good OOP, no types

e.g. how are you going to do abstraction in js? interfaces?

You really haven't been around the block enough to compare languages to Java.

So here is the issue with that as a question.

When you say "abstraction", what you are referring to are the static typing constructs in Java that allow you to declare the relationships between objects. Just because JS lacks the ability to declare those constructs in a way that can be checked by a compiler doesn't mean you can't write abstracted code in it.


<- Index

How to append to an array in C

Question from Deleted User

is there a way to append to an array in C

not linked lists

i have a callback function which needs to add an item to an array every time it's called

i can't figure out how to do that

Conceptually an array is not a good fit for continual appending. If you want to do an operation like that you would be best off writing or finding your own "ArrayList" kind of wrapper.

is there a way to find the index of the last item in the list

array*

An array in C is a fixed-size block of memory that you have a pointer to the start of.

It is up to you to keep track of the size of that array, usually in a separate int.

Because you only have the pointer to the start of the block of memory, nothing about that pointer can tell you how much memory ahead of that you are allowed to access.

So if you keep the size of the array and the pointer to the start of the array held somewhere then the "index of the last item" is something you implicitly know.

I might be wrong when I say to store it in an int, it might be a size_t or some other type.

time to check out rust

if anyone can help me out with this i might consider going back to C

The type you are looking for in rust is probably Vec.

Can you share your problem and what you were thinking of for a solution? I can maybe whip up a quick A/B of what it would be in C vs Rust.

I have a function which retrieves a few rows from a database and uses a callback function to individually process each row. The callback function is supposed to map the row's data to a structure and add it to an array.

this is what i was trying to do

void handle_entry(void *entries, int argc, char **argv, char **column_name){
  Entry entry;
  strcpy(entry.title, argv[1]);
  strcpy(entry.content, argv[2]);
}

void get_entries(){
  sqlite3 *DB;
  sqlite3_stmt *stmt;
  char *sql = "SELECT * FROM Entries;";
  Entry *entries[100];
  sqlite3_open("entries.db", &DB);
  sqlite3_exec(DB, sql, handle_entry, entries, 0);
}

What is the schema for entries?

ID, Title, Content

typedef struct Entry {
  int ID;
  char *title;
  char *content;
} Entry;
CREATE TABLE Entries(
              ID INTEGER PRIMARY KEY,
              Title CHAR(512),
              Content TEXT);

Here ya go

use rusqlite::{Connection, NO_PARAMS};

#[derive(Debug)]
struct Entry {
    id: i64,
    title: String,
    content: String
}

fn insert_entry(conn: &Connection, entry: &Entry) -> rusqlite::Result<usize> {
    conn.execute("\
    INSERT INTO Entries (Title, Content)
        VALUES (?1, ?2)
    ",
    &[&entry.title, &entry.content])
}

fn all_entries(conn: &Connection) -> rusqlite::Result<Vec<Entry>> {
    let mut stmt = conn
        .prepare("SELECT ID, Title, Content FROM Entries")?;
    let entry_iter = stmt.query_map(
        NO_PARAMS,
        |row| Ok(Entry{
            id: row.get(0)?,
            title: row.get(1)?,
            content: row.get(2)?
        })
    )?;

    let mut entries = Vec::new();
    for entry in entry_iter {
        entries.push(entry?);
    }

    Ok(entries)
}

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open("db.sqlite")?;
    conn.execute("\
    CREATE TABLE IF NOT EXISTS Entries(
              ID INTEGER PRIMARY KEY,
              Title CHAR(512),
              Content TEXT);
    ",
    NO_PARAMS)?;

    let entry_1 = Entry {
        id: 0,
        title: String::from("Entry Number One"),
        content: String::from("This is the text for my entry.")
    };

    let entry_2 = Entry {
        id: 1,
        title: String::from("Le Second Entreee"),
        content: String::from("2. What is 2? Can you taste it?")
    };

    insert_entry(&conn,&entry_1)?;
    insert_entry(&conn,&entry_2)?;



    println!("{:?}", all_entries(&conn));

    Ok(())
}
/Users/emccue/.cargo/bin/cargo run --color=always --package sequel --bin sequel
   Compiling sequel v0.1.0 (/Users/emccue/Development/sequel)
    Finished dev [unoptimized + debuginfo] target(s) in 1.63s
     Running `target/debug/sequel`
Ok([Entry { id: 1, title: "Entry Number One", content: "This is the text for my entry." }, Entry { id: 2, title: "Le Second Entreee", content: "2. What is 2? Can you taste it?" }])

Process finished with exit code 0

that's gonna take a long time to understand for me

im still in ch one in the rust tutorial

That's fair. C is definitely a simpler language in terms of number of concepts you need to understand.


<- Index

I like it when "if" is an expression

by: Ethan McCue

One of the small conveniences I've grown to really like in a programming language is "if" being an expression.

I found myself hacking it into some JS I was writing today.

const thing = (() => {
    if (condition) {
        return "hello";
    } else {
        return "world";
    }
})();

Now, feel free to weigh in on that being bad practice.

as a non JS programmer I have no idea what "thing = (() => {" is supposed to do

That creates an anonymous function then calls it immediately.
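For comparison, the closest Java equivalents are the ternary operator and, since Java 14, switch expressions (condition and language here are assumed local variables):

// The ternary covers the simple two-way case.
String thing = condition ? "hello" : "world";

// A switch expression also produces a value directly.
String greeting = switch (language) {
    case "en" -> "hello";
    case "fr" -> "bonjour";
    default -> "hi";
};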


<- Index

The Ferry Problem in Rust

by: Ethan McCue

Someone else's C homework done in rust

// Ferry Loading
// Before bridges were common, ferries were used to transport cars across rivers.
// River ferries, unlike their larger cousins, run on a guide line and are powered by the
// river's current. Cars drive onto the ferry from one end, the ferry crosses the river, and
// the cars exit from the other end of the ferry.
// There is an l-meter-long ferry that crosses the river. A car may arrive at either
// river bank to be transported by the ferry to the opposite bank. The ferry travels
// continuously back and forth between the banks so long as it is carrying a car or there
// is at least one car waiting at either bank. Whenever the ferry arrives at one of the
// banks, it unloads its cargo and loads up cars that are waiting to cross as long as they
// fit on its deck. The cars are loaded in the order of their arrival and the ferry's deck
// accommodates only one lane of cars. The ferry is initially on the left bank where it
// had mechanical problems and it took quite some time to fix it. In the meantime, lines
// of cars formed on both banks that wait to cross the river.
// The first line of input contains c, the number of test cases. Each test case begins
// with the number l, a space and then the number m. m lines follow describing the cars
// that arrive in this order to be transported. Each line gives the length of a car (in
// centimeters), and the bank at which the car awaits the ferry ("left" or "right").
// For each test case, output one line giving the number of times the ferry has to cross
// the river in order to serve all waiting cars.
// Sample input
// 4
// 20 4
// 380 left
// 720 left
// 1340 right
// 1040 left
// 15 4
// 380 left
// 720 left
// 1340 right
// 1040 left
// 15 4
// 380 left
// 720 left
// 1340 left
// 1040 left
// 15 4
// 380 right
// 720 right
// 1340 right
// 1040 right

use std::collections::LinkedList;
use std::error::Error;
use std::io::BufRead;

#[derive(Debug, PartialEq, Eq)]
enum RiverBank {
    Left,
    Right,
}

impl RiverBank {
    fn try_from(value: &str) -> Result<RiverBank, Box<Error>> {
        match value {
            "left" => Ok(RiverBank::Left),
            "right" => Ok(RiverBank::Right),
            _ => Err("A river bank must either be \"left\" or \"right\"".into()),
        }
    }

    fn switch(&self) -> RiverBank {
        match self {
            &RiverBank::Left => RiverBank::Right,
            &RiverBank::Right => RiverBank::Left,
        }
    }
}

type CarLength = u64;
type FerryLength = u64;

#[derive(Debug)]
struct StartingCarState {
    river_bank: RiverBank,
    car_length: CarLength,
}

#[derive(Debug)]
struct FerryProblem {
    ferry_length: FerryLength,
    car_descriptions: Vec<StartingCarState>,
}

fn read_input(from: impl BufRead) -> Result<Vec<FerryProblem>, Box<Error>> {
    let mut lines = from.lines();
    let first_line = match lines.next() {
        Some(Ok(line)) => line,
        Some(Err(err)) => {
            return Err(err.into());
        }
        None => return Err("There was no first line provided to stdin".into()),
    };

    let number_of_test_cases: u64 = first_line
        .parse()
        .map_err(|_| "The first line needs to be a single non-negative number.")?;

    let mut ferry_problems = Vec::new();

    for _ in 0..number_of_test_cases {
        let ferry_description_line = match lines.next() {
            Some(Ok(line)) => line,
            Some(Err(err)) => {
                return Err(err.into());
            }
            None => {
                return Err("Ran out of input when parsing test cases".into());
            }
        };

        let (ferry_length, number_of_cars) = {
            let split_by_whitespace: Vec<&str> = ferry_description_line.split_whitespace().collect();

            if split_by_whitespace.len() == 2 {
                let mut ferry_length = split_by_whitespace[0].parse()?;
                ferry_length *= 100;
                let number_of_cars: u64 = split_by_whitespace[1].parse()?;
                (ferry_length, number_of_cars)
            } else {
                return Err("Malformed ferry description line".into());
            }
        };

        let mut car_descriptions = Vec::new();
        for _ in 0..number_of_cars {
            let car_description_line = match lines.next() {
                Some(Ok(line)) => line,
                Some(Err(err)) => {
                    return Err(err.into());
                }
                None => return Err("Not enough descriptions of cars given".into()),
            };

            let car_state = {
                let split_by_whitespace: Vec<&str> = car_description_line.split_whitespace().collect();

                if split_by_whitespace.len() == 2 {
                    let car_length = split_by_whitespace[0].parse()?;
                    let river_bank = RiverBank::try_from(split_by_whitespace[1])?;

                    StartingCarState {
                        river_bank: river_bank,
                        car_length: car_length,
                    }
                } else {
                    return Err("Malformed car description line".into());
                }
            };

            car_descriptions.push(car_state);
        }

        ferry_problems.push(FerryProblem {
            ferry_length,
            car_descriptions,
        })
    }

    Ok(ferry_problems)
}

#[derive(Debug)]
enum FerryProblemSolution {
    RequiredTrips(u64),
    Impossible,
}

fn required_crossings(problem: FerryProblem) -> FerryProblemSolution {
    let mut left_bank: LinkedList<CarLength> = problem
        .car_descriptions
        .iter()
        .filter(|car| car.river_bank == RiverBank::Left)
        .map(|car| car.car_length)
        .collect();
    let mut right_bank: LinkedList<CarLength> = problem
        .car_descriptions
        .iter()
        .filter(|car| car.river_bank == RiverBank::Right)
        .map(|car| car.car_length)
        .collect();

    let mut bank = RiverBank::Left;

    let mut used_capacity = 0;
    let mut trips_made = 0;

    loop {
        let next_car = match bank {
            RiverBank::Left => left_bank.pop_front(),
            RiverBank::Right => right_bank.pop_front(),
        };

        match next_car {
            Some(car) => {
                if car > problem.ferry_length {
                    return FerryProblemSolution::Impossible;
                } else if car + used_capacity > problem.ferry_length {
                    trips_made += 1;
                    used_capacity = 0;
                    match bank {
                        RiverBank::Left => left_bank.push_front(car),
                        RiverBank::Right => right_bank.push_front(car),
                    };
                    bank = bank.switch()
                } else {
                    used_capacity += car;
                }
            }
            None => {
                trips_made += 1;

                let other_bank_is_empty = match bank {
                    RiverBank::Left => right_bank.is_empty(),
                    RiverBank::Right => left_bank.is_empty(),
                };
                if other_bank_is_empty {
                    return FerryProblemSolution::RequiredTrips(trips_made);
                } else {
                    used_capacity = 0;
                    bank = bank.switch()
                }
            }
        }
    }
}

fn main() -> Result<(), Box<Error>> {
    let test_input = "4
20 4
380 left
720 left
1340 right
1040 left
15 4
380 left
720 left
1340 right
1040 left
15 4
380 left
720 left
1340 left
1040 left
15 4
380 right
720 right
1340 right
1040 right
";
    /* replace with io::stdin().lock() for real input */
    let problems = read_input(test_input.as_bytes())?;
    for problem in problems {
        let solution = required_crossings(problem);
        match solution {
            FerryProblemSolution::Impossible => println!("There is no solution"),
            FerryProblemSolution::RequiredTrips(trips) => println!("{}", trips),
        }
    }
    Ok(())
}

<- Index

Is there a better way of storing a JSON entry

Question from Anonymous

class Event:
    def __init__(self, name, time, location):
        self.name = name
        self.time = time
        self.location = location

Is this a good thing to do or should I not

(Deleted User) I'd suggest a namedtuple instead. It's much faster and more efficient.

Or dataclasses if you're using py 3.7.

How do I use a namedtuple?

Actually quick disclaimer.

The way python's JSON module works, it's impossible for you to dump namedtuples as anything other than a list.

Here you should just use a dictionary (IMO).

event = {"name": name,
         "time": time,
         "location": location}

But if I'm being honest that's just my clojure brain talking.

pro: If you deserialize from json you get this anyways and you can do round-trip serde and have the same representation.

con: The "shape" of the dict doesn't have a name and you can't use the . syntax to access stuff by default.

If you want to use the dot-notation for access with a dict just wrap it in this lib.

https://github.com/Infinidat/munch

In general, you don't gain much by "dressing up" your data, though named tuples are immutable, which is nice.


<- Index

How to make a custom python templating engine

Question from Anonymous

I have now used a language I am barely familiar with (Python) (I don't like the scopes) to parse a webpage I built off of online tutorials and its shitty embeds of python, so I could generate a new page that has the current information from Google Calendar and presentation slides from Dropbox, neither of which are APIs I had experience with. I have the shittiest code ever written

<div id="SlideshowContainer">
    ${"\n\t\t\t".join(getImageFiles(imageList))}
</div>

a small sample of the HTML part.

But I don't like Python, it's just what I'm required to use

I have the code execution working, it's basically my own implementation of templates

Python scopes trip me up for no reason, as well as types Holy wars of editors don't ever change my opinion; I just use Visual Studio because I normally use C#, but I switched to IDLE for this one, because I was hoping it would be able to work better for Python (it doesn't)

What exactly do you mean by scopes though? like, declarations of things?

if thing:
    x = 2
else:
    x = 3
# x exists here

Declarations in particular, but I also always forget the global keyword

Wait, that's a code smell. In 5 years of python I've never needed to use the global keyword more than 1 time, and even that was a mistake.

I've had to use nonlocal for one assignment - but this is probably indicative of a larger problem in how you write code.

No, Python just feels like shitty scripting IMO and I treat Python like I treat a one shot bash/batch script.

(my main interest here is just to get to the root of why you feel this way)

(not to knock the case of one off hacky scripts - it's pretty good for that)

But I only ever need Python for either scripting or small programs

It's only in use here because it's the one language I can remotely use that works on Linux

What are the other languages you can use? That might give me some context on what angle you are coming from here.

C# mostly

okay, ill loop back around after I have run your code, but I have a hypothesis here.

Also, if you have any of your C# code that you can share I am curious how you write code with the guard rails of static typing.

I have code that I think kinda doesn't suck, but it does in retrospect

Good enough.

So anyways, first I'm walking through your script. First issue

data = []
for i in range(0, 10):
    data.append(("&nbsp", "&nbsp", "&nbsp"))

"data" tells me nothing about what this is. Why do you have a list of 10 tuples of 3 non-breaking space html escapes? They also aren't the whole thing you need since you should have a semicolon at the end if I remember correctly &nbsp;

command = "value = " + match.groups(1)[0]
print(command)
exec(command, {}, enviormentVariablesInFile)

I think you see this coming, but it is worth pointing out anyways.

exec is never what you want.

Part of what is going wrong here, beyond using exec as a shorthand for a calculator basically, is that you are assigning to a variable.

If anything, at least put the value assignment outside of the command you want to run.

But let's look a bit closer at what you are using it for

    pattern = re.compile("\\${(.*?)}")

    enviormentVariablesInFile = {"data": data, "imageList": imageList, "value": None}

    for match in reversed(list(pattern.finditer(output))):
        span = match.span()
        command = "value = " + match.groups(1)[0]
        print(command)
        exec(command, {}, enviormentVariablesInFile)
        value = enviormentVariablesInFile["value"]
        if value is None:
            value = ""
        i += 1
        output = output[:span[0]] + value + output[span[1]:]

The first issue here is pattern. Regular expressions are not readable. You need to pick a name for that thing that describes what it does.

I can't reverse engineer it, so lets pretend it finds unicorns.

    unicornPattern = re.compile("\\${(.*?)}")
    unicornsInOutput = list(unicornPattern.finditer(output))

    enviormentVariablesInFile = {"data": data, "imageList": imageList, "value": None}

    for unicornMatch in reversed(unicornsInOutput):
        span = unicornMatch.span()
        command = "value = " + unicornMatch.groups(1)[0]
        print(command)
        exec(command, {}, enviormentVariablesInFile)
        value = enviormentVariablesInFile["value"]
        if value is None:
            value = ""
        i += 1
        output = output[:span[0]] + value + output[span[1]:]

Now at least there is a name to the thing.

Also, I guess I get the idea. Your output starts as the html file and then you raw exec code in there that you delimited with ${} and insert it as you go.

Because you scan the whole document every time you end up with n^2 behaviour, but that is fine for your project.

But - there has to be a better way (and there is, even sans libraries). If you think of python as only good for writing in a scripting way, you should really be asking yourself: "how exactly would I do things differently in C#?"

In this case the main thing C# is going to prevent you from doing is evaluating arbitrary code, since "eval"-ing C# is far less easy to do.

You even use this eval behaviour to define helper functions within your template

${None; getImageFiles = lambda l: list(map(lambda i: '<img class="SlideshowImage fade" src="' + str(i) + '" />', l))}

The key thing here that sucks - which is made even more sucky, though for a good reason, by python's whitespace-significant syntax - is embedding code in templates.

Tools like JSP (Java Server Pages) allow for basically arbitrary access to the context of the code around them.

This is why, even if you manage things well at the start, projects using things that are that permissive tend to get off the rails.

Down a step are things like Jinja (the templating engine flask uses). They allow you to embed logic - with the conceit that it is sometimes required or helpful for formatting some html or similar - but they do not allow you arbitrary access to the outside scope.

This is what I think you are trying to accomplish with your code since you specify exactly the environment for exec and want to then write code to generate stuff using that environment.

This works except for the facts that

  1. Python is probably too powerful a language to be embedded in a template and
  2. Python is a bad fit syntactically for being embedded in html

The simplest kind of templating, and what I suggest you use for your project instead of what you are doing, is find and replace.

So instead of having your logic be in your template, you compute what you want to put outside of that context and jam it in after the fact without any logic.

So for your html that you want to generate - first things first - lets sub out the variable bits.

<html>
<head>

    <link href="index.css" rel="stylesheet" type="text/css" />
    <script src="index.js"></script>

</head>
<body>
    <div id="Container">
        <div id="ScheduleContainer">
            <table id="ScheduleTable">
                <tr class="ScheduleRow">
                    <th id="ScheduleHeader" colspan="5">
                        <h1>Schedule</h1>
                    </th>
                </tr>
                <tr class="ScheduleRow">
                    <th colspan="3">Name</th>
                    <th>Time</th>
                    <th>Room</th>
                </tr>
                <!--
                for schedule in all_schedules:
                     make a table row for that schedule.
                -->
            </table>
        </div>

        <div id="SlideshowContainer">
            <!--
            for image in imageList:
                make an image on the page for that image
            -->
        </div>

        <div id="LogoBox">
            <img id="LogoImage" src="logo.png" />
        </div>

        <div id="Footer">
            <p style="display: inline" id="DateTimeTime" />
            <p style="display: inline" id="DateTimeDate" />
        </div>

    </div>
</body>
</html>

That's all you want to do. Now the question is "how do I fill in the comment blocks without barfing eval-able python code?"

Your first option is the Jinja approach where your templating language has support for basic looping constructs. You can give it the info and it will format that on the html page.

But you are rolling your own, so we will go with the second approach - find and replace.

First, lets handle the rows

def schedule_row_html(row):
    return """
        <tr class="ScheduleRow">
            <td colspan="3" class="ScheduleName">{name}</td>
            <td class="ScheduleTime">{time}</td>
            <td class="ScheduleRoomNumber">${room_number}</td>
        </tr>""".format(name=row["name"], time=row["time"], room_number=row["room_number"])
all_schedule_html = "".join([ schedule_row_html(row)
                              for row in schedules ])

Now we are in a position where we can fill in the rows

pageHtml = """
<html>
<head>

    <link href="index.css" rel="stylesheet" type="text/css" />
    <script src="index.js"></script>

</head>
<body>
    <div id="Container">
        <div id="ScheduleContainer">
            <table id="ScheduleTable">
                <tr class="ScheduleRow">
                    <th id="ScheduleHeader" colspan="5">
                        <h1>Schedule</h1>
                    </th>
                </tr>
                <tr class="ScheduleRow">
                    <th colspan="3">Name</th>
                    <th>Time</th>
                    <th>Room</th>
                </tr>
                {schedules}
            </table>
        </div>

        <div id="SlideshowContainer">
            {images}
        </div>

        <div id="LogoBox">
            <img id="LogoImage" src="logo.png" />
        </div>

        <div id="Footer">
            <p style="display: inline" id="DateTimeTime" />
            <p style="display: inline" id="DateTimeDate" />
        </div>

    </div>
</body>
</html>
"""

pageHtml.format(schedules=all_schedule_html, images=TBD)
def images_html(images):
    return "".join([ "<img class=\"SlideshowImage fade\" src=\"{src}\" />".format(src=imageUrl)
                     for imageUrl in images ])

(This is all pseudocode, so the finer points are up to you)

Now you may be asking yourself "but what if my template becomes more complicated?" "just doing string formatting can't scale!"

And to that I say - yeah no duh.

That's why people spent time writing, improving, and bug-fixing the existing templating libraries.

But if your requirements are as simple as you say - a single page regenerated every day or whatever - just do it inline with strings, who cares.

Also, tiny thing.

ChangeImage

The C#/.NET naming convention of capitalizing the first letter of every word (PascalCase) isn't used much anywhere else. Most java-ish people use the camelCase thing. Python supports that too, but the generally preferred style is snake_case.

Doesn't matter for this, but just keep it in the back of your head so when you finally have to code with other programmers you don't get bogged down in pointless holy wars

moving on from the exec thing finally:

    i = 0

    for event in events:
        start = event['start'].get('dateTime')
        if(start is None):
            continue
        start = start[11:16]
        hour = int(start[0:2])
        suffix = " AM"
        if hour > 12:
            suffix = " PM"
            hour %= 12
        start = str(hour) + start[2:5] + suffix
        name = event['summary']
        location = event['location']
        print(name, start, location, sep=", ")
        data[i] = (name, start, location)
        i += 1

What is this i?

It seems like you are just counting in step with the data because the way you coded it requires a set number of schedules on the page. Hopefully you know how to fix that now and you can just append to data (or whatever name you give it that actually represents what it is).

The larger problem with using i like this is that it increases the area you need to read over to understand a given chunk of code since you need to track reassignments and changes and uses of i everywhere from its first declaration to its last.

"i", while customary for simple kind of index based for loops from c-ish languages and sometimes when using range(...), really isn't a good enough name here.

        start = event['start'].get('dateTime')
        if(start is None):
            continue
        start = start[11:16]
        hour = int(start[0:2])
        suffix = " AM"
        if hour > 12:
            suffix = " PM"
            hour %= 12

Also, date handling logic is always going to be messy. I get that. But try to make it depend on fewer magic numbers.

Maybe isolate it to its own function (maybe, depending on if that helps or hurts readability in context)

What is start[11:16]? Maybe you know now, but god only knows a year from now.

getService you copy-pasted. No problem, but maybe put a link back to where you copy-pasted it from in case you need to change it later.

I would loop back around to tackling your misgivings about python, but at this point I'm tired

Okay, sorry, I had to go halfway through this, and just got back

Thanks for all the help!

I'll try to fix some of the worse problems in this

You're right, I'm not respecting the language correctly

I wouldn't use the verb "Respecting" necessarily. You just need to learn how to write code to be read. I think being in python lowers some guardrails, so you are just bumping into stuff more.


<- Index

Can you explain malloc

Question from a Deleted User

could you explain what's happening here

(char *) malloc(n * sizeof(char));

The malloc thing allocates the structure on the heap.

Think of it like this:

Anything you declare like char[] without calling malloc is memory that is released as soon as the function returns. Any memory you get by using malloc is never released unless you explicitly call free on it.

I know it's confusing, but you'll figure it out.

and what does (char *) do here

The (char *) bit explicitly casts the pointer in the eyes of the compiler. It is a pointer to the start of memory that contains god knows what.

If you allocated sizeof char then you would have a pointer to a block of memory that is the size of a single char, which would be equivalent to an array of characters of size 1.

does C have array index oob error? I remember it took me days to find out that I've made a mistake while looping the array

No and yes.

Yes that can cause your program to segfault or maybe crash in some way.

No you don't get a named error that tells you what you messed up.

If you want to get some output it might be educational to make your own "data structure" and printf or something if an error happens.

If you want C perf and sensible errors try Rust, but my Spidey Senses tell me you are still learning in general so that's not really a productive jump.


<- Index

What is the builder pattern for

Question from Abdul#4709

Stupid assignment requirement for using super method inside constructor

Just wanted to make sure prompt makes sense

System.out.print("Enter 'y' or 'n' if the triangle is filled: ");
char e = scanner.next().charAt(0);
boolean f = e == 'y' ? true : false;
Triangle triangle = new Triangle(a, b, c, d, f);

It's a small class so just using var names like this

Once a constructor gets to 5 parameters it is probably time to switch to using the builder pattern.

It's boilerplate code in java, but it is a must for readability.

Can you share your triangle class as it is right now?

Never seen builder pattern before I will make sure to look it up

< CODE LOST TO TIME >

    protected Date dateCreated;

That's...weird. Why is this info stored?

I actually didn't make the Geometric object class it was posted by the prof

I made the other Triangle one

Man your prof is annoying

His lectures r even worse

Well, making do with what you have is...workable, but we can revisit that superclass to see how you would design it if you weren't a bored college professor.

Basically, for most uses the builder pattern is just a substitute for a language feature called named optional parameters

Consider this python

class Position:
    def __init__(self, *, x=0, y=0, z=0):
        self.x = x
        self.y = y
        self.z = z

What's the second parameter in that 🤔

It is a python shorthand saying "these things need to be named". That's not really the focus though.

With the constructor (init method - close enough for now) being written like this you can call it a bunch of different ways.

Position() # makes a 0, 0, 0
Position(x=1) # makes a 1, 0, 0
Position(z=4) # makes a 0, 0, 4
Position(y=2, z=4) # makes a 0, 2, 4
Position(x=1, y=2, z=4) # makes a 1, 2, 4

This has a lot of cool benefits

Oh so you don't have to define different constructors many times unlike in java

For one, if a field has a sensible default value (like all of these do in the position case) you can just insert that if it isn't specified. Also, the parameters being named means that you can specify them "out of order" with the method/function definition, which is very important if you have more than 3 parameters.

"What does the 6th int mean" is a stupid question to have to ask yourself, not to mention the chance you get it wrong, so having the parameters named puts the name of the thing right next to the value.

And before I get to explaining the builder pattern (your way of hacking this language feature into java), try and consider how you would support the Position example with just overloading methods.

Aight

Perhaps having one method for just x and y and another one for x, y, z?

What if I want to specify just y and z?

That would require you to make another method

Which is repetition

And we just have three fields in a larger class there will be even more

Not only that, it wouldn't work.

Remember, if you have two methods with the same name (or constructors for that matter) then java needs to be able to tell them apart by the types of their arguments. So you can't have two constructors which both take two ints.

// As far as the compiler can tell, these are identical
Position(int x, int y) { ... }
Position(int y, int z) { ... }

You would have to start making static factory methods with different names for each case.

.positionYZ(...)
.positionXZ(...)

Ah shite true

And, while the default values piece of this is important, there will be times when mandating that the user name all of the parameters, even if there are no optional parameters, is what keeps your code legible in the face of dozens of properties.

So without further ado - the builder pattern.

First, we will start with your triangle code

public class Triangle extends GeometricObject {
    private double a; // Side one.
    private double b; // Side two.
    private double c; // Side three.

    public Triangle() {
        this.setA(1.0);
        this.setB(1.0);
        this.setC(1.0);
    }

    public Triangle(double a, double b, double c, String d, boolean e) {
        super(d, e);
        this.setA(a);
        this.setB(b);
        this.setC(c);
    }

    public double getA() {
        return a;
    }

    public void setA(double a) {
        this.a = a;
    }

    public double getB() {
        return b;
    }

    public void setB(double b) {
        this.b = b;
    }

    public double getC() {
        return c;
    }

    public void setC(double c) {
        this.c = c;
    }

    public double getArea() {
        double p = (this.a + this.b + this.c) / 2;
        return Math.sqrt(p * (p - this.a) * (p - this.b) * (p - this.c));
    }

    public double getPerimeter() {
        return this.a + this.b + this.c;
    }

    @Override
    public String toString() {
        return "Triangle{" +
                "a=" + a +
                ", b=" + b +
                ", c=" + c +
                ", area=" + this.getArea() +
                ", perimeter=" + this.getPerimeter() +
                ", color='" + color + '\'' +
                ", filled=" + filled +
                '}';
    }
}

(kenndel#7506) Java is a beautiful language

public class Triangle extends GeometricObject {
    private double a; // Side one.
    private double b; // Side two.
    private double c; // Side three.

    public Triangle() {
        this.a = 1.0;
        this.b = 1.0;
        this.c = 1.0;
    }

    public Triangle(double a, double b, double c, String d, boolean e) {
        super(d, e);
        this.a = a;
        this.b = b;
        this.c = c;
    }

    public double getA() {
        return a;
    }

    public double getB() {
        return b;
    }

    public double getC() {
        return c;
    }

    public double getArea() {
        double p = (this.a + this.b + this.c) / 2;
        return Math.sqrt(p * (p - this.a) * (p - this.b) * (p - this.c));
    }

    public double getPerimeter() {
        return this.a + this.b + this.c;
    }

    @Override
    public String toString() {
        return "Triangle{" +
                "a=" + a +
                ", b=" + b +
                ", c=" + c +
                ", area=" + this.getArea() +
                ", perimeter=" + this.getPerimeter() +
                ", color='" + color + '\'' +
                ", filled=" + filled +
                '}';
    }
}

Now let's give things more descriptive names, and let's also get rid of that default constructor for now.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    public Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    public double getSideA() {
        return this.sideA;
    }

    public double getSideB() {
        return this.sideB;
    }

    public double getSideC() {
        return this.sideC;
    }

    public double getArea() {
        double p = (this.sideA + this.sideB + this.sideC) / 2;
        return Math.sqrt(p * (p - this.sideA) * (p - this.sideB) * (p - this.sideC));
    }

    public double getPerimeter() {
        return this.sideA + this.sideB + this.sideC;
    }

    @Override
    public String toString() {
        return "Triangle{" +
                "a=" + this.sideA +
                ", b=" + this.sideB +
                ", c=" + this.sideC +
                ", area=" + this.getArea() +
                ", perimeter=" + this.getPerimeter() +
                ", color='" + color + '\'' +
                ", filled=" + filled +
                '}';
    }
}

And from now on, I am going to leave out all of the methods for space.

The first thing we want to do is make the constructor private, since we are going to be replacing the access pattern of "calling the constructor with all the arguments" with our builder.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }
}

Never seen a private constructor before

A private constructor can only be called from the class it is defined in and stuff within that class. So let's make the stuff that goes in the class.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    public static class Builder {}
}

We add a builder class within the triangle class. This builder can access all of the private methods of the Triangle class because it is in the Triangle class.

Now this builder needs to keep track of all of the information needed to build a triangle. We also want that information to be added one piece at a time - one method call at a time.

Does it matter if Builder class is static or non static

Yep. That's a confusing java-specific thing - feel free to google why that is needed.
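The short version, as a sketch: a non-static inner class can only be created through an instance of the outer class, which defeats the purpose of a builder that has to exist before any Triangle does.

// With `public static class Builder`, this works on its own:
Triangle.Builder builder = new Triangle.Builder();

// Without `static`, Builder would be an inner class, and Java would
// demand an enclosing Triangle first - the very thing we don't have yet:
// Triangle.Builder builder = someExistingTriangle.new Builder();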

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    public static class Builder {
        private double sideA;
        private double sideB;
        private double sideC;
        private String color;
        private boolean filled;

        public Builder() {
                // Put any defaults here. If there isn't any default set it to null and check for that later
               this.sideA = 1.0;
               this.sideB = 1.0;
               this.sideC = 1.0;
               this.color = "black";
               this.filled = true;
        }

        public void setSideA(double sideA) { this.sideA = sideA; }
        public void setSideB(double sideB) { this.sideB = sideB; }
        public void setSideC(double sideC) { this.sideC = sideC; }
        public void setColor(String color) { this.color = color; }
        public void setFilled(boolean filled) { this.filled = filled; }
    }
}

Now you have the ability to mutate the builder for whatever you want.

Wait, you have to redeclare the fields inside builder class?

usually, yeah. It is far from a perfect system.

It is bad code to write, but it provides the nicest possible outward facing interface.

So when you're mutating the fields inside builder does it update them in the main class also or just the builder class

Just the builder class. No instance of the outer class exists yet.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    public static class Builder {
        private double sideA;
        private double sideB;
        private double sideC;
        private String color;
        private boolean filled;

        public Builder() {
            // Put any defaults here. If there isn't a sensible default, set the
            // field to null (using a wrapper type like Double) and check for that later.
            this.sideA = 1.0;
            this.sideB = 1.0;
            this.sideC = 1.0;
            this.color = "black";
            this.filled = true;
        }

        public void setSideA(double sideA) {
            this.sideA = sideA;
        }

        public void setSideB(double sideB) {
            this.sideB = sideB;
        }

        public void setSideC(double sideC) {
            this.sideC = sideC;
        }

        public void setColor(String color) {
            this.color = color;
        }

        public void setFilled(boolean filled) {
            this.filled = filled;
        }
    }
}

No change yet, I just cleaned up the above code.

So now that we can set the properties on the builder we have to add a method for actually constructing the Triangle.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    public static class Builder {
        private double sideA;
        private double sideB;
        private double sideC;
        private String color;
        private boolean filled;

        public Builder() {
            // Put any defaults here. If there isn't a sensible default, set the
            // field to null (using a wrapper type like Double) and check for that later.
            this.sideA = 1.0;
            this.sideB = 1.0;
            this.sideC = 1.0;
            this.color = "black";
            this.filled = true;
        }

        public void setSideA(double sideA) {
            this.sideA = sideA;
        }

        public void setSideB(double sideB) {
            this.sideB = sideB;
        }

        public void setSideC(double sideC) {
            this.sideC = sideC;
        }

        public void setColor(String color) {
            this.color = color;
        }

        public void setFilled(boolean filled) {
            this.filled = filled;
        }

        public Triangle build() {
            // You still need to remember the order, but at least it is only in one place.
            // This is also the place to put any validations: checking that required
            // properties are set, that nothing is null that shouldn't be, etc.
            return new Triangle(this.sideA, this.sideB, this.sideC, this.color, this.filled);
        }
    }
}
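As a sketch of what those validations might look like (my example - the triangle inequality check is not from the original conversation), build() could refuse to hand out a geometrically impossible triangle:

public Triangle build() {
    // any two sides of a triangle must add up to more than the third
    if (sideA + sideB <= sideC || sideA + sideC <= sideB || sideB + sideC <= sideA) {
        throw new IllegalStateException("sides do not form a valid triangle");
    }
    return new Triangle(this.sideA, this.sideB, this.sideC, this.color, this.filled);
}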

Now, as it is written, you would end up using the builder like this

Triangle.Builder builder = new Triangle.Builder();
builder.setSideA(...);
builder.setSideB(...);
builder.setSideC(...);
builder.setColor(...);
builder.setFilled(...);
Triangle triangle = builder.build();

One way to make it a bit easier to use the builder is to make each setter return a reference to the builder itself.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    public static class Builder {
        private double sideA;
        private double sideB;
        private double sideC;
        private String color;
        private boolean filled;

        public Builder() {
            // Put any defaults here. If there isn't a sensible default, set the
            // field to null (using a wrapper type like Double) and check for that later.
            this.sideA = 1.0;
            this.sideB = 1.0;
            this.sideC = 1.0;
            this.color = "black";
            this.filled = true;
        }

        public Builder setSideA(double sideA) {
            this.sideA = sideA;
            return this;
        }

        public Builder setSideB(double sideB) {
            this.sideB = sideB;
            return this;
        }

        public Builder setSideC(double sideC) {
            this.sideC = sideC;
            return this;
        }

        public Builder setColor(String color) {
            this.color = color;
            return this;
        }

        public Builder setFilled(boolean filled) {
            this.filled = filled;
            return this;
        }

        public Triangle build() {
            return new Triangle(this.sideA, this.sideB, this.sideC, this.color, this.filled);
        }
    }
}

This lets you "chain" the method calls like this

Triangle triangle = new Triangle.Builder()
    .setSideA(...)
    .setSideB(...)
    .setSideC(...)
    .setColor(...)
    .setFilled(...)
    .build();

So that is the basic "pattern". From here on out it's kinda all preference and style.

Personally, I don't like the setThing naming with builders, so I usually choose to just use the field name. In a builder it's kinda understood that you are setting things.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    public static class Builder {
        private double sideA;
        private double sideB;
        private double sideC;
        private String color;
        private boolean filled;

        public Builder() {
            // Put any defaults here. If there isn't a sensible default, set the
            // field to null (using a wrapper type like Double) and check for that later.
            this.sideA = 1.0;
            this.sideB = 1.0;
            this.sideC = 1.0;
            this.color = "black";
            this.filled = true;
        }

        public Builder sideA(double sideA) {
            this.sideA = sideA;
            return this;
        }

        public Builder sideB(double sideB) {
            this.sideB = sideB;
            return this;
        }

        public Builder sideC(double sideC) {
            this.sideC = sideC;
            return this;
        }

        public Builder color(String color) {
            this.color = color;
            return this;
        }

        public Builder filled(boolean filled) {
            this.filled = filled;
            return this;
        }

        public Triangle build() {
            return new Triangle(this.sideA, this.sideB, this.sideC, this.color, this.filled);
        }
    }
}

Triangle triangle = new Triangle.Builder()
    .sideA(...)
    .sideB(...)
    .sideC(...)
    .color(...)
    .filled(...)
    .build();

The other thing that is useful is to make the builder not directly constructable, and instead give out an instance via a static method.

public class Triangle extends GeometricObject {
    private double sideA; // Side one.
    private double sideB; // Side two.
    private double sideC; // Side three.

    private Triangle(double sideA, double sideB, double sideC, String color, boolean filled) {
        super(color, filled);
        this.sideA = sideA;
        this.sideB = sideB;
        this.sideC = sideC;
    }

    // You get a new builder via Triangle.builder()
    public static Builder builder() {
        return new Builder();
    }

    public static class Builder {
        private double sideA;
        private double sideB;
        private double sideC;
        private String color;
        private boolean filled;

        // And now this is private so your users don't construct it directly
        private Builder() {
            this.sideA = 1.0;
            this.sideB = 1.0;
            this.sideC = 1.0;
            this.color = "black";
            this.filled = true;
        }

        public Builder sideA(double sideA) {
            this.sideA = sideA;
            return this;
        }

        public Builder sideB(double sideB) {
            this.sideB = sideB;
            return this;
        }

        public Builder sideC(double sideC) {
            this.sideC = sideC;
            return this;
        }

        public Builder color(String color) {
            this.color = color;
            return this;
        }

        public Builder filled(boolean filled) {
            this.filled = filled;
            return this;
        }

        public Triangle build() {
            return new Triangle(this.sideA, this.sideB, this.sideC, this.color, this.filled);
        }
    }
}

Triangle triangle = Triangle.builder()
    .sideA(...)
    .sideB(...)
    .sideC(...)
    .color(...)
    .filled(...)
    .build();

I don't have a rock-solid argument for the .builder() static method other than that it reads better, though I think I had one in the past.
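One concrete argument I can offer now (my addition, not part of the original walkthrough): the static method is a natural place to demand the values that are genuinely required, so nobody can reach build() without providing them.

// Hypothetical variant: builder() takes the required sides up front,
// leaving only the optional properties to the chained calls.
public static Builder builder(double sideA, double sideB, double sideC) {
    Builder builder = new Builder();
    builder.sideA = sideA;
    builder.sideB = sideB;
    builder.sideC = sideC;
    return builder;
}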


<- Index

How to print every item in a list

How can I get items inside a list to print into the terminal individually in Python

List = ["cat", "dog", "snake"]

I want them to print to the terminal on a separate line like

Cat

Dog

Snake

Oh I googled it

I did google it before but I didn't phrase it correctly

for x in List:
    print(x)

I will save you some time in the future then for other data structures

# list
x = [1, 2, 3]
for thing in x:
    print(thing)

# dictionary
x = { "a": "apple", "b": "banana" }

## For everything here, don't rely on the order you get them in always being the same
## (modern Python dicts do preserve insertion order, but sets promise no order at all)

for key in x:
    print(key) # will give you "a" and "b"

for key in x.keys():
    print(key) # identical behaviour to the above, but a bit more explicit

for value in x.values():
    print(value) # Will give you "apple" and "banana"

for key, value in x.items():
    print(key)   # Will give you "a" then "b"
    print(value) # Will give you "apple" then "banana"

"a" in x # Will evaluate to True
"apple" in x # Will evaluate to False. "in" with a dictionary checks keys.

# sets
x = { 1, 2, 3 }

for thing in x:
    print(thing) # Will give you 1, 2, and 3, but just like dicts don't rely on the order

## A set is the data structure to use when you only care whether something is in the
## collection - duplicates collapse into one entry and ordering doesn't matter

1 in x # Will evaluate to True
4 in x # Will evaluate to False

## Small note, make an empty set by calling set(), not by using empty braces {} - That will create an empty dictionary
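While we're at it, one more loop shape that comes up constantly (my addition, not something they asked about): enumerate, for when you want the position of each item alongside the item itself.

# enumerate gives you (index, item) pairs
x = ["cat", "dog", "snake"]
for index, item in enumerate(x):
    print(index, item)  # prints "0 cat", then "1 dog", then "2 snake"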

<- Index

Bad reasons to avoid NodeJS

by: Ethan McCue

Conversation with diamondburned#4507

you shouldn't ever use nodejs anyway

dependencies

slow performance

shit language was never designed for backend anyway

3.00000000000000000000000003

shit libraries

bloated

each line is a reason btw

even when not activating, it's still slower than compiled languages

Speed is a nonsensical reason to not use node. JS vms are pretty fast and the kind of applications that use node are mostly IO bound, which V8's async system is pretty good at.

It is for me when it comes to restarting a service for maintenance reasons or for complex code

That's not how you should be running servers, to be honest.

"I won't get downtime because I can restart a server ultra fast" really isn't a sensible plan, even if your server is the fastest thing in the world written in rust

Either you architect your devops to allow zero downtime (multiple nodes, cycling restarts, multi DC maybe) or you live with having to do maintenance on/off hours.

Startup time of node has no impact on that.

As always basically everything has an exception at Google scale and it's not like there aren't valid potshots to take at Node and Javascript, but as a platform it really does excel at IO heavy tasks.

wdym heavy IO tasks?

Well, for example, the majority of the time a webserver spends handling a request is spent in IO. Waiting for a database to respond, sending data to the user, receiving a request.

Maybe it's a websocket thing and you receive from one user and broadcast to others. Either way, the time spent handling that almost always dominates the amount of time spent actually running code in the js thread.

The model of "okay, here is this IO task - give it to another thread and tell me what the results are when it's done" works out pretty well performance wise.

Also, the cold startup for node is only really bad if you compare it to Rust, C, and C++ which isn't really fair.

https://medium.com/@nathan.malishev/lambda-cold-starts-language-comparison-%EF%B8%8F-a4f4b5f16a62

C and C++ require pretty huge tradeoffs in terms of developer speed and possibility of errors, so they really aren't targeted at the same users as node. Rust is a lot better on that front, but being super statically typed still has tradeoffs for how quickly you can change things, which means it won't be the right choice for every "webserver" use case.

okay, here is this IO task - give it to another thread and tell me what the results are when it's done

so bringing go into the equation, go routine workers

Yeah I think go's async model is better too.

But go isn't so monumentally faster than node that you can say "forget node, it's too slow", because it really isn't.

Neither is python for that matter. Though python's async and package management stories are a lot worse, it is still the reigning king of glue languages.

Some critical path too slow?

Rewrite it in Rust, join the meme.

Python will keep on chugging.

I mean, the GIL stinks, but it's still not "painfully" slow.


<- Index

How to display execution times

Question from Abdul#4709

Stuck on LinkedLists

Anyone knows how to display execution times

Probably out of scope for your assignment, but you can always use something like this

private void timeProcedure(Runnable procedure) {
    long start = System.currentTimeMillis();
    procedure.run();
    long end = System.currentTimeMillis();
    long duration = end - start;
    System.out.println(duration + " milliseconds.");
}

and then you can do something like

timeProcedure(() -> linkedList.get(1000000 / 2));

You can also pass in a name to get more descriptive output

private void timeProcedure(String procedureName, Runnable procedure) {
    long start = System.currentTimeMillis();
    procedure.run();
    long end = System.currentTimeMillis();
    long duration = end - start;
    System.out.println(procedureName + " took " + duration + " milliseconds.");
}

timeProcedure("Accessing an element in the middle of a linked list", () -> linkedList.get(100000 / 2));

which would output something like

Accessing an element in the middle of a linked list took 33 milliseconds
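One caveat worth tacking on (my aside, not part of the original exchange): System.currentTimeMillis() follows the wall clock, which can jump if the system time gets adjusted. System.nanoTime() is the clock meant for measuring elapsed time, so a slightly more robust version looks like

private void timeProcedure(String procedureName, Runnable procedure) {
    long start = System.nanoTime(); // monotonic clock, intended for durations
    procedure.run();
    long durationMillis = (System.nanoTime() - start) / 1_000_000;
    System.out.println(procedureName + " took " + durationMillis + " milliseconds.");
}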

<- Index

MVC Origin

by: Ethan McCue

So, tiny history lesson. The phrase and acronym "MVC" came out of this:

http://heim.ifi.uio.no/~trygver/themes/mvc/mvc-index.html

I don't think that's the original memo, but it was just one person in the late 70s who hypothesised that it would be a good model.

And it turns out that yes, separating your concerns like this does in fact make your code easier to write and maintain generally.

But for a web server beyond a small project, I think you will find that it doesn't all fit so cleanly into exactly three buckets: M, V, and C.

The Django project uses MV* and I think that gets to the heart of it.

Yes, there is a part of your code involved with storing and modeling your data and yes, there is a part of your code involved with displaying a view, but beyond that it's... fuzzy.


<- Index

Is it okay to have a class with just methods

Question from davidv7#2315

would it be ok to just have a class with methods, no class parameters?

aka

class Thing {
    public Thing() {
    }

    method1(args..)
    method2(args...)
}

in java

So specifically if you want to do that you have a few considerations.

Java kinda obscures this, but first consider: "Does this method perform a side effect?" Meaning, does it read from a db, write to a db, print something out, load a file, etc.

If the answer is NO then you can make a "Utils" class

class Utils {
    private Utils() {} // We don't need to construct this class so just disallow it

    // This is a pure function so it is perfectly fine to put it into a helper method like this
    public static int absoluteValuePlusOne(int num) {
        return Math.abs(num) + 1;
    }
}
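Calling it then looks like this (example values are mine):

int result = Utils.absoluteValuePlusOne(-5); // Math.abs(-5) + 1, so 6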

If the answer is YES then there is some value in a class with no parameters. Namely, that class can implement an interface.

Let's say this was your behaviour

MyDataSource dataSource = new MyDataSource();
List<OrderItem> items = dataSource.getItems();

If you wanted to make that behaviour customizable you can put that in an interface.

("IF" being a word I want you to notice: when you "abstract", you make code a little harder to understand, so always consider whether you actually want that)

public interface DataSource {
    // Does a thing
    List<OrderItem> getItems();
}

And use a class with an empty constructor to implement it

public class MyDataSource implements DataSource {
    public MyDataSource() {}
    public List<OrderItem> getItems() {
        // ... Some implementation goes here ...
    }
}

Which provides you the benefit of being able to swap in how that behaviour is done later on or in different places in your code

DataSource dataSource = new MyDataSource(); // Can be any implementation
List<OrderItem> items = dataSource.getItems();

public void someMethodInSomeClass(DataSource dataSource, String userId) {
    // ... can do some hoopla and not have to be changed if your method of retrieving data changes
    // (the example is kinda contrived, I know, but basically every programming example is)
}
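To make "swap in" concrete, here is a sketch of a second implementation (the class name is made up by me): a canned, in-memory DataSource that never touches a real database.

public class InMemoryDataSource implements DataSource {
    private final List<OrderItem> items;

    public InMemoryDataSource(List<OrderItem> items) {
        this.items = items;
    }

    public List<OrderItem> getItems() {
        return items; // no db, no files - just hands back what it was given
    }
}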

Using an interface in key places also has implications for how you test your code, but I won't get into that now.


<- Index

How do you make a fixed size circular buffer

Question from JohnDoe#9991

How would you have an object with 2 values that'd work like this

>(0,1)

>add(2)

(1,2)

>add(3)

(2,3)

>add(10)

(3,10)

>add(50)

(10, 50)

etc

ie replacing the oldest value

An ordered array with unlimited size and removing the oldest value when something is added would do but that'd be ugly

In general you can do that with a "circular buffer"

Here you go.

class CircularBuffer:
    def __init__(self, size=2):
        self._buffer = []
        self._size = size
        self._index = 0
    
    def add(self, item):
        # This follows your "add"
        if len(self._buffer) < self._size:
            self._buffer.append(item)
        else:
            self._buffer[self._index] = item
        
        self._index = (self._index + 1) % self._size
        
    def __len__(self):
        return len(self._buffer)
        
    def __iter__(self):
        for item in self._buffer:
            yield item
            
    def __getitem__(self, key):
        return self._buffer[key]
        
    def __repr__(self):
        return f"CircularBuffer (buffer={self._buffer}, size={self._size}, index={self._index})" 
        
        
c = CircularBuffer(size=2)
print(c)
c.add(0)
print(c)
c.add(1)
print(c)
c.add(2)
print(c)
c.add(3)
print(c)
c.add(10)
print(c)
c.add(50)
print(c)

print(c[0])
print(c[1])
print(len(c))
print(list(c))

Which outputs:

CircularBuffer (buffer=[], size=2, index=0)
CircularBuffer (buffer=[0], size=2, index=1)
CircularBuffer (buffer=[0, 1], size=2, index=0)
CircularBuffer (buffer=[2, 1], size=2, index=1)
CircularBuffer (buffer=[2, 3], size=2, index=0)
CircularBuffer (buffer=[10, 3], size=2, index=1)
CircularBuffer (buffer=[10, 50], size=2, index=0)
10
50
2
[10, 50]

This is a mutable implementation. If you want an immutable implementation I can write that up as well, but it will take a bit more time.

The key is the modulo operator.

We track the "last inserted element" by always storing it to the right of the last element we stored, meaning if I last inserted something at index 0, the next thing I will insert goes at index 1.

If I reach the "end" of the list or array then I need to "circle" around back to the first element at index 0.

To keep track of this information we remember an integer representing the last index we inserted at, then each time we add something to the list we increment the integer to "move to the right" and to make it "loop around" we use the modulo operator.

"modulo" just means the remainder you would get when dividing two numbers. For example

15 % 4 is 3

This is because we can fit three 4s into 15 (4 * 3 = 12), but we don't have enough to fit another four so we have a "remainder" of 3 (15 - 12 = 3)

When you repeatedly do n = (n + 1) % some_limit then the value of n will go up to the limit minus 1 before looping back around to zero.

Try these on paper to get a better idea of why

0 % 3 = ?
1 % 3 = ?
2 % 3 = ?
3 % 3 = ?
4 % 3 = ?
5 % 3 = ?
6 % 3 = ?
7 % 3 = ?

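And if you want to check your answers with code afterwards, a quick loop (my addition) shows the wrap-around in action:

n = 0
for _ in range(8):
    print(n)  # prints 0 1 2 0 1 2 0 1 - wraps every time it hits the limit
    n = (n + 1) % 3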
For your specific case of just 2 items the modulo stuff is kinda overkill (from a cognitive load standpoint) so just do whatever solves your problem the simplest.

But if you wanted an object that holds max n elements that works like you described (where old elements are replaced in place) this is probably the cleanest way to do that.

If the "where" in the list doesn't matter then maybe consider using a deque with a size cap


<- Index