Issues you will face binding to C from Java.

by: Ethan McCue

Specifically using the Foreign Function and Memory API.

I've written about some of these issues before, but now I have a project I can use as an example.

Stick around to the end for an open challenge to the audience.

Context

Yu-Gi-Oh is a children's card game from Japan. Each player has "life points" and the goal of the game is to use your monsters, spells, and traps to reduce your opponent's life points to zero.

This game is how I first learned to read.

My Father likes playing a Yu-Gi-Oh game from the early 2000s called "Power of Chaos: Yugi the Destiny," named after the character in the anime who solves a puzzle and gets possessed by the spirit of an ancient Egyptian Pharaoh.

Since he figured out it supports multiplayer and can let me play against him, he has also started playing "Power of Chaos: Joey the Passion." This is also from the early 2000s and is named after the character in the anime who is a Japanese high school student that for some reason speaks with a Brooklyn accent.

Every now and then a character will die because they lost this children's card game. This got censored as them being "sent to the shadow realm," a place of eternal torture where their soul will never know peace. Much more kid friendly.

In my opinion the "Power of Chaos" series of games have by far the best visual, audio, and interaction design of any Yu-Gi-Oh game. This includes unofficial fan projects.

So something I threw on my large list of back-burner projects is to make my own Yu-Gi-Oh game that closely emulates the design of those early games.

Unfortunately the actual card game is crazy complicated. Every card is basically its own tiny program and there are decades of special case rulings to implement. It is well beyond the scope of what I am capable of as an individual.

Fortunately there is both a crowdsourced repository of Lua scripts for all the cards and a community-maintained engine for simulating the rules of the game. This engine has a relatively minimal C API and is, I think, a perfect example of the kind of thing that should have bindings written for it. The depth of expertise that went into it is unrepeatable.

Problem 1: C libraries aren't always pure C

Despite having a C API, some of the headers of this particular library made use of C++ features. This is often ignorable in the C/C++ world since the major C compilers also support C++. It is not ignorable when you use jextract to generate bindings. jextract only supports C.

You could interpret this as a one-off, but my suspicion is that it's a problem that will naturally recur for libraries written in C++ that expose a C API. Especially if they are never tested in this context.

The solutions are to either file issues and open PRs upstream or to enhance jextract to support a limited subset of C++. Despite the words "limited subset" the latter is likely a nightmare pit of a task, so you should be ready to do the first.

Problem 2: You need different Java code per-platform

jextract takes as input a C header file and spits out a folder of Java classes. These classes include information about available functions as well as the memory layouts of defined structs. Unfortunately C is a protocol where memory layouts can change based on both the target operating system and the underlying CPU architecture.
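To make that concrete, here is a small sketch that asks the native linker how big a few C types are on whatever machine it runs on. It assumes JDK 22 or later, where the Foreign Function and Memory API is final. The class name LayoutProbe is made up for illustration.

```java
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of jextract or of any binding project.
final class LayoutProbe {
    private LayoutProbe() {}

    // Returns the size in bytes of a C type as the current platform's ABI defines it.
    static long sizeOf(String cTypeName) {
        Map<String, MemoryLayout> layouts = Linker.nativeLinker().canonicalLayouts();
        MemoryLayout layout = layouts.get(cTypeName);
        if (layout == null) {
            throw new IllegalArgumentException("No canonical layout for: " + cTypeName);
        }
        return layout.byteSize();
    }

    public static void main(String[] args) {
        // The sizes printed here depend on the OS and CPU the JVM is running on.
        for (String type : List.of("int", "long", "size_t", "wchar_t")) {
            System.out.println(type + " = " + sizeOf(type) + " bytes");
        }
    }
}
```

On 64-bit Linux and macOS a C long is 8 bytes and wchar_t is 4. On 64-bit Windows they are 4 and 2. Any struct containing one of those gets different offsets and a different total size per platform, and jextract bakes those numbers into the classes it generates.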

This means you need different Java code for macos-aarch64 than you do for windows-x86. Pop-quiz: how do you do this in Maven? In Gradle?

You also need access to the target platform to generate this code. jextract doesn't have the zig cc magic that lets it compile for any platform from any platform. So you are left with having to use services like GitHub Actions.

In the absolute best case scenario you can just swap out a source set when compiling. OCG_CardData might have a platform specific layout, but the generated methods and struct members will be the same.

If you want to have one library support all platforms you'll need to contrive a system for selecting methods to call at runtime. This can lead to some pointless duplication.
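The first piece of any such system is figuring out which platform you are on. A minimal sketch, assuming an "os-arch" naming scheme like macos-aarch64. The class name and the exact set of recognized values are my own choices here, not something jextract or the JDK prescribes.

```java
import java.util.Locale;

// Hypothetical helper: maps JVM system properties to an "<os>-<arch>" string.
final class PlatformClassifier {
    private PlatformClassifier() {}

    static String classify(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String osPart;
        // Check for macOS before Windows: "darwin" contains the substring "win".
        if (os.contains("mac") || os.contains("darwin")) {
            osPart = "macos";
        } else if (os.contains("win")) {
            osPart = "windows";
        } else if (os.contains("linux")) {
            osPart = "linux";
        } else {
            throw new UnsupportedOperationException("Unsupported OS: " + osName);
        }

        // The same CPU family is reported under different names by different JVMs.
        String archPart = switch (arch) {
            case "aarch64", "arm64" -> "aarch64";
            case "amd64", "x86_64" -> "x86_64";
            case "x86", "i386", "i686" -> "x86";
            default -> throw new UnsupportedOperationException("Unsupported architecture: " + osArch);
        };

        return osPart + "-" + archPart;
    }

    // Classifier for the JVM this code is currently running on.
    static String current() {
        return classify(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

From there you would switch on PlatformClassifier.current() and delegate to the generated classes for that platform, which is exactly where the duplication comes from: one near-identical wrapper per generated package.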

Problem 3: C libraries are often special snowflakes

Question: what is the Maven Central for C?

Answer: ha. hahaha. ha. ha. ha. ha.

While there are things like Conan and vcpkg out in the world it's far from a given that any particular library you'd want is on them.

It's likely you'll end up either adding a GitHub repo as a submodule or writing a script to download the repo or some random .zip/.tar.gz file.

From there every C library has its own build instructions and flags you may or may not need to pass.

Pop-quiz: how do you do this in Maven? In Gradle?

Problem 4: You need to figure out packaging

From here we'll assume that the C library you are binding is going into a library. Either to be shared with the wider world or to be one module in a larger project tree.

Firstly, you probably need to make one artifact per target platform. In Maven repositories the way people share such artifacts is with a classifier scheme. Each artifact should be published with a -macos-aarch64 or similar classifier and modules depending on these libraries need to select the right one.

Then for the actual .dll/.so/.dylib that is the compiled artifact for the C library you have a few options.

  1. Expect the user to just have it on their machine already or install it separately.
  2. Expect the user to provide it manually with -Djava.library.path.
  3. Package that file into a jar and extract it dynamically at runtime.
  4. Package your code as a .jmod and expect that to be usable.

Option 1 sucks for obvious reasons.

Option 2 is just option 1 but without it needing to be a global system-level thing. Tools like Maven are, best I can tell, built around the concept that a "Set of Dependencies" should become a --class-path. You can't really have C shared libraries as automatically resolved dependencies.

Option 3 is what most libraries do today. LWJGL has "natives" jars which contain the actual shared libraries. It then uses a shim to extract the shared libraries to the filesystem. This is annoying firstly because it's finicky code. Extracting a file atomically to the filesystem is a whole thing. It also requires that you are deploying code in a context where you have a writable filesystem. This is not always the case.

But most importantly it annoys me because it is so clearly a forced move. If you could actually just put the shared library on the -Djava.library.path automatically in the same way other dependencies are automatically put on the --class-path I don't think people would be doing it. Well, that and if people weren't universally using uberjars as their deployment mechanism.
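For a sense of what the Option 3 shim involves, here is a rough sketch. This is not LWJGL's code. The class name, the resource path convention, and the choice to wrap IOException are all assumptions of mine. It writes to a temp file in the target directory and then moves it into place, so another process never sees a half-written library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical shim, for illustration only.
final class NativeExtractor {
    private NativeExtractor() {}

    // Copies the stream to targetDir/fileName via a temp file and an atomic rename.
    static Path extract(InputStream resource, Path targetDir, String fileName) {
        try {
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve(fileName);
            // Same directory as the target so the rename stays on one filesystem.
            Path temp = Files.createTempFile(targetDir, fileName, ".tmp");
            try {
                Files.copy(resource, temp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                // No-op after a successful move; cleans up if copy or move failed.
                Files.deleteIfExists(temp);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not extract " + fileName, e);
        }
    }

    // Finds the library on the class path, extracts it, and loads it.
    // resourcePath is assumed to look like "/natives/libexample.so".
    static void load(Class<?> anchor, String resourcePath, Path targetDir) {
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalStateException("Missing resource: " + resourcePath);
            }
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            Path library = extract(in, targetDir, fileName);
            System.load(library.toAbsolutePath().toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Even this leaves out plenty: picking the right file per platform, cleaning up old copies, and what to do when the target directory is read-only. On that last one there is no good answer, the whole approach just fails.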

Option 4 I think has potential. It's what I've done for the Yu-Gi-Oh bindings. For those unaware, .jmod files have delineated locations for classes, legal metadata, configuration files, and shared libraries. This is how the JDK bundles the shared libraries needed for making things like Swing work.

One problem is that .jmods aren't usable at runtime. You need to link them together with jlink to produce a JDK, then use that JDK. You can technically extract the contents of a .jmod and put the extracted classes folder on the --module-path and libs on the -Djava.library.path.
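The extract route can be driven from Java itself, since the JDK exposes jmod through java.util.spi.ToolProvider. A sketch, with the class name being my own invention. It relies on jmod extract writing class files to a classes directory and native libraries to a lib directory under the output folder.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.spi.ToolProvider;

// Hypothetical helper, for illustration only.
final class JmodUnpack {
    private JmodUnpack() {}

    // Runs "jmod extract --dir <outputDir> <jmodFile>" in-process.
    // Returns the tool's exit code; 0 means success.
    static int extract(Path jmodFile, Path outputDir) {
        ToolProvider jmod = ToolProvider.findFirst("jmod")
                .orElseThrow(() -> new IllegalStateException("jmod tool not found; is this a full JDK?"));
        return jmod.run(System.out, System.err,
                "extract", "--dir", outputDir.toString(), jmodFile.toString());
    }

    // JVM flags that point at the extracted contents.
    static List<String> launchArgs(Path outputDir) {
        return List.of(
                "--module-path", outputDir.resolve("classes").toString(),
                "-Djava.library.path=" + outputDir.resolve("lib"));
    }
}
```

So it is doable, but note who has to do it: every consumer of the library, as a build step, before they can run anything.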

Pop-quiz: how do you do that in Maven? In Gradle?

Problem 5: Nobody uses the --module-path

In discussions about the --module-path the focus tends to be on one of three aspects.

  1. I heard a friend of a cousin say it broke a library nearly a decade ago in Java 9.
  2. It was needed to turn off deep reflection for JDK internals.
  3. Wow, tooling does not support this well at all.

The 3rd aspect is most relevant here because you can't really expect anyone using your library to have put it on the --module-path. This is unfortunate because modules are the only mechanism to group up and hide whole packages from external consumers.

jextract will generate a large number of public classes which contain calls to fundamentally unsafe APIs. Not only that, unless you go through and manually allowlist functions and symbols with --include-function et al. it will contain a lot of stray functions you may or may not be using in your actual binding code.

What you would ideally want is to just not export those autogenerated classes and only export the code you wrote that wraps them up. This would be needed for someone writing --enable-native-access=your.specific.module to mean "I have audited or otherwise trust the producer of this module to have interacted with the native world in a way that won't crash the JVM."

This is in contrast to --enable-native-access=ALL-UNNAMED and "yeah whatever, just let me use native stuff. Who cares where it comes from."

Even if you don't particularly care about that story, autogenerated bindings don't make for the world's best public API.
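For reference, hiding the generated code is a few lines of module descriptor. The module and package names below are hypothetical, not the ones in my bindings.

```java
// module-info.java
// Hypothetical module and package names, for illustration only.
module dev.example.ocgcore {
    // Hand-written wrapper API: the only package consumers can see.
    exports dev.example.ocgcore;

    // The jextract output is assumed to live in dev.example.ocgcore.generated.
    // It is deliberately NOT exported, so on the module path nothing
    // outside this module can call it directly.
}
```

A consumer would then run with --enable-native-access=dev.example.ocgcore. Put the same jar on the --class-path instead and the descriptor is ignored, every public generated class is reachable again, and ALL-UNNAMED is the only option left.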

Challenges

So with that prelude, here is the challenge for the audience.

  1. Using your build tool of choice (Maven, Gradle, bld, Mill, etc.) replace the Justfile I have to build the project. I mean replace, not do a different or worse job. Produce as a final artifact a .jmod, procure the C library, run jextract, include the proper legal metadata, etc.

  2. Put it in a local Maven repository and use it from a Maven project. Good luck.

  3. Bonus points, actually work on the unfinished binding code. Would be appreciated, but it's also my burden to bear and not super relevant to the rest of this post.

