Python Only Has One Real Competitor

by: Ethan McCue

Clickbait subtitle: "and it's not even close"


Python is the undisputed monarch in exactly one domain: Data Science.

The story, as I understand it, goes something like this:

Python has very straightforward interop with native code. Because that interop is straightforward, libraries like numpy and pandas got made. Around these an entire Data Science ecosystem bloomed.

This in turn gave rise to interactive notebooks with IPython (now "Jupyter"), plotting with Matplotlib, and machine learning with PyTorch and company.

There are other languages and platforms - like R, MATLAB, etc. - which compete for users with Python. If you have a field biologist out there wading through the muck to measure turtles, they will probably make their charts and papers with whatever they learned in school.

But the one glaring weakness of these competitors is that they are not general purpose languages. Python is. This means that Python is also widely used for other kinds of programs - such as HTTP servers.

So for the people who are training machine learning models to do things like classify spam, it is easiest to then serve those models using the same language and libraries that were used to produce them. This can be done without much risk because you can assume Python will have things like, say, a Kafka library should you need it. You can't really say the same for MATLAB.

And in the rare circumstance you want to do something that is easiest in one of those alternative ecosystems, Python will have a binding. You can call R with rpy2, MATLAB with the MATLAB Engine API for Python, Spark with PySpark, and so on.

For something to legitimately be a competitor to Python I think it needs to do two things.

  1. Be at least as good at everything as Python.
  2. Be better than Python in a way that matters.

The only language which clears this bar is called Clojure.



That's a bold claim, I know. Hear me out.

Clojure has a rich ecosystem of Data Science libraries, including feature-complete numpy and pandas equivalents in dtype-next and tech.ml.dataset. It also has metamorph.ml for Machine Learning pipelines, Tableplot for plotting, and Clay for interactive notebooks.

For whatever isn't covered, it can call Python directly via libpython-clj, R via ClojisR, and so on.

Clojure is also a general purpose language. Making HTTP servers or whatever else in Clojure is very practical.

What starts to give Clojure the edge is also the answer to the age-old question: "Why is Python so slow?"

Python is slow because it cannot be made fast. The dark side of Python's easy interop with native code is that many of the implementation details of CPython were made visible to, and relied upon by, authors of native bindings.

Because all these details were relied upon, the authors of the CPython runtime can't really change those details and not break the entire Data Science ecosystem. This heavily constrains the optimizations that the CPython runtime can do.

This means that people need to constantly avoid writing CPU-intensive code in Python. It is orders of magnitude faster to use something which delegates to the native world than something written in pure Python. This affects the experience of things like numpy and pandas. There is often a "fast way" to do something and several "slow ways." The slow ways are always the ones where too much actual Python code gets involved in the work.
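The "fast way" versus "slow way" split shows up even without numpy. The minimal sketch below uses only the standard library so it has no dependencies: one function loops in pure Python, the other hands the per-element work to `map`, `operator.mul`, and `sum`, which are implemented in C in CPython. The function names are hypothetical, chosen for illustration. With numpy the analogous contrast would be something like `(arr * arr).sum()` versus a Python `for` loop over the array, and the gap there is usually far larger than in this example, where the integers are still Python objects.

```python
import operator
import timeit


def slow_sum_of_squares(values):
    # The "slow way": every iteration executes Python bytecode
    # in the interpreter loop.
    total = 0
    for v in values:
        total += v * v
    return total


def fast_sum_of_squares(values):
    # A "faster way": map(), operator.mul, and sum() are implemented in C
    # in CPython, so no Python bytecode runs per element.
    return sum(map(operator.mul, values, values))


if __name__ == "__main__":
    data = list(range(200_000))
    # Both approaches must agree before timing them.
    assert slow_sum_of_squares(data) == fast_sum_of_squares(data)
    slow = timeit.timeit(lambda: slow_sum_of_squares(data), number=5)
    fast = timeit.timeit(lambda: fast_sum_of_squares(data), number=5)
    print(f"pure-Python loop: {slow:.3f}s, C-backed built-ins: {fast:.3f}s")
```

Exact timings depend on the machine and Python version, so none are quoted here. The point is only that the version with less Python code in the hot loop tends to win.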

Clojure does not have this problem. Clojure is a language that runs on the Java Virtual Machine. The JVM can optimize code like crazy on account of all the souls sacrificed to it. So you can write real logic in Clojure without issue.

There's a reason Python's list is implemented in C, while Java can have multiple competing list implementations, all written in Java. Java code can count on some aggressive runtime optimizations when it matters.

This also means that if you use Clojure for something like an HTTP server to serve a model, you can generally expect much better performance at scale than the equivalent in Python. You could even write that part in pure Java to make use of the large pool of trained Java developers. Anecdotally, startups often switch from whatever language they started with to something that runs on the JVM once they get big enough to care about performance.

Clojure's library ecosystem includes many high quality libraries written in Java. Many of these are better performing than their Python analogues. Many also do things for which Python has no equivalent. Clojure then gets access to all Python libraries via libpython-clj.

Clojure's interop story is also quite strong at the language level. Calling a Python function involves almost as little linguistic friction as calling a Clojure function. Calling native code with coffi is also pretty darn simple.

The language is also very small, even compared to Python. Obviously the educational infrastructure is not in place, but in principle there is less to learn about the language itself before one can productively learn how to do Data Science.

An extremely important part of productive Data Science work is interacting with a dataset. This is why interactive notebooks are such a big part of this world. It's also a benefit of using dynamic languages like Python and Clojure. Being able to run quick experiments and poke at data is more important than static type information.

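Clojure is part of a family of languages, the Lisps, with a distinctive method of interactive development: REPL-driven development, where you evaluate code from your editor directly into a running program. This method is considered by its fans to be superior to the cell-based notebooks that Jupyter provides.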

All in all, it's a competitive package. Whether it ever gets big enough to take a real bite out of Python comes down to kismet, but I think it's the only thing that stands a chance.


If this got you interested in learning Clojure check out Clojure Camp for resources and noj for a cohesive introduction to the Data Science ecosystem.

