Extending Python with Rust
Sometimes a pure Python script just can't deliver the performance we need. When that's the case, we have to resort to writing our logic in a "fast" compiled language like C or Rust and exposing it through a Python module. This way we get the best of both worlds. Today I focus on how to use Rust for writing such extensions. I chose Rust over C because it is just nicer to use and less of a minefield of gotchas waiting for you to trip over them. Also, as a data scientist I spend most of my time manipulating Numpy arrays, so I will focus on how to pass them to and return them from Rust. To accomplish this I'll make use of the PyO3 and Numpy crates.
The code has been borrowed from the rust-numpy examples and is just meant to showcase how to write Rust extensions.
Setup
In order to get through the following steps you will need to have the Rust toolchain and pyenv installed.
Let's start by creating a virtual environment with all the necessary dependencies and a new Rust library. Our Python dependencies are:
- Numpy
- Maturin: to handle building the Rust library and installing it into our environment.
# Create a new Rust library project
cargo new --lib Rumpy
# Prep virtualenv, Python must be >=3.6
pyenv virtualenv 3.8.5 Rumpy
pyenv activate Rumpy
python -m pip install --upgrade pip
pip install numpy maturin
Next, to configure our Rust project we update the Cargo.toml file with the following dependencies. The name under [lib] has to match both what we define in the Rust code and the name used in import statements from Python.
[lib]
name = "rust_ext"
crate-type = ["cdylib"]
[dependencies]
numpy = "0.13"
ndarray = "0.14"
[dependencies.pyo3]
version = "0.13"
features = ["extension-module"]
With all that ready, let's have a look at the actual code.
The library
The library will provide two simple examples. The first one, axpy, multiplies an array by a scalar value and adds it to a second array. Our other function, mult, just multiplies an array by a scalar.
First we annotate the function that will ultimately represent the Python module with the #[pymodule] annotation. This function must take both _py, which signals that we are holding the GIL, and the module itself. The macro takes care of exporting the initialization function of the module.
When defining functions we first write the logic itself, in this case axpy and mult, together with the wrapper functions axpy_py and mult_py. The wrapper functions, which eventually get exported, must be annotated with #[pyfn(m, "axpy")]. The first argument of the annotation is the Python module that was passed to the "module" function, and the second one is the name the exported function will take. This registers the functions with the module. More details on the PyO3 annotations can be found in its documentation.
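To make this concrete, here is a sketch of what src/lib.rs can look like, closely following the rust-numpy simple-extension example this post borrows from (exact type and method names may differ slightly between crate versions):

use ndarray::{ArrayD, ArrayViewD, ArrayViewMutD};
use numpy::{IntoPyArray, PyArrayDyn, PyReadonlyArrayDyn};
use pyo3::prelude::{pymodule, PyModule, PyResult, Python};

#[pymodule]
fn rust_ext(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    // Pure Rust logic: a * x + y over dynamically dimensioned views.
    fn axpy(a: f64, x: ArrayViewD<'_, f64>, y: ArrayViewD<'_, f64>) -> ArrayD<f64> {
        a * &x + &y
    }

    // Pure Rust logic: multiply the array by a scalar, in place.
    fn mult(a: f64, mut x: ArrayViewMutD<'_, f64>) {
        x *= a;
    }

    // Wrapper exported to Python as "axpy".
    #[pyfn(m, "axpy")]
    fn axpy_py<'py>(
        py: Python<'py>,
        a: f64,
        x: PyReadonlyArrayDyn<f64>,
        y: PyReadonlyArrayDyn<f64>,
    ) -> &'py PyArrayDyn<f64> {
        axpy(a, x.as_array(), y.as_array()).into_pyarray(py)
    }

    // Wrapper exported to Python as "mult".
    #[pyfn(m, "mult")]
    fn mult_py(_py: Python<'_>, a: f64, x: &PyArrayDyn<f64>) -> PyResult<()> {
        let x = unsafe { x.as_array_mut() };
        mult(a, x);
        Ok(())
    }

    Ok(())
}

Keeping the numeric logic in plain Rust functions and the PyO3-specific conversions in thin wrappers also makes the core logic easy to unit test from Rust.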
Compilation is as simple as running:
maturin develop --release
This will take care of compiling the module with optimizations turned on and installing it in your environment so you can immediately test it.
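A quick way to check that everything worked is to call the functions from Python (this snippet assumes the sketch above, where mult operates in place):

import numpy as np
import rust_ext  # the module name comes from the [lib] name in Cargo.toml

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(rust_ext.axpy(2.0, x, y))  # [ 6.  9. 12.]

rust_ext.mult(10.0, x)  # multiplies x in place
print(x)                # [10. 20. 30.]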
Benchmarks
Finally, let's do some simple benchmarks to see how well the Rust implementation compares against both the natural Numpy solution and a naive Python implementation. We are just interested in a quick check, so the IPython %%timeit magic is enough here. The IPython session would look something like this.
Quick snapshot of the IPython session used for benchmarking.
Rumpy Vs Numpy benchmark.
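The exact numbers come from the session in the screenshot, but the comparison amounts to something like the following (the array size and the pure Python baseline here are illustrative, not the exact code from the original session):

import numpy as np
import rust_ext

a = 3.0
x = np.random.rand(1_000_000)
y = np.random.rand(1_000_000)

def axpy_python(a, x, y):
    # Naive pure Python baseline.
    return [a * xi + yi for xi, yi in zip(x, y)]

# In IPython:
# %timeit rust_ext.axpy(a, x, y)   # Rust extension
# %timeit a * x + y                # vectorized Numpy
# %timeit axpy_python(a, x, y)     # pure Python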
As expected, the pure Python implementation is comically slow and won't be considered further. What's more interesting is that the Rust implementation is only a factor of 1.23 slower (for large arrays) than just using Numpy. PyO3 itself appears to introduce essentially no overhead, and for smaller inputs the Rust implementation was actually marginally faster than Numpy. In exchange for a slight loss in performance we get code that reads exactly like the Numpy implementation, with stronger guarantees about correctness than if we had written the same algorithm in C using CFFI.
Of course, you would never go down the route of writing a compiled extension when the algorithm can be expressed so simply using vectorized Numpy operations. However, when writing more complex logic and algorithms that can't be expressed with Numpy ops, I am willing to accept a small (relatively speaking) overhead in exchange for a modern programming language that is both nicer to use and safer than C for writing Python extensions.
Farewells
In future blog posts I will explore other alternatives to accelerate Python code, such as Cython, Numba, async programming, and the multiprocessing library. Stay tuned!!!