A while ago I started writing some bindings for libsecp256k1, a library used by Bitcoin Core, to speed up my Python code. Before getting started I read an article published by Real Python, which provided some much-needed basic information, since I was new to writing bindings and also had very little knowledge of C. The article covered a variety of tools for writing bindings, and in the end I decided on CFFI, which seemed the best of the bunch. However, I quickly found out that the information in the tutorial was very superficial: it was good only for toy libraries, not for real-world ones. So, after this experience, I am writing down all of the information I found useful along the way.
Disclaimer
- Before reading these tips you should read a more basic CFFI tutorial, like the Real Python one mentioned above.
- I am not an expert in C or CFFI development. These tips are based on my experience (successfully) writing bindings for two established C libraries.
- The following tips work on *nix-based systems. If you are on Windows you will have to install the Windows Subsystem for Linux or use some other virtualization technology.
Building the library
In most tutorials the C source code lives in the same repository as the Python bindings. However, the C code may change in the upstream repository, and you don't want to copy it over every time a commit is made! Fortunately there is a better way: git submodules. Git submodules let us include another git repository in our own as if it were a simple folder! To add one, simply type in the terminal:
$ git submodule add {path_to_the_repository}
Now you have the entire repository with the C source code in a folder. If a change is made upstream and you want it included in your copy, simply update the submodules with the following command:
$ git submodule foreach git pull
Now that you have the source code of the library, you want to build it from Python. To do that you can use the subprocess library:
import subprocess
# check=True raises CalledProcessError if make fails, instead of silently continuing
subprocess.run(["make"], cwd="c_library", check=True)
Here I have assumed that your library uses make as its build system. The cwd parameter is the directory in which the command is executed. Most of the time you have to be at the root of the source code to build a library, so the cwd parameter tells subprocess to execute the command inside our submodule (here the c_library folder).
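If the build needs more than one command, a small helper that runs each step inside the submodule and stops at the first failure can save some debugging. This is a minimal sketch: the function name build_c_library and the shape of the steps list are hypothetical. It relies on check=True, which makes subprocess.run raise CalledProcessError when a step fails, whereas subprocess.call only returns the exit code.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def build_c_library(src_dir: str, steps: Sequence[Sequence[str]]) -> None:
    """Run each build command inside src_dir, stopping at the first failure.

    Hypothetical helper: src_dir is the submodule folder (for example
    "c_library") and steps is a list of commands such as [["make"]].
    """
    root = Path(src_dir)
    # An uninitialised submodule shows up as an empty (or missing) folder;
    # `git submodule update --init` checks it out.
    if not root.is_dir() or not any(root.iterdir()):
        raise FileNotFoundError(f"{root} is empty or missing; is the submodule initialised?")
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(list(cmd), cwd=root, check=True)
```

Usage would then be build_c_library("c_library", [["make"]]), or a longer list for libraries that need a configure step first.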
Prefer static libraries when possible
Most information out there, even the official documentation, talks about how to link against dynamic libraries. However, I found out that working with static libraries, when possible, is much easier. That's because static libraries are included in the final extension, so we don't have to think about packaging additional data, which is a painful process with setup.py (and we can't use tools like poetry, since they don't support CFFI). If you don't know how to build static libraries, here is some useful information. However, if you are not the maintainer of the C library, chances are there is already some way to obtain a static library by modifying the configuration files.
Unfortunately I found that using static libraries is not always possible, especially when trying to create bindings that should work on Windows. In this case you should read about CFFI's out-of-line ABI mode. It can work really well, but you will have to make sure that the shared library is included in your package. You can read this answer on Stack Overflow for some suggestions. This mode is also useful if you want to target different Python implementations, such as PyPy.
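For reference, this is roughly what the out-of-line ABI mode looks like, following the pattern in the CFFI documentation: the builder passes None as the source to set_source, and compile() then writes a pure-Python module instead of a C extension, so no C compiler is needed. In this sketch the module name _libc_abi and the use of the C standard library's strlen, opened with dlopen(None) (which works on Unix-like systems but not on Windows), are stand-ins for your own shared library and its functions.

```python
import importlib
import sys

from cffi import FFI


def build_abi_module(out_dir: str) -> None:
    """Write the hypothetical pure-Python module _libc_abi.py into out_dir."""
    ffibuilder = FFI()
    # Only declarations: ABI mode never compiles against the C headers.
    ffibuilder.cdef("size_t strlen(const char *s);")
    # Passing None as the source selects the out-of-line ABI mode.
    ffibuilder.set_source("_libc_abi", None)
    # Generates out_dir/_libc_abi.py; no C compiler is involved.
    ffibuilder.compile(tmpdir=out_dir)


def load_lib(out_dir: str):
    """Import the generated module and open the library at runtime."""
    if out_dir not in sys.path:
        sys.path.insert(0, out_dir)
    importlib.invalidate_caches()  # the .py file was created after startup
    from _libc_abi import ffi  # created by build_abi_module
    # None opens the standard C library on Unix-like systems; a real binding
    # would pass the path of the shared library shipped inside the package.
    return ffi.dlopen(None)
```

The builder only has to run when packaging; at runtime the package needs the generated module plus the shared library it opens.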
Always compile with position independent code
When compiling your extension you may run into the following error:
$ ... can not be used when making a shared object; recompile with -fPIC
This happens because the C code has not been compiled as position independent code. As the error suggests, we need to compile the C code with the -fPIC option. Since a CFFI extension module is itself a shared object, any static library linked into it must be position independent too. Position independent code is normally used for shared libraries, so if we are using them we shouldn't see this error very often. On the other hand, if we are using static libraries we will have to add this option to the configuration files almost every time.
If our library depends on other libraries, we have to make sure that they are compiled with -fPIC too! When the build system is complex it is not so simple to figure out how to do it, but at least now you know what to look for instead of banging your head against a wall like I did!
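As an example of what to look for: many C libraries use autotools (libsecp256k1 ships an autogen.sh, for instance), and libtool-generated configure scripts usually accept --enable-static, --disable-shared and --with-pic. The sketch below assumes such a configure script (check ./configure --help for your library) and also passes -fPIC through the CFLAGS environment variable, which autotools-generated Makefiles pick up and which you can reuse when building dependencies. Building only the static library also avoids having both a .so and a .a in the same folder, in which case the GNU linker normally picks the shared one. The function names are hypothetical.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def static_pic_configure_cmd() -> List[str]:
    """configure command for a static-only, position independent build.

    Assumes a libtool-based ./configure script; CMake or hand-written
    Makefiles need different options.
    """
    return ["./configure", "--enable-static", "--disable-shared", "--with-pic"]


def pic_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with -fPIC added to CFLAGS (at most once)."""
    env = dict(os.environ if base is None else base)
    flags = env.get("CFLAGS", "").split()
    if "-fPIC" not in flags:
        flags.append("-fPIC")
    env["CFLAGS"] = " ".join(flags)
    return env


def build_static_pic(src_dir: str) -> None:
    """Hypothetical autotools build: autogen.sh, configure, make."""
    env = pic_env()
    for cmd in (["./autogen.sh"], static_pic_configure_cmd(), ["make"]):
        subprocess.run(cmd, cwd=src_dir, env=env, check=True)
```

One caveat: setting CFLAGS before running configure replaces autoconf's default of -g -O2, so you may want to add an optimization flag yourself.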
Pay attention to set_source arguments
The ffi.set_source method is used to configure the extension before building it, so it takes a lot of parameters, most of which are passed on to the compiler and linker. However, these arguments were not very clear to me, and I wasted a lot of time simply because I had provided the wrong ones. So here is an explanation of the most important ones:
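source
: This is the second positional argument (the first is the name of the module to generate), not to be confused with the sources keyword, which lists extra .c files to compile. It must be a string of C code that simply includes the headers of all of the functions you want to call from Python. For example, if you want to call some C functions from two files, a.c and b.c, your source must be:
ffi.set_source(
    "_example",  # hypothetical module name
    """
    #include "path/to/a.h"
    #include "path/to/b.h"
    """,
    ...
)
Each path must be between quotation marks and must be the exact path to the file. There is a way to avoid writing the exact path (adding the header folder to include_dirs, described below, should let you write just the file name), but I found this way a lot easier.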
libraries
: When writing bindings for a library this is a very important argument. It is a list of strings, each of which is the name of a library you want to link against. For example, if you are working with the ssl library, your set_source arguments will look like:
ffi.set_source(..., libraries=['ssl'], ...)
With this, the linker will look for a file named libssl.* with the appropriate extension. Even though the file name starts with lib, you mustn't write that prefix in the argument list. Also keep in mind that if your library depends on other libraries and you want to use static libraries instead of dynamic ones, you will have to add every one of these libraries to the list too!
library_dirs
: You will also have to provide a list of paths in which the linker will look for the different libraries. The elements of this list must be strings; unfortunately they can't be of type Path.
include_dirs
: If your library has dependencies, it will most probably have an include folder containing header files (its own, and sometimes those of its dependencies). You will have to add these paths (plural) as a list of strings to the include_dirs argument, similarly to library_dirs.
extra_objects
: If you want to write bindings for a C program, not a library, you will probably not end up with a static (.a) or shared (.so) library at the end of compilation, but with a series of .o object files. For example, I tried writing some bindings for wrk while researching this post, and I had to use this argument. You can find this example here. Keep in mind that you most probably won't use this argument unless you have a very specific use case. Also, don't confuse this argument with libraries: I made the mistake of passing the libraries to extra_objects and lost a lot of time troubleshooting that.
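Putting these arguments together, a build script for an API-mode extension might look like the sketch below (note that the keyword is spelled library_dirs). The folder layout (a submodule in c_library with headers in include/ and the built library in .libs/, where libtool-based builds often put it), the header mylib.h, the library name mylib, the function mylib_add and the module name _mylib are all hypothetical. Paths are converted with str() because the lists expect plain strings.

```python
from pathlib import Path
from typing import Dict, List


def set_source_kwargs(c_root: Path) -> Dict[str, List[str]]:
    """Keyword arguments for ffi.set_source, with every path as a str."""
    return {
        # Linked as -lmylib: the linker looks for libmylib.a or libmylib.so.
        "libraries": ["mylib"],
        # Folders the linker searches for those library files.
        "library_dirs": [str(c_root / ".libs")],
        # Folders the compiler searches for the #include'd headers.
        "include_dirs": [str(c_root / "include")],
    }


def make_ffibuilder(c_root: Path):
    """Create the FFI builder object for the hypothetical _mylib module."""
    from cffi import FFI  # only needed here, at build time

    ffibuilder = FFI()
    # Declarations of the C functions to expose (hypothetical signature).
    ffibuilder.cdef("int mylib_add(int a, int b);")
    ffibuilder.set_source(
        "_mylib",              # name of the extension module to generate
        '#include "mylib.h"',  # resolved through include_dirs
        **set_source_kwargs(c_root),
    )
    return ffibuilder
```

In a hypothetical build_mylib.py you could then write ffibuilder = make_ffibuilder(Path("c_library")) at module level and point setup.py at it with cffi_modules=["build_mylib.py:ffibuilder"]; calling ffibuilder.compile(verbose=True) directly is a quick way to see the exact compiler and linker commands when something goes wrong.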
Automatic c definitions
When building a Python extension we don't want to write C code, right? However, there is a problem with ffi.cdef. We must pass it C declarations so that cffi can create bindings for every one of these functions. If we simply give it the headers of our library, we will find out that it throws lots of errors. That's because C headers contain lots of things that cffi's parser doesn't need or can't handle, such as #include and #ifdef directives, most macros and compiler-specific attributes. So we have to simplify the headers.
One way is to pass our headers through the preprocessor first. I had this idea on my own, but then I found a beautiful article that explains this concept in detail. In addition to what the article describes, you can also remove some of the unnecessary code with regular expressions. This can be very effective; in fact I used it for my secp256k1 bindings. However, it works well only for simple libraries. If you have a very complex one, the information you remove may be essential to the compiler, and you may end up with bindings that don't work. In that case you may have to write your own simplified C headers by hand. Ouch!
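A minimal sketch of the preprocessor approach, assuming a GCC- or Clang-compatible compiler reachable as cc: -E stops after preprocessing and -P omits the # line markers from the output. The patterns in strip_for_cdef are only examples of the regular-expression clean-up described above (removing __attribute__((...)) annotations and leftover # lines); the ones you actually need depend on the library. Keep in mind that -E also expands any system headers the library includes, which can pull in compiler-specific constructs that cdef() may not accept, so you may need to trim further. Both function names are hypothetical.

```python
import re
import subprocess
from typing import Iterable

# __attribute__((...)) with at most one level of nested parentheses.
_ATTRIBUTE = re.compile(r"__attribute__\s*\(\((?:[^()]|\([^()]*\))*\)\)")
# Leftover preprocessor lines, such as #pragma.
_DIRECTIVE = re.compile(r"^[ \t]*#.*$", re.MULTILINE)


def preprocess_header(header: str, include_dirs: Iterable[str] = ()) -> str:
    """Run only the C preprocessor on header and return its output."""
    cmd = ["cc", "-E", "-P"]
    cmd += [f"-I{d}" for d in include_dirs]
    cmd.append(header)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def strip_for_cdef(text: str) -> str:
    """Remove example constructs that we don't want to feed to cdef()."""
    text = _ATTRIBUTE.sub("", text)
    text = _DIRECTIVE.sub("", text)
    # Drop the blank lines left behind by the substitutions.
    return "\n".join(line for line in text.splitlines() if line.strip())
```

The result would then go straight into cdef, for example ffi.cdef(strip_for_cdef(preprocess_header("c_library/include/mylib.h"))), with a hypothetical header path.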
Conclusion
I learned a lot during my journey writing bindings for a C library with CFFI: how to compile C code, how Python searches for modules in your filesystem, the differences between static and shared libraries, and so on. But I also spent a lot of time troubleshooting stupid mistakes! I hope these tips will save you time and, most importantly, save you from headaches!