For various reasons we sometimes need to use C or C++ libraries inside a Python program: it may be because there is a component critical to the performance that can gain speed by using a C code, or it may be because there is already an excellent library written in C++ that we want to use instead of writing it in Python from scratch.

Here we will discuss both cases using one of my projects, loom, as an example. It includes using a C code discussed in [numba_vs_ctypeslib] inside a Python code, and glueing an external C++ library called CGAL(Computational Geometry Algorithms Library, http://www.cgal.org) to a Python program using NumPy's ctypes interface, which makes the task easier when using NumPy's array than using the original ctypes interface of Python. NumPy's own documentation is a good starting point of getting an idea of how to use numpy.ctypeslib.

Preparing a C library

Creating a C library

One thing to note is that NumPy's ctypes interface, and as a matter of fact Python's ctypes interface too, requires the external library to load to be a shared library. This means that you may need to build the shared library again if you moved your library into a new system, because the linker in the previous system may have used different paths for various libraries that the shared library has dependency. The following is a Makefile in loom/clibs to build a shared library s_wall.so from two C source code files, solve.c and s_wall.c. Note that we are using the option -fPIC when compiling the source codes, where PIC stands for Position Independent Code.

CC = gcc
CFLAGS = -Wall -fPIC -std=c99
all: s_wall.so

s_wall.so: s_wall.o solve.o
    $(CC) -Wall -shared -o s_wall.so s_wall.o solve.o

s_wall.o: s_wall.c
    $(CC) $(CFLAGS) -c s_wall.c

solve.o: solve.c
    $(CC) $(CFLAGS) -c solve.c

clean:
    rm *o

Using a C++ library

Because NumPy's ctypes interface is for C libraries, when using a C++ library we need to declare to the compiler that we want to compile the C++ code to have a C interface, so that extra decorations of the function name will not happen. (See this stackoverflow thread to know more about this.) This is done simply by using extern "C" keyword, as is used in loom/cgal_intersection/cgal_intersecion.h, which is shown below.

// cgal_intersection.h
typedef struct {double x; double y;} coordinate;

extern "C" int find_intersections_of_curves(
    coordinate *curve_1, long curve_1_size,
    coordinate *curve_2, long curve_2_size,
    coordinate *intersections, int max_num_of_intersections
);

Constructing an interface between C and Python

Now we have a shared library, the next task is to prepare an interface between a function in the shared library and a Python program. If the function is a simple one, like getting a couple of integers as its arguments and returns a floating-point number, then it can be directly used by the ctypes interface without any additional work as described in this NumPy documentation. But usually we want to exchange a large NumPy array for data and/or a C struct for messages between a C program and a Python program. Both are not much difficult when using numpy.ctypeslib, as we will see shortly.

Interface for NumPy array

Let's use the library with an interface defined in cgal_intersection.h as an example to build an interface with Python. Relevant Python codes excerpted from loom/ctypes_api.py are shown below, which loads libcgal_intersection.so, access find_intersections_of_curves(), a C++ function with a C interface defined in the C++ header file, and returns a reference to the function.

numpy_ctypeslib_flags = ['C_CONTIGUOUS', 'ALIGNED']

array_1d_complex = numpy.ctypeslib.ndpointer(
    dtype=numpy.complex128,
    ndim=1,
    flags=numpy_ctypeslib_flags,
)

def libcgal_get_intersections():
    lib_name = 'libcgal_intersection'

    libcgal_intersection = numpy.ctypeslib.load_library(
        lib_name,
        (os.path.dirname(os.path.realpath(__file__)) +
         '/cgal_intersection/'),
    )

    get_intersections = (libcgal_intersection.
                         find_intersections_of_curves)

    get_intersections.restype = ctypes.c_int
    get_intersections.argtypes = [
        array_1d_complex,
        ctypes.c_long,
        array_1d_complex,
        ctypes.c_long,
        array_2d_float, ctypes.c_int,
    ]

    return get_intersections

Because this works with a C interface, we need to specify the data types of the arguments and the return value of the function, which is conveniently done by setting restype and argtypes attributes of the ctypes library object returned by numpy.ctypeslib.load_library. Usual data types are conveniently provided via ctypes, like ctypes.c_int in the above. See this Python documentation about data types defined in ctypes for more detail.

One thing you should be careful about is not to allocate memory inside a C function and return its pointer. A safer way is to create a NumPy array from the Python side and let Python takes care of all the memory management, then give the reference to the NumPy array to the C function, which is done with numpy.ctypeslib.ndpointer in the above. First note that we specifies two flags ['C_CONTIGUOUS', 'ALIGNED'] for NumPy arrays so that the C function can safely access the memory allocated for the NumPy array using a C pointer coordinate* as shown in cgal_intersection.h.

Now we can call libcgal_get_intersections() to get the function object and use it to call the C function, like the following code excerpted from loom/spectral_network.py.

get_intersections = libcgal_get_intersections()

intersections = numpy.empty((buffer_size, 2), dtype=numpy.float64)

num_of_intersections = get_intersections(
    new_s_wall.z[n_z_i:n_z_f + 1],
    ctypes.c_long(n_z_f + 1 - n_z_i),
    prev_s_wall.z[p_z_i:p_z_f + 1],
    ctypes.c_long(p_z_f + 1 - p_z_i),
    intersections, buffer_size
)

Note that we make a NumPy array intersections from the Python side and give it to the C function so that the C function can save values into the array. As I mentioned previously, this is to avoid for the C function to allocate memory for the array. Memory management is often a pitfall when writing a C code, so it's better to leave it to Python, and in this way we don't have to worry about how to transfer dynamically allocated array from C-side to Python-side.

Using struct

So now we can exchange NumPy arrays between a Python program and a C program, which is useful to transfer data between the two. But when exchanging structured data, like messages, it is much better to associate a C struct with a Python class. A good reference of ctypes for such a task is given in this Python documentation about struct in ctypes. Here we discuss a Python interface for the following C struct defined in loom/clibs/s_wall.h

// s_wall.h

typedef struct {
    int s_wall_size;
    int step;
    int stop_condition;
    int rv;
} message;

#define ERROR_SAME_XS -1
#define NEAR_PUNCTURE 1
#define MASS_LIMIT 2
#define IN_P_NBHD 3
#define OUT_P_NBHD 4

int grow(
    message* msg,
    int N,
    // and more arguments...
);

Using message.rv, which will have one of the integer values defined by the constants in the header, Python-side and C function grow() will exchange messages. A simplified version of the relevant Python code from loom/ctypes_api.py for such an interface is shown below.

ERROR_SAME_XS = -1
NEAR_PUNCTURE = 1
MASS_LIMIT = 2
IN_P_NBHD = 3
OUT_P_NBHD = 4

class Message(ctypes.Structure):
    _fields_ = [
        ('s_wall_size', ctypes.c_int),
        ('step', ctypes.c_int),
        ('stop_condition', ctypes.c_int),
        ('rv', ctypes.c_int),
    ]

    def error_same_xs(self):
        return self.rv == ERROR_SAME_XS

    def near_puncture(self):
        return self.rv == NEAR_PUNCTURE

    def mass_limit(self):
        return self.rv == MASS_LIMIT

    def in_p_nbhd(self):
        return self.rv == IN_P_NBHD

    def out_p_nbhd(self):
        return self.rv == OUT_P_NBHD

    def __str__(self):
        if self.error_same_xs():
            msg = 'x1 == x2'
        elif self.near_puncture():
            msg = 'near a puncture'
        elif self.mass_limit():
            msg = 'reached the mass limit'
        elif self.in_p_nbhd():
            msg = 'inside the neighborhood of a point'
        elif self.out_p_nbhd():
            msg = 'outside the neighborhood of a point'
        elif self.rv == 0:
            msg = 'successfully finished'
        else:
            raise RuntimeError

        return msg

clibs_s_wall = numpy.ctypeslib.load_library(
    's_wall',
    (os.path.dirname(os.path.realpath(__file__)) +
     '/clibs/'),
)

clibs_s_wall_grow.restype = ctypes.c_int

clibs_s_wall_grow.argtypes = [
    ctypes.POINTER(Message),
    ctypes.c_int,       # int N
    # and more arguments...
]

def grow(msg, *args):
    return clibs_s_wall_grow(msg, *args)

That is, we inherit ctypes.Structure to define a Python class corresponding to the C struct, and that is almost all the job we need to do. And we can enrich such a Python class to make it more convenient to use the class in Python as shown above. Because there is no neat way to expose a constant defined in a C header file via a shared library to Python (see this and this threads from stackoverflow), we define the same constant again in the Python code, but to avoid using the constants explicitly so that we can avoid making mistakes, a few helper functions are defined to identify the returned message. Then the reference to the class is given as a pointer to the C function, and in the middle is Python's ctypes working so that the memory layout matches between the Python class and the C struct.



Comments

comments powered by Disqus