PHP 7.4 FFI: What you need to know

PHP Foreign Function Interface, or FFI to its friends, is a PHP extension that lets you call external libraries from your PHP code with ease. That means you can use a shared library written in C, Go, Rust, etc. directly in PHP, without writing a PHP extension in C. The concept has existed for years in other languages such as Python or Go.

UUID Generation

Let’s start with a small example: UUID generation.

With PHP there are several ways to generate UUIDs. The best way is to use the PECL UUID extension. You may read its code on GitHub. This PHP extension takes care of binding PHP functions to libuuid. To make it work, libuuid must be installed on your system (don’t worry, it almost always is), along with the PECL extension.

This is what happens when we call uuid_create() from PHP userland code:

   +---------------------+
   |    your PHP code    |
   +---+-------------^---+
       v             ^
   +---v-------------+---+
   |     PHP engine      |
   +---+-------------^---+
       v             ^
   +---v-------------+---+
   |      UUID ext       |
   +---+-------------^---+
       v             ^
   +---v-------------+---+
   |       UUID lib      |
   +---------------------+

What FFI promises is to replace the “UUID ext” layer with pure PHP code.

Before talking about the PHP extension or the FFI layer, we need to explain what a library is. A library is usually written in C, but it can be written in any other language that is able to compile to a shared library: C++, Rust, Go, etc. On Linux and most Unix systems, the library is compiled to a .so file. On Windows it is a .dll file. It’s also possible to statically link a library into a binary, but that is beyond the scope of this article.

The library’s source code contains .h (header) files. They declare what the library is able to do. This is an extract of the uuid.h file:

// …
// Some constants:
#define UUID_VARIANT_NCS    0
#define UUID_VARIANT_DCE    1
#define UUID_VARIANT_MICROSOFT  2
#define UUID_VARIANT_OTHER  3

// Some function declarations:
void uuid_generate(uuid_t out);
int uuid_compare(const uuid_t uu1, const uuid_t uu2);
// …

A .h file is somewhat similar to a PHP interface: it contains constants and function signatures.

FFI/UUID layer

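In order to work, FFI needs the function signatures of the underlying library (libuuid) that we want to use, so we copy the .h file into our project. Sometimes you will need to clean up and adapt this file: FFI does not run the C preprocessor, so #include directives and macros have to go, and you may also remove functions that you will never use. The FFI_LIB define is a special case that FFI::load() understands: it names the library to load. This is what our file looks like: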

#define FFI_LIB "libuuid.so.1"

typedef unsigned char uuid_t[16];

extern void uuid_generate_time(uuid_t out); // v1
extern void uuid_generate_md5(uuid_t out, const uuid_t ns, const char *name, size_t len); // v3
extern void uuid_generate_random(uuid_t out); // v4
extern void uuid_generate_sha1(uuid_t out, const uuid_t ns, const char *name, size_t len); // v5

This is the most important, and the most complex, part of the code to write. Once it is done, we can load this file from our PHP code:

$ffi = FFI::load(__DIR__ . '/include/uuid-php.h');

Et voilà! We can now use libuuid directly from our PHP code. Easy, isn’t it?

But wait a minute, libuuid does not work exactly like that. All functions expect typed arguments, as you may have seen. These functions do not return a UUID; instead, they write it into their first argument. So we need to create this value before calling the function:

$output = $ffi->new('uuid_t');

$output is an instance of FFI\CData. Depending on the underlying C type of the CData, we can access its values with the different operators described in the documentation.
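To make those operators a little more concrete, here is a minimal sketch for the array case. It assumes only that the FFI extension is loaded; no shared library is needed because the definition contains a type and no functions. The variable names are mine, for illustration.

```php
<?php
// Sketch only: requires the FFI extension (PHP >= 7.4), but no shared library.
if (extension_loaded('ffi')) {
    // Declare the type alone; no function symbols need to be resolved.
    $ffi = FFI::cdef('typedef unsigned char uuid_t[16];');

    $output = $ffi->new('uuid_t');   // a C array of 16 unsigned chars

    $output[0] = 0xAB;               // array access writes one element
    $first = $output[0];             // an unsigned char reads back as a PHP int
    $length = count($output);        // C arrays can be counted
    $size = FFI::sizeof($output);    // size in bytes of the underlying C type
}
```

Other C types use other operators: struct fields, for instance, are read and written like object properties.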

Finally, we can call our function. The method is named uuid_generate_random() to match the name exposed by the library in the .h file:

$ffi->uuid_generate_random($output);

The content of $output is now an array of 16 bytes (integers from 0 to 255) that make up the UUID. We still need to convert this array to a string of hexadecimal values:

$values = [];
// Empty-body foreach: each byte of the C array is appended to $values
foreach ($output as $values[]);

$uuid = sprintf('%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x', ...$values);
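The steps above can be gathered into two small functions. This is a sketch, not the API of our package: formatUuid() and generateUuidV4() are names I made up for illustration, and generateUuidV4() assumes the FFI handle exposes uuid_t and uuid_generate_random(), as our header does.

```php
<?php
/**
 * Formats 16 byte values (integers 0-255) as a canonical UUID string.
 * $bytes may be a PHP array or an FFI\CData C array; both work with foreach.
 */
function formatUuid($bytes): string
{
    $values = [];
    foreach ($bytes as $byte) {
        $values[] = $byte;
    }
    if (count($values) !== 16) {
        throw new InvalidArgumentException('A UUID is exactly 16 bytes long.');
    }

    // vsprintf() takes its arguments as an array; the groups are 4-2-2-2-6 bytes.
    return vsprintf('%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x', $values);
}

/**
 * Generates a random (v4) UUID through an FFI handle.
 * Assumption: $ffi was created from a header declaring uuid_t and
 * uuid_generate_random().
 */
function generateUuidV4(FFI $ffi): string
{
    $output = $ffi->new('uuid_t');
    $ffi->uuid_generate_random($output); // libuuid fills $output in place

    return formatUuid($output);
}
```

With the header from this article, usage would look like generateUuidV4(FFI::load(__DIR__ . '/include/uuid-php.h')). Keeping the formatting separate from the FFI call means it can be tested without libuuid installed.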

Do you like it? If you don’t want to bother reproducing this, we did it for you and it’s open source: https://github.com/jolicode/ffi-uuid 🍾

Some thoughts

Simplicity

Binding an external library is really easy. The most complicated parts are creating a minimal .h file and mapping PHP types to the library’s types and vice versa.

Performance

It’s also very interesting to look at the performance of our implementation. In our repository, you can find a benchmark script. This is the result of the comparison between our implementation and the PECL extension:

FFI:
 * [v1] 1.254s
 * [v4] 5.301s
PECL:
 * [v1] 0.626s
 * [v4] 4.583s

As we can see, the PECL extension is twice as fast as our implementation for UUID V1, but only about 15% faster for UUID V4. We can explain that easily: a UUID V4 is composed almost entirely of pseudo-random data, while a UUID V1 contains many static blocks. Getting random data is a bit slow, which is why V4 generation is much slower. And the difference between the two implementations is less visible on V4 because almost all of the time is spent inside libuuid.
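For readers who want to run this kind of comparison themselves, here is a minimal timing sketch. It is not the benchmark script from our repository: benchmark() and slowdown() are hypothetical helpers, and the only figures used are the timings reported above.

```php
<?php
/**
 * Calls $fn $iterations times and returns the elapsed wall-clock seconds.
 * hrtime(true) is a monotonic clock that returns nanoseconds (PHP >= 7.3).
 */
function benchmark(callable $fn, int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }

    return (hrtime(true) - $start) / 1e9;
}

/** How many times longer the slower run took than the faster one. */
function slowdown(float $slower, float $faster): float
{
    return $slower / $faster;
}

// Ratios computed from the timings reported above.
printf("v1: FFI takes %.2fx as long as PECL\n", slowdown(1.254, 0.626)); // 2.00x
printf("v4: FFI takes %.2fx as long as PECL\n", slowdown(5.301, 4.583)); // 1.16x
```

To time a generator, pass it as a closure, for example benchmark(function () { uuid_create(); }, 100000) with the PECL extension installed.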

What can we conclude?

FFI is still really young (not even released as I write this), so we can expect some performance improvements. However, we can already say:

  • If a native extension already exists and you can install it: use it;
  • If the extension does not exist, FFI is a very good candidate;
  • If you have a bottleneck in your application, it may be interesting to port these bits of code to C, Rust, etc. and to bind them with FFI. FFI becomes really interesting when the work is CPU-bound: DOM manipulation, big arrays, complex calculations, etc.

Will native extensions be replaced by FFI?

It’s really too soon to tell. However, some extensions, like PDO, do much more than simply bind to a library. I’m pretty confident that these extensions will not be replaced by FFI.

Nonetheless, some extensions may be replaced. That could be the case for php-redis, amqp, uuid, etc. For example, Remi Collet has already begun to play with FFI to replace the redis extension.

FFI opens some doors: it will be possible to replace some pure PHP libraries with low-level libraries. This is the case for gitlib, which could leverage libgit2 through FFI.

In some situations, there is neither a C extension nor a pure PHP implementation. If you have ever tried to use TensorFlow in PHP, you know it’s… complicated. Dmitry Stogov, one of the most important PHP core contributors and also the author of PHP/FFI, has created a POC to bind TensorFlow to PHP.

Which language should you choose to bind a lib to PHP?

Not every language that can compile to a shared library (.so) is a good fit for binding to PHP. It’s better to use languages without a runtime (C, C++, Rust, …) because a runtime may have side effects. For example, the Go runtime has a garbage collector and manages threads for the goroutines. This can slow down execution or even break your application.

How to bind a Rust lib to PHP?

I wanted to see whether complex calculations run faster in another language than in PHP. Extracting pieces of HTML from a web page is quite a common task, whether to test your website or to crawl the web.

Joel made a small library that extracts the first HTML element of a document that matches a CSS expression. The code is really short, to the extent that converting Rust types to C types represents more than 2/3 of the code.

The PHP binding looks very similar to the one for UUID. But here we declare the function signature and point to the .so directly in the PHP source code:

$ffi = FFI::cdef(<<<EOH
const char *cssfilter(const char *html, const char *filter);
EOH, __DIR__.'/../target/release/libcssfilter.so');

And the usage is even simpler:

$value = $ffi->cssfilter($html, $selector);
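In a real project you would probably want clearer errors when FFI or the .so is missing. The following is a hedged sketch of such a wrapper, not code from Joel's library: cssFilter() and its $libPath parameter are hypothetical, and only the C declaration comes from the snippet above.

```php
<?php
/**
 * Returns the first element of $html matching $selector, using the
 * cssfilter() function of the shared library located at $libPath.
 *
 * @throws RuntimeException when FFI or the library is unavailable
 */
function cssFilter(string $html, string $selector, string $libPath): ?string
{
    if (!extension_loaded('ffi')) {
        throw new RuntimeException('The FFI extension is not loaded.');
    }
    if (!is_file($libPath)) {
        throw new RuntimeException("Shared library not found: $libPath");
    }

    // Same declaration as above; PHP strings are passed as const char *.
    $ffi = FFI::cdef(
        'const char *cssfilter(const char *html, const char *filter);',
        $libPath
    );
    $result = $ffi->cssfilter($html, $selector);

    // Assumption: the result may arrive as a PHP string or as a char
    // pointer (FFI\CData), so both cases are handled here.
    if ($result instanceof FFI\CData) {
        return FFI::string($result);
    }

    return $result;
}
```

This sketch calls FFI::cdef() on every call to stay short. In practice you would create the FFI handle once and reuse it, since parsing the declaration and loading the library has a cost.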

And the performance is really impressive and encouraging:

FFI:
 Duration: 1.731s
symfony/crawler:
 Duration: 2.321s

We can easily conclude that when the calculations are complex, it may be well worth porting part of the PHP code to another language in order to increase the performance of our application.

Conclusion

FFI is not released yet, but I’m already a big fan 🤩. FFI will allow us to try some libs even if the binding does not exist yet. It will also allow us to replace some slow parts of our code with a Rust (for example) implementation. And I’m sure it will unlock some ideas we haven’t had yet 🤯.
