
A Journey to find a memory leak

In this article, I will describe my journey to find and fix a memory leak in a PHP application. The final patch is simple, but it’s the journey that matters, right?

Introduction

In our application, we had a worker that consumed a lot of RAM: after 10 seconds, its consumption reached about 1.5 GB! I’m used to finding and eradicating memory leaks quite quickly, but this time it gave me a lot of trouble.

In the past, I used php-meminfo, a very good extension, but it is not compatible with PHP 7.3+ yet. Unfortunately, we run PHP 7.4.

So I fell back on a primitive tool: I added a few calls to memory_get_usage() in my worker. And, surprise: it reported very low memory usage, about 50 MB, whereas my OS reported more than 1 GB. What the hell is going on here?

Then I tried Blackfire: same result, it could not see what was going on either.

I needed to do my homework, so I re-read an old article by Julien Pauli about the Zend Memory Manager.

To summarize this article very quickly:

  • PHP adds a layer on top of your OS to manage the memory;
  • When you declare a variable, PHP uses the memory manager to allocate the RAM;
  • When you call memory_get_usage(), PHP asks the memory manager how much memory it has allocated.

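In other words, memory_get_usage() only sees what goes through the Zend Memory Manager; memory that an extension, or the C library it wraps, allocates directly from the system is invisible to it. So the issue should be neither in my code nor in the vendor code: it should be in an extension, or in PHP itself! But I may be wrong :)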
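The gap between what PHP sees and what the OS sees can be checked from inside the script. Below is a minimal sketch, assuming Linux (it reads `/proc/self/status`); `rssKb()` is a hypothetical helper name. `memory_get_usage(false)` reports the memory currently used through the Zend Memory Manager, `memory_get_usage(true)` the memory the manager has reserved from the system, and `VmRSS` everything resident for the process, including allocations made directly by extensions and their C libraries.

```php
<?php
// Sketch, assuming Linux: compare PHP's view of memory with the OS view.

/**
 * Hypothetical helper: resident set size (VmRSS) of the current process in kB,
 * or null when /proc/self/status is not available (e.g. not on Linux).
 */
function rssKb(): ?int
{
    $path = '/proc/self/status';
    if (!is_readable($path)) {
        return null;
    }
    if (!preg_match('/^VmRSS:\s+(\d+)\s+kB/m', (string) file_get_contents($path), $m)) {
        return null;
    }

    return (int) $m[1];
}

$rss = rssKb();
printf(
    "ZMM used: %.2f MB | ZMM reserved: %.2f MB | OS (VmRSS): %s\n",
    memory_get_usage(false) / 1024 / 1024, // what memory_get_usage() shows by default
    memory_get_usage(true) / 1024 / 1024,  // memory the ZMM obtained from the system
    $rss === null ? 'n/a' : $rss . ' kB'   // everything resident, extensions included
);
```

If the last figure keeps growing while the first two stay flat, the memory is being allocated outside the Zend Memory Manager.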

What is in the memory?

The application has too many lines of code to review by hand, and since memory_get_usage() does not report the real memory usage, I needed another way to find this leak.

I decided to look at what was actually in the RAM before making any decision. To do that, I started by looking at which part of the memory was growing; at this point, I was pretty sure the issue was in an extension. I ran the following command twice:

sudo cat /proc/<PID>/maps > before # or after
  • when the worker started, but before handling messages;
  • after 15s of handling messages.

Then I made a diff of these two files.
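Diffing the raw maps files works, but comparing the size of each region makes the growth easier to spot. The sketch below is one way to do it in PHP, assuming the standard Linux `/proc/<PID>/maps` layout (`start-end perms offset dev inode [pathname]`); `regionSizes()` and `growth()` are hypothetical helpers, and unnamed (anonymous) mappings are grouped under a made-up `[anonymous]` label.

```php
<?php
/**
 * Hypothetical helper: sum the size in bytes of every mapping in a
 * /proc/<PID>/maps dump, grouped by pathname (e.g. "[heap]").
 */
function regionSizes(string $maps): array
{
    $sizes = [];
    foreach (explode("\n", trim($maps)) as $line) {
        // Fields: address perms offset dev inode [pathname]
        $fields = preg_split('/\s+/', trim($line), 6);
        if (count($fields) < 5) {
            continue; // blank or malformed line
        }
        [$start, $end] = explode('-', $fields[0]);
        $name = $fields[5] ?? '[anonymous]';
        $sizes[$name] = ($sizes[$name] ?? 0) + (int) (hexdec($end) - hexdec($start));
    }

    return $sizes;
}

/** Hypothetical helper: regions of $after whose size changed, biggest growth first. */
function growth(string $before, string $after): array
{
    $old = regionSizes($before);
    $delta = [];
    foreach (regionSizes($after) as $name => $size) {
        $diff = $size - ($old[$name] ?? 0);
        if ($diff !== 0) {
            $delta[$name] = $diff;
        }
    }
    arsort($delta);

    return $delta;
}

// Usage: php maps-growth.php before after
if (isset($argv[2])) {
    print_r(growth(file_get_contents($argv[1]), file_get_contents($argv[2])));
}
```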

Surprise: the heap had grown a lot. Let’s dump it with the following command (the start and end addresses come from the [heap] line of the previous command’s output):

$ sudo gdb -p <PID>
(gdb) dump memory ./memory.dump 0x1234567 0x98765432
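The two addresses are simply the start and end of the `[heap]` line in the maps file. As a sketch, assuming that layout and gdb’s `-batch` and `-ex` options, the dump can also be run non-interactively; `heapDumpCommand()` is a hypothetical helper that only builds the command line, it does not run it.

```php
<?php
/**
 * Hypothetical helper: build a non-interactive gdb command that dumps the
 * [heap] region of process $pid, using the addresses found in its maps file.
 * Returns null when no [heap] mapping is present.
 */
function heapDumpCommand(int $pid, string $maps, string $output = './memory.dump'): ?string
{
    foreach (explode("\n", $maps) as $line) {
        // e.g. "01000000-01300000 rw-p 00000000 00:00 0   [heap]"
        if (preg_match('/^([0-9a-f]+)-([0-9a-f]+)\s.*\[heap\]\s*$/', $line, $m)) {
            $gdbCommand = sprintf('dump memory %s 0x%s 0x%s', $output, $m[1], $m[2]);

            return sprintf('sudo gdb -p %d -batch -ex %s', $pid, escapeshellarg($gdbCommand));
        }
    }

    return null;
}

// Usage (1234 is a placeholder PID):
// echo heapDumpCommand(1234, file_get_contents('/proc/1234/maps')), "\n";
```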

Since the dump is full of binary data, I used my favorite command for such situations:

strings memory.dump > memory.dump.string
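strings prints every run of printable characters that is at least 4 characters long (its default minimum). For small dumps, the same idea fits in a few lines of PHP. This is only a sketch: `printableStrings()` is a hypothetical helper, and because it loads the whole file in memory, strings itself remains the better tool for a dump of several hundred MB.

```php
<?php
/**
 * Hypothetical helper mimicking strings(1): return every run of printable
 * ASCII characters (space to "~", plus tab) that is at least $minLength long.
 */
function printableStrings(string $binary, int $minLength = 4): array
{
    preg_match_all('/[\x20-\x7E\t]{' . $minLength . ',}/', $binary, $matches);

    return $matches[0];
}

// Usage: count the extracted strings that look like HTML tags.
// $strings = printableStrings(file_get_contents('memory.dump'));
// echo count(preg_grep('/<\/?(html|head|body|div|p|a)\b/i', $strings)), "\n";
```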

Then I opened the resulting file in vim. It was full of HTML. OK, I think I had found the culprit.

Make a reproducer

The worker was responsible for the following tasks:

  • Read data (HTML) from RabbitMQ with the AMQP extension;
  • Analyze this data with the DOM extension;
  • Publish the results to RabbitMQ.

So I wrote a reproducer to test each part of the code in isolation:

use Symfony\Component\DomCrawler\Crawler;

require __DIR__.'/vendor/autoload.php';

$count = 25_000;

// Blank: only read the fixtures (baseline)
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
}

// PCRE
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    preg_match('/title/', $content, $m);
}

// DOM, through Symfony's DomCrawler (filter() needs symfony/css-selector)
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    $d = new Crawler($content);
    $t = $d->filter('title');
}

// JSON
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    json_encode($content);
}

// AMQP: publish every fixture, then consume and acknowledge them
// ($c is the application's service container, Broker an application service)
$channel = $c->get(Broker::class)->getAmqpChannel();
$exchange = new AMQPExchange($channel);
$exchange->setType('direct');
$exchange->setName('leak');
$exchange->declare();
$queue = new AMQPQueue($channel);
$queue->setName('leak');
$queue->setArgument('x-queue-mode', 'lazy');
$queue->declare();
$queue->bind('leak', 'leak');
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    // delivery_mode 2 = persistent message
    $exchange->publish($content, 'leak', AMQP_NOPARAM, ['delivery_mode' => 2]);
}
for ($i = 0; $i < $count; $i++) {
    $envelope = $queue->get();
    if (!$envelope) {
        break; // the queue is empty
    }
    $queue->ack($envelope->getDeliveryTag());
}

Then I benchmarked each loop. Nothing was wrong there. Bad news! Or good news: PHP itself does not leak.

Reconsider everything

So I went back to my code and started bypassing parts of it, one at a time, until the application stopped leaking.

The leak was in the part I had suspected from the beginning: the HTML analysis. Now I was able to write a new reproducer containing exactly the faulty part:

use Masterminds\HTML5;

require __DIR__.'/vendor/autoload.php';

$html = file_get_contents('https://www.php.net/');
$html5 = new HTML5();
$dom = $html5->loadHTML($html);
echo "Converting to HTML 5\n";
for ($i=0; $i < 100; $i++) {
    $html5->saveHTML($dom);  // In my application, this is the line that leaks
    printf("%.2f\n", memory_get_usage(false) / 1024 / 1024);
}

The results were a bit crazy: the value kept growing.

The fix was pretty obvious and easy.

But wait

At this point I was a bit confused: I had found a leak with memory_get_usage(), yet I had said earlier that this tool could not see the leak. Actually, I had found an additional leak, and the original one was still there.

So I started to dig again, and I managed to create this reproducer:

use Symfony\Component\DomCrawler\Crawler;

require __DIR__.'/vendor/autoload.php';

$content = file_get_contents('https://www.php.net/');

$count = (int) ($argv[1] ?? 251);
$start = microtime(true);

for ($i = 0; $i < $count; $i++) {
    $crawler = new Crawler($content);
    $nodes = $crawler->filterXPath('descendant-or-self::head/descendant-or-self::*/title');
    $nodes->each(static function ($node): void {
        $node->html();
    });
    if (0 === $i % 10) {
        // VmRSS: resident memory of the process, as seen by the OS
        preg_match('/^VmRSS:\s(.*)/m', file_get_contents('/proc/self/status'), $m);
        printf("%03d - %.2fMb - %s - %.3fs\n", $i, memory_get_usage(true) / 1024 / 1024, trim($m[1]), microtime(true) - $start);
    }
}

This code could be simplified, but it looks like what I have in the application. As you can see, I used two methods to get the memory usage:

  • memory_get_usage(): This is what is seen by PHP and its memory manager;
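  • /proc/self/status: This reports what my OS sees; its VmRSS line is the resident memory of the whole process, including what extensions and their libraries allocate outside the Zend Memory Manager. This is much more accurate than the former.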

And the results were astonishing:

i   - PHP    - OS       - Duration
000 - 4.00Mb - 37936 kB - 0.084s
010 - 4.00Mb - 45648 kB - 0.530s
020 - 4.00Mb - 53040 kB - 0.991s
030 - 4.00Mb - 60696 kB - 1.488s
040 - 4.00Mb - 68352 kB - 1.981s
050 - 4.00Mb - 76008 kB - 2.455s
060 - 4.00Mb - 83400 kB - 2.973s
070 - 4.00Mb - 91056 kB - 3.576s
080 - 4.00Mb - 98712 kB - 4.208s
090 - 4.00Mb - 106368 kB - 4.682s
100 - 4.00Mb - 113760 kB - 5.146s
110 - 4.00Mb - 121416 kB - 5.622s
120 - 4.00Mb - 129072 kB - 6.098s
130 - 4.00Mb - 136728 kB - 6.561s
140 - 4.00Mb - 144120 kB - 7.024s
150 - 4.00Mb - 151776 kB - 7.491s

The leak is terrible: in 150 iterations, the resident memory grows from about 37 MB to about 148 MB, i.e. more than 110 MB, roughly 760 kB per iteration.
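To put a number on such a leak, one option is to fit a straight line through the (iteration, RSS) samples: its slope is the average growth per iteration. A least-squares sketch, where `kbPerIteration()` is a hypothetical helper, fed here with the first and last rows of the table above:

```php
<?php
/**
 * Hypothetical helper: least-squares slope of RSS (kB) against the
 * iteration number, i.e. the average memory growth per iteration.
 *
 * @param array<int, int> $samples iteration => RSS in kB
 */
function kbPerIteration(array $samples): float
{
    $n = count($samples);
    if ($n < 2) {
        return 0.0; // no trend with fewer than two points
    }
    $meanX = array_sum(array_keys($samples)) / $n;
    $meanY = array_sum($samples) / $n;

    $num = 0.0;
    $den = 0.0;
    foreach ($samples as $x => $y) {
        $num += ($x - $meanX) * ($y - $meanY);
        $den += ($x - $meanX) ** 2;
    }

    return $den == 0.0 ? 0.0 : $num / $den;
}

printf("%.1f kB per iteration\n", kbPerIteration([0 => 37936, 150 => 151776]));
// prints "758.9 kB per iteration"
```

Feeding all the rows instead of two smooths out the small variations between samples.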

PHP does not see any increase, but my OS does. How can that be?

What is the real cause?

I read through the code a bit, and I noticed this:

$rules = new OutputRules($stream, $options);
$trav = new Traverser($dom, $stream, $rules, $options);

and in the Traverser constructor:

$this->rules->setTraverser($this);

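We have a cyclic reference here: the traverser holds the rules, and the rules hold the traverser. This is something PHP does not like, because it makes freeing memory harder. PHP frees a value as soon as its reference count drops to zero, but in a cycle each object keeps the other’s count above zero, so neither is freed when they go out of scope. Only the Garbage Collector can reclaim them.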
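Here is a minimal sketch of the problem, using two hypothetical classes, `DemoRules` and `DemoTraverser`, that mirror the `OutputRules` and `Traverser` relationship (they are not the library’s code). A static counter in the destructor shows when the objects are actually freed.

```php
<?php
// Hypothetical stand-ins for OutputRules and Traverser, not the library's code (PHP 7.4+).
final class DemoRules
{
    /** Number of DemoRules instances destroyed so far. */
    public static int $freed = 0;

    public ?DemoTraverser $traverser = null;

    public function __destruct()
    {
        self::$freed++;
    }
}

final class DemoTraverser
{
    public DemoRules $rules;

    public function __construct(DemoRules $rules)
    {
        $this->rules = $rules;
        $rules->traverser = $this; // rules -> traverser -> rules: a reference cycle
    }
}

function renderWithCycle(): void
{
    $traverser = new DemoTraverser(new DemoRules());
    // ... the real code would serialize the DOM here ...
} // $traverser goes out of scope, but each object still references the other

renderWithCycle();
$freedBeforeGc = DemoRules::$freed; // the pair is still in memory
gc_collect_cycles();                // the cycle collector finds and frees the pair
$freedAfterGc = DemoRules::$freed;

printf("freed before GC: %d, after GC: %d\n", $freedBeforeGc, $freedAfterGc);
// prints "freed before GC: 0, after GC: 1"
```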

I could let the GC do its job, but this code was on a critical path where we need extreme performance. Moreover, the GC does not run all the time: it is triggered only when 10,000 possible cyclic objects or arrays are in memory and one of them falls out of scope.
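This behavior can be observed with `gc_status()` (PHP 7.3+), which returns the number of collector runs, the number of values it collected, its current threshold and the number of buffered possible roots. In this sketch, `CycleNode` and `makeCycle()` are hypothetical; each call leaves a small unreachable two-object cycle behind, and the collector only runs once enough of them have piled up in its buffer.

```php
<?php
// Hypothetical node type used to build two-object cycles.
final class CycleNode
{
    public ?CycleNode $peer = null;
}

function makeCycle(): void
{
    $a = new CycleNode();
    $b = new CycleNode();
    $a->peer = $b;
    $b->peer = $a; // a <-> b: unreachable after return, but refcounts stay at 1
}

$before = gc_status();
for ($i = 0; $i < 20000; $i++) {
    makeCycle();
}
$after = gc_status();

printf(
    "GC runs: %d -> %d, collected: %d -> %d, threshold: %d, roots buffered: %d\n",
    $before['runs'], $after['runs'],
    $before['collected'], $after['collected'],
    $after['threshold'], $after['roots']
);
```

Until the buffer fills up, every such cycle, and whatever native memory its objects hold, simply stays in memory.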

But two objects lingering in memory should not be that bad, right? That’s what I thought, until I saw this:

$this->dom = $dom;

OK! Here we have a diabolical combination:

  • We store a DOM object ($this->dom). The data behind it is not managed by the Zend Memory Manager: it is allocated by libxml, the library behind the DOM extension. That’s exactly why the Zend Memory Manager could not see the memory leak;
  • We have a cyclic reference, so PHP cannot free this data as soon as it is no longer used. Only the GC can, and it rarely runs.

Conclusion

I eventually made another patch to mitigate this leak.

In this patch, I “help” PHP to free memory by breaking the circular reference. The Garbage Collector is not involved anymore, and the memory stays constant and very low.
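The actual patch is not reproduced in this article. As an illustration only, here is one common way to break such a cycle by hand, on hypothetical `PatchedRules` and `PatchedTraverser` classes: clear the back-reference once the work is done, in a `finally` block so it also happens if an exception is thrown. Both objects are then freed by plain reference counting as soon as they go out of scope.

```php
<?php
// Hypothetical classes, not the library's: the rules keep a back-reference to the traverser.
final class PatchedRules
{
    /** Number of PatchedRules instances destroyed so far. */
    public static int $freed = 0;

    public ?PatchedTraverser $traverser = null;

    public function __destruct()
    {
        self::$freed++;
    }
}

final class PatchedTraverser
{
    public PatchedRules $rules;

    public function __construct(PatchedRules $rules)
    {
        $this->rules = $rules;
        $rules->traverser = $this; // the cycle is created here
    }

    public function walk(): string
    {
        return 'output'; // stands in for the real serialization
    }
}

function renderWithPatch(): string
{
    $rules = new PatchedRules();
    $traverser = new PatchedTraverser($rules);
    try {
        return $traverser->walk();
    } finally {
        $rules->traverser = null; // break the cycle, even if walk() throws
    }
}

echo renderWithPatch(), "\n";
$freed = PatchedRules::$freed; // already 1: no gc_collect_cycles() needed
```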

I’m happy.

So, how can you prevent this kind of issue?

  • First, avoid cyclic references as much as possible. You will read almost everywhere that they often reflect a bad design (it’s not always the case);
  • When working with streams, be really careful, or you may summon some development aesthete.
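When a back-reference is genuinely needed, PHP 7.4’s `WeakReference` is another option, distinct from the manual cycle-breaking of the patch described above: the child can reach its owner without keeping it alive, so no cycle exists in the first place. A sketch with hypothetical `WeakRules` and `WeakTraverser` classes (not the library’s API):

```php
<?php
// Hypothetical classes: WeakRules points back to its traverser weakly (PHP 7.4+).
final class WeakRules
{
    private ?WeakReference $traverser = null;

    public function setTraverser(WeakTraverser $traverser): void
    {
        $this->traverser = WeakReference::create($traverser); // no strong reference kept
    }

    public function traverser(): ?WeakTraverser
    {
        return $this->traverser === null ? null : $this->traverser->get();
    }
}

final class WeakTraverser
{
    /** Number of WeakTraverser instances destroyed so far. */
    public static int $freed = 0;

    public WeakRules $rules;

    public function __construct(WeakRules $rules)
    {
        $this->rules = $rules;
        $rules->setTraverser($this);
    }

    public function __destruct()
    {
        self::$freed++;
    }
}

$rules = new WeakRules();
$traverser = new WeakTraverser($rules);
$sameObject = $rules->traverser() === $traverser; // true while $traverser is alive

unset($traverser);               // refcount drops to 0: freed immediately, no GC run
$freed = WeakTraverser::$freed;  // 1
$dangling = $rules->traverser(); // null: the weak reference no longer resolves
```

The trade-off is that the weakly referenced side must be kept alive by someone else, so `traverser()` has to handle a `null` result.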

