A Journey to find a memory leak
In this article, I will cover my journey to find and fix a memory leak in a PHP application. The final patch is simple, but only the journey is important, right?
Introduction
In our application, we had a worker that consumed a lot of RAM. After 10 seconds, the consumption reached about 1.5 GB! I usually find and eradicate memory leaks quite quickly, but this time it caused me a lot of trouble.
In the past, I used php-meminfo, a very good extension. But it is not compatible with PHP 7.3+ yet. Unfortunately, we run PHP 7.4.
So I used a primitive tool: I added a few calls to memory_get_usage() in my worker. And… surprise: it reported very low memory usage, about 50 MB, whereas my OS reported more than 1 GB. What the hell was going on here?
Then I tried Blackfire. Same result: it was not able to see what was going on either.
I needed to do my homework, so I re-read an old article written by Julien Pauli about Zend Memory Manager.
To summarize this article very quickly:
- PHP adds a layer on top of your OS to manage the memory;
- When you declare a variable, PHP uses the memory manager to allocate the RAM;
- When you call memory_get_usage(), PHP asks the memory manager how much memory it has allocated.
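To make the last point concrete, here is a minimal sketch (not part of the original investigation) showing that memory allocated from PHP code is visible to memory_get_usage(). The 10 MiB size is an arbitrary value chosen for illustration.

```php
<?php

// Memory allocated by PHP code goes through the Zend Memory Manager,
// so memory_get_usage() can account for it.
$before = memory_get_usage();
$buffer = str_repeat('x', 10 * 1024 * 1024); // arbitrary 10 MiB string
$after = memory_get_usage();

// Memory used by PHP values, as tracked by the memory manager.
printf("Used by the string: %.2f MiB\n", ($after - $before) / 1024 / 1024);

// With true, PHP reports what the memory manager itself obtained from the system.
printf("Reserved by the memory manager: %.2f MiB\n", memory_get_usage(true) / 1024 / 1024);
```

Anything allocated by a C library behind an extension, without going through the memory manager, stays invisible to both calls.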
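OK, so the issue should be neither in my code nor in the vendor code: the memory they allocate goes through the memory manager and would show up in memory_get_usage(). It should be in an extension, or in PHP itself! But I may be wrong :)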
What is in the memory?
The application has too many lines of code to review by hand, and since memory_get_usage() reports the wrong memory usage, I needed another way to find this leak.
I decided to look at what was in the RAM before making any decision. To do that, I started by checking which part of the RAM was growing. At this point I was pretty sure the issue was in an extension. I ran the following command twice:
sudo cat /proc/<PID>/maps > before # or after
- when the worker started, but before handling messages;
- after 15s of handling messages.
And I made a diff on these two files.
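A plain diff works, but the address ranges are hard to compare by eye. As a rough sketch, the two snapshots could also be summarized per region with a few lines of PHP. The function names below are hypothetical helpers, and the script assumes the standard /proc/<PID>/maps line layout (address range, permissions, offset, device, inode, optional pathname).

```php
<?php

/**
 * Sum the size in bytes of every mapping in a /proc/<PID>/maps snapshot,
 * grouped by pathname ("[heap]", a library path, or "[anon]" when empty).
 */
function mappingSizes(string $maps): array
{
    $sizes = [];
    foreach (preg_split('/\R/', trim($maps)) as $line) {
        // start-end perms offset dev inode [pathname]
        if (!preg_match('/^([0-9a-f]+)-([0-9a-f]+)\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$/', $line, $m)) {
            continue;
        }
        $name = '' !== $m[3] ? $m[3] : '[anon]';
        $sizes[$name] = ($sizes[$name] ?? 0) + hexdec($m[2]) - hexdec($m[1]);
    }

    return $sizes;
}

/** Return the regions that grew between two snapshots, biggest growth first. */
function mappingGrowth(string $before, string $after): array
{
    $old = mappingSizes($before);
    $growth = [];
    foreach (mappingSizes($after) as $name => $size) {
        $delta = $size - ($old[$name] ?? 0);
        if ($delta > 0) {
            $growth[$name] = $delta;
        }
    }
    arsort($growth);

    return $growth;
}

// Usage: php maps-growth.php before after
if (isset($argv[1], $argv[2])) {
    $growth = mappingGrowth(file_get_contents($argv[1]), file_get_contents($argv[2]));
    foreach ($growth as $name => $bytes) {
        printf("%10.2f MiB  %s\n", $bytes / 1024 / 1024, $name);
    }
}
```

The script name maps-growth.php is only a placeholder; the two arguments are the before and after files produced by the command above.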
Surprise, the heap grew a lot. Let's dump it with the following commands (I found the memory addresses in the output of the previous command):
$ sudo gdb -p <PID>
dump memory ./memory.dump 0x1234567 0x98765432
Since it's full of binary data, I used my favorite command in such situations:
strings memory.dump > memory.dump.string
And then I opened the file with vim. It was full of HTML. OK, I think I found the culprit.
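For readers who do not have strings at hand, here is a rough PHP approximation of what it does: keep every run of at least four printable ASCII characters. The function name is hypothetical, and the real tool has more options. The script also loads the whole dump in memory, so it only suits small dumps.

```php
<?php

/**
 * Return every run of at least $minLength printable ASCII characters
 * found in a binary string (a rough approximation of `strings`).
 */
function printableRuns(string $binary, int $minLength = 4): array
{
    preg_match_all('/[\x20-\x7e]{'.$minLength.',}/', $binary, $matches);

    return $matches[0];
}

// Usage: php printable-runs.php memory.dump
if (isset($argv[1])) {
    foreach (printableRuns(file_get_contents($argv[1])) as $run) {
        echo $run, "\n";
    }
}
```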
Make a reproducer
The worker was responsible for the following tasks:
- Read data (HTML) from RabbitMQ with the AMQP extension;
- Analyze this data with the DOM extension;
- Publish results in RabbitMQ.
So I made a reproducer to test each part of the code:
$count = 25_000;

// Blank
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
}

// PCRE
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    preg_match('/title/', $content, $m);
}

// DOM
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    $d = new Crawler($content);
    $t = $d->filter("title");
}

// JSON
for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    json_encode($content);
}

// AMQP
$channel = $c->get(Broker::class)->getAmqpChannel();

$exchange = new AMQPExchange($channel);
$exchange->setType('direct');
$exchange->setName('leak');
$exchange->declare();

$queue = new AMQPQueue($channel);
$queue->setName('leak');
$queue->setArgument('x-queue-mode', 'lazy');
$queue->declare();
$queue->bind('leak', 'leak');

for ($i = 0; $i < $count; $i++) {
    $content = file_get_contents(__DIR__."/fixtures/$i.txt");
    $exchange->publish($content, 'leak', AMQP_NOPARAM, ['delivery_mode' => 2]);
}

for ($i = 0; $i < $count; $i++) {
    $envelope = $queue->get();
    if (!$envelope) {
        break;
    }
    $queue->ack($envelope->getDeliveryTag());
}
And I benched the code. Nothing was wrong here. Bad news! Or good news: PHP does not leak.
Reconsider everything
So I went back to my code and started bypassing parts of it, until the application stopped leaking.
I ended up in the part I had suspected from the beginning: the analysis of HTML. So now I was able to create a new reproducer, with exactly the part that was going wrong:
use Masterminds\HTML5;

require __DIR__.'/vendor/autoload.php';

$html = file_get_contents('https://www.php.net/');
$html5 = new HTML5();
$dom = $html5->loadHTML($html);

echo "Converting to HTML 5\n";
for ($i = 0; $i < 100; $i++) {
    $html5->saveHTML($dom); // This is the line that leaks in my application
    printf("%.2f\n", memory_get_usage(false) / 1024 / 1024);
}
The results were a bit crazy: the value kept growing.
The fix was pretty obvious and easy.
But wait
At this point I was a bit confused: I had managed to find a leak with memory_get_usage(), but I had said the leak could not be found with this tool. Actually, I had found an additional leak.
So I started to dig again, and I managed to create this reproducer:
$content = file_get_contents('https://www.php.net/');
$count = $argv[1] ?? 251;

for ($i = 0; $i < $count; $i++) {
    $crawler = new Crawler($content);
    $nodes = $crawler->filterXPath('descendant-or-self::head/descendant-or-self::*/title');
    $nodes->each(static function ($node): void {
        $node->html();
    });

    if (0 == $i % 10) {
        preg_match('/^VmRSS:\s(.*)/m', file_get_contents('/proc/self/status'), $m);
        printf("%03d - %.2fMb - %s\n", $i, memory_get_usage(true) / 1024 / 1024, trim($m[1]));
    }
}
This code could be simplified, but it looks like what I have in the application. As you can see, I used two methods to get the memory usage:
- memory_get_usage(): This is what is seen by PHP and its memory manager;
- /proc/self/status: This reports information seen by my OS. This is much more accurate than the former.
And the results were astonishing:
i - PHP - OS - Duration
000 - 4.00Mb - 37936 kB - 0.084s
010 - 4.00Mb - 45648 kB - 0.530s
020 - 4.00Mb - 53040 kB - 0.991s
030 - 4.00Mb - 60696 kB - 1.488s
040 - 4.00Mb - 68352 kB - 1.981s
050 - 4.00Mb - 76008 kB - 2.455s
060 - 4.00Mb - 83400 kB - 2.973s
070 - 4.00Mb - 91056 kB - 3.576s
080 - 4.00Mb - 98712 kB - 4.208s
090 - 4.00Mb - 106368 kB - 4.682s
100 - 4.00Mb - 113760 kB - 5.146s
110 - 4.00Mb - 121416 kB - 5.622s
120 - 4.00Mb - 129072 kB - 6.098s
130 - 4.00Mb - 136728 kB - 6.561s
140 - 4.00Mb - 144120 kB - 7.024s
150 - 4.00Mb - 151776 kB - 7.491s
The leak is terrible. After 150 iterations, the process uses about 150 MB, roughly 110 MB more than at the start (about 760 kB per iteration).
PHP does not see any increase, but my OS does. How could it be?
What is the real cause?
I read the code a bit, and I saw this:
$rules = new OutputRules($stream, $options);
$trav = new Traverser($dom, $stream, $rules, $options);
and in the Traverser constructor:
$this->rules->setTraverser($this);
We have a cyclic reference here: the Traverser holds the rules, and the rules hold the Traverser. This is something PHP does not like, because the reference count of these objects never drops to zero on its own, so they are not freed when they go out of scope. Only the Garbage Collector can reclaim them.
I could let the GC do its job, but this code was on a critical path, where we need extreme performance. Moreover, the GC does not run every time. It is triggered whenever 10,000 possible cyclic objects or arrays are currently in memory and one of them falls out of scope.
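The behavior can be reproduced in isolation. The sketch below uses two hypothetical classes, Rules and Walker, that only mimic the shape of the cycle (they are not the library's code), plus an arbitrary 1 MiB payload to make the effect visible. With only 20 iterations, the loop stays far below the threshold mentioned above, so nothing is reclaimed until the collector is called explicitly.

```php
<?php

// Hypothetical stand-ins: each object holds a reference to the other.
final class Rules
{
    public ?Walker $walker = null;
}

final class Walker
{
    public Rules $rules;
    public string $payload;

    public function __construct(Rules $rules)
    {
        $this->rules = $rules;
        $this->payload = str_repeat('x', 1024 * 1024); // 1 MiB kept alive by the cycle
        $rules->walker = $this; // this back-reference closes the cycle
    }
}

$start = memory_get_usage();

for ($i = 0; $i < 20; $i++) {
    // Unreachable right after this statement, but the two objects still
    // reference each other, so their reference count never reaches zero.
    new Walker(new Rules());
}

$leaked = memory_get_usage() - $start;
$collected = gc_collect_cycles(); // run the cycle collector explicitly
$remaining = memory_get_usage() - $start;

printf("Before GC: %.2f MiB\n", $leaked / 1024 / 1024);
printf("Collected by the GC: %d\n", $collected);
printf("After GC: %.2f MiB\n", $remaining / 1024 / 1024);
```

Here the payload is a PHP string, so memory_get_usage() sees it. The situation in my worker was worse, because the data kept alive by the cycle was not visible to PHP at all.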
But two objects in memory should not be that bad, right? And it would not be, except that I also saw this:
$this->dom = $dom;
OK! Here we have a demonic combination:
- We store some data in a DOMElement. This data is not managed by the memory manager. It’s in the libxml extension. That’s exactly why the Zend Memory Manager could not see the memory leak;
- We have a cyclic reference. PHP could not clear this data quickly. Only the GC can.
Conclusion
I eventually made another patch to mitigate this leak.
In this patch, I “help” PHP to free memory by breaking the circular reference. The Garbage Collector is not involved anymore, and the memory stays constant and very low.
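The idea can be sketched as follows. This is not the actual patch: the classes are hypothetical stand-ins, and detach() is an invented method name used only to show the principle of breaking the cycle by hand.

```php
<?php

// Hypothetical stand-ins for the two objects that reference each other.
final class SketchRules
{
    public ?SketchTraverser $traverser = null;
}

final class SketchTraverser
{
    public SketchRules $rules;
    public string $payload;

    public function __construct(SketchRules $rules)
    {
        $this->rules = $rules;
        $this->payload = str_repeat('x', 1024 * 1024); // arbitrary 1 MiB payload
        $rules->traverser = $this; // creates the cycle
    }

    // Invented name: drops the back-reference so the cycle no longer exists.
    public function detach(): void
    {
        $this->rules->traverser = null;
    }
}

$start = memory_get_usage();

for ($i = 0; $i < 20; $i++) {
    $traverser = new SketchTraverser(new SketchRules());
    // ... the traverser would do its work here ...
    $traverser->detach();
    unset($traverser); // reference count reaches zero: freed immediately, no GC involved
}

$growth = memory_get_usage() - $start;
printf("Growth after 20 iterations: %.2f MiB\n", $growth / 1024 / 1024);
```

Once the back-reference is gone, plain reference counting is enough, and everything the object holds is released as soon as the last variable pointing to it disappears.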
I’m happy.
So how can you prevent such issues?
- First, avoid cyclic references as much as possible. You may read everywhere that they often reflect a bad design (it's not always the case);
- When working with streams, be really careful or you may invoke some development esthete.