Memory leaks: ultra-fast detector

This article is part of the debug techniques series. Its main purpose is to present a very small memory leak detection tool (libcc.c together with checkbug.c, see below).

I recently used cairo (1.12.16) with an X11 surface for a little hack. After some time the program started to get realllly slooooow. Why?

Running top shows that its memory usage grows and grows and grows.
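top is enough to see the growth. If you would rather watch it from inside the program (say, once per frame of the hack), here is a minimal Linux-only sketch that reads a field such as VmRSS (resident set size, reported in kB) from /proc/self/status. The helper name status_kib is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value, in kB, of a field such as "VmRSS:" read from
   /proc/self/status (Linux only), or -1 if it cannot be read. */
static long status_kib(const char *key)
{
    char line[256];
    long kib = -1;
    size_t keylen;
    FILE *f;

    assert(key != NULL);
    keylen = strlen(key);
    f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, key, keylen) == 0) {
            /* The line looks like "VmRSS:\t    1234 kB"; %ld skips the blanks. */
            if (sscanf(line + keylen, "%ld", &kib) != 1)
                kib = -1;
            break;
        }
    }
    fclose(f);
    return kib;
}
```

Printing status_kib("VmRSS:") every few seconds gives the same "grows and grows" curve as top, but in the program's own log.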

Hum, memory leak. Those beasts are not friendly... Nothing visibly "wrong" is going on: it's just some malloc with no matching free. Hard to spot.

Since my little hack does no allocation at all, the problem must be in a library. I only use cairo and X11. Probability of a problem in X11? Almost zero. In cairo? Hum... well... let's see!

Step one: do I correctly free the memory allocated by cairo? (Maybe I forgot a call to cairo_destroy_something or whatever.)
Method: remove all the code and put it back piece by piece. That's how I saw the problem was tied to the X11 surface: no memory leak with an image surface.
Answer: yes.

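Step two: the previous step revealed that cairo uses its X SHM (shared memory) backend. Does something go wrong in there?
Method: run ipcs, which lists System V IPC objects including shared memory segments, while the program runs, and check that the segments do not pile up.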
Answer: no.

Step three: are all mallocs freed?
Method: valgrind? No, too big, too slow. Let's write some code instead: libcc.c.

You compile it with:

gcc -shared -fPIC libcc.c -o libcc.so -Wall

(Always pass -Wall. You want zero warnings: a warning almost always indicates a problem in your code.)

You use it with:

LD_PRELOAD=./libcc.so ./my-buggy-program 2>TRACE

This code prints every pointer returned by malloc/calloc/realloc and every pointer passed to free.

Then you need to reorder the output (simply in a shell, with sort) and check that every allocated block has a matching free.

A malloc line looks like:

XXX 0x010027c0 = malloc 36

A free line:

XXX 0x018f0840 afree

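Running sort on those lines groups them by pointer: all the frees of a given pointer come first, followed by all its mallocs. Count the frees, count the mallocs that follow, and if there are more mallocs, you found a leak. See checkbug.c. (Yes, we print "afree" with an 'a'. It's on purpose, to have the free lines first. What? But "free" comes before "malloc" in the dictionary, no? Yes, but what about "calloc"? Hum?) (Note that this ordering depends on your locale: with LC_ALL=C, sort compares raw bytes, and since '=' (0x3D) comes before 'a' (0x61), the malloc lines come first. Counting per pointer works either way.)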
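checkbug.c itself is not reproduced here. To give an idea of what such a checker does, here is a minimal sketch, assuming the line format shown above ("XXX", an address, then either "= malloc/calloc/realloc SIZE" or "afree") and a trace already passed through sort, so that all lines for one address are adjacent. The names classify and check_sorted_trace are hypothetical. Since it only counts, it does not care whether a group's free lines come before or after its allocation lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Classify one trace line: +1 for an allocation ("XXX <addr> = malloc 36",
   likewise calloc/realloc), -1 for a free ("XXX <addr> afree"), 0 for
   anything else, such as backtrace lines. The address is copied into addr. */
static int classify(const char *line, char addr[32])
{
    char op[16];
    if (sscanf(line, "XXX %31s %15s", addr, op) != 2)
        return 0;
    if (strcmp(op, "afree") == 0)
        return -1;
    return strcmp(op, "=") == 0 ? +1 : 0;
}

/* Read a trace in which all lines for one address are adjacent (the output
   of sort), print every address with more allocations than frees, and
   return how many such addresses were found. */
static size_t check_sorted_trace(FILE *in, FILE *out)
{
    char line[512], addr[32], current[32] = "";
    long allocs = 0, frees = 0;
    size_t leaks = 0;

    assert(in != NULL && out != NULL);
    for (;;) {
        int at_eof = fgets(line, sizeof line, in) == NULL;
        int kind = at_eof ? 0 : classify(line, addr);
        if (!at_eof && kind == 0)
            continue;                      /* not an allocation record */
        if (at_eof || strcmp(addr, current) != 0) {
            /* The previous address's group is complete: judge it. */
            if (current[0] != '\0' && allocs > frees) {
                fprintf(out, "%s: %ld allocation(s), %ld free(s)\n",
                        current, allocs, frees);
                leaks++;
            }
            if (at_eof)
                break;
            strcpy(current, addr);
            allocs = frees = 0;
        }
        if (kind > 0)
            allocs++;
        else
            frees++;
    }
    return leaks;
}
```

Wrapped in a main that calls check_sorted_trace(stdin, stdout), it would run as: sort TRACE | ./checkbug. Lines that are not allocation records (backtraces, separators) are simply skipped.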

Thanks to backtrace (a GNU function), I also print a backtrace of the guilty malloc.
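A minimal sketch of the backtrace part, assuming glibc's <execinfo.h>. backtrace_symbols_fd() writes the symbolized frames straight to a file descriptor, in the same "file(symbol+offset)[address]" form as the trace near the end of this article, instead of returning a malloc'ed array as backtrace_symbols() does. Two caveats from the backtrace(3) man page: the first call may itself allocate (libgcc is loaded lazily), so inside a malloc wrapper it must run under a recursion guard or be called once at startup; and functions of the main program only get names if it is linked with -rdynamic. The names print_trace and put and the 32-frame limit are arbitrary choices for this sketch.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 32   /* arbitrary depth limit for this sketch */

/* Best-effort write: debug output has nowhere to report its own failures. */
static void put(int fd, const char *s, size_t len)
{
    ssize_t r = write(fd, s, len);
    (void)r;
}

/* Write the current call stack to fd between two lines of stars and
   return the number of frames captured. */
static int print_trace(int fd)
{
    static const char stars[] = "********************\n";
    void *frames[MAX_FRAMES];
    int depth;

    assert(fd >= 0);
    depth = backtrace(frames, MAX_FRAMES);
    put(fd, stars, sizeof stars - 1);
    backtrace_symbols_fd(frames, depth, fd);   /* one line per frame, no malloc */
    put(fd, stars, sizeof stars - 1);
    return depth;
}
```

Called as print_trace(2) right after logging an allocation, each malloc line in TRACE gets its call stack underneath.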

In my case the allocation came from the X library, apparently from code dealing with the event queue or something like that. So I called XEventsQueued in my code and saw that the event queue was indeed getting bigger and bigger. Bingo.

But why? I call XCheckMaskEvent and nothing is returned!

Reading the cairo sources, I saw that it was generating events. Why? I don't know: too much code to read and analyze and no interest at all, especially if that's a bug in cairo. I sent an email to the cairo mailing list and got a satisfying answer.

Was it a bug in cairo? I don't know, but reading this, where the guilty send_event function is removed, strongly suggests that it was. (Calling XCheckMaskEvent was not a good idea: it only matches core X events selected by an event mask and leaves the SHM events in the queue. Better to go with XPending and XNextEvent. Still, I never asked for SHM-related events, so cairo should not expose them to the program. Let's say I was doing it wrong, but I should not receive events I didn't ask for.)

Yes, libcc.c is very, very ugly. I had bugs with it, so I changed it a lot, and here is the result. How ironic, huh? You write a tool to find bugs and you end up with more bugs.

But you get the idea. You intercept calls to malloc/calloc/realloc and free and print them on stderr.
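libcc.c itself is not shown in this article. As an illustration of the idea only, here is a minimal sketch of such an interposer (call it leaktrace.c, a hypothetical name), assuming Linux and glibc. It finds the real functions with dlsym(RTLD_NEXT, ...), serves the allocations that dlsym() may make during that lookup from a static buffer, and formats each line on the stack and sends it with write(2) so that logging never calls malloc. It prints the line format shown above, with addresses zero-padded to a fixed width so that sort keeps each address's lines together. It does not intercept posix_memalign, aligned_alloc or memalign, and the lazy lookup is not thread-safe during the very first calls.

```c
/* leaktrace.c (hypothetical name): sketch of an LD_PRELOAD allocation tracer. */
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

static void *(*real_malloc)(size_t);
static void *(*real_calloc)(size_t, size_t);
static void *(*real_realloc)(void *, size_t);
static void (*real_free)(void *);

/* dlsym() may allocate while we look up the real functions: serve those
   requests from this static buffer (zeroed, never freed). */
static _Alignas(max_align_t) char bootstrap[8192];
static_assert(sizeof bootstrap % _Alignof(max_align_t) == 0, "see bootstrap_alloc");
static size_t bootstrap_used;
static int resolving;
/* Set while a line is written, so the tracer never traces its own allocations. */
static __thread int in_hook __attribute__((tls_model("initial-exec")));

/* Look the real functions up on first use; returns 0 while dlsym() runs. */
static int ready(void)
{
    if (real_free == NULL && !resolving) {
        resolving = 1;
        real_malloc = dlsym(RTLD_NEXT, "malloc");
        real_calloc = dlsym(RTLD_NEXT, "calloc");
        real_realloc = dlsym(RTLD_NEXT, "realloc");
        real_free = dlsym(RTLD_NEXT, "free");
        resolving = 0;
    }
    return !resolving;
}

static void *bootstrap_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    void *p = bootstrap + bootstrap_used;
    if (size > sizeof bootstrap - bootstrap_used)  /* free space is a multiple of align */
        return NULL;
    bootstrap_used += (size + align - 1) & ~(align - 1);
    return p;
}

static int from_bootstrap(const void *p)
{
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)bootstrap;
    return a >= lo && a < lo + sizeof bootstrap;
}

/* Format on the stack and write(2) to stderr; fn == NULL marks a free. */
static void trace(const void *p, const char *fn, size_t size)
{
    char line[96];
    int n, w = (int)(2 * sizeof p);               /* fixed-width hex address */
    if (in_hook)
        return;
    in_hook = 1;
    n = fn ? snprintf(line, sizeof line, "XXX 0x%0*" PRIxPTR " = %s %zu\n", w, (uintptr_t)p, fn, size)
           : snprintf(line, sizeof line, "XXX 0x%0*" PRIxPTR " afree\n", w, (uintptr_t)p);
    if (n > 0 && write(2, line, (size_t)n) != n) {
        /* best effort: a failed or short write is ignored */
    }
    in_hook = 0;
}

void *malloc(size_t size)
{
    if (!ready())
        return bootstrap_alloc(size);
    void *p = real_malloc(size);
    if (p != NULL)
        trace(p, "malloc", size);
    return p;
}

void *calloc(size_t nmemb, size_t size)
{
    if (!ready())   /* static memory is already zero */
        return size && nmemb > SIZE_MAX / size ? NULL : bootstrap_alloc(nmemb * size);
    void *p = real_calloc(nmemb, size);
    if (p != NULL)
        trace(p, "calloc", nmemb * size);
    return p;
}

void *realloc(void *old, size_t size)
{
    if (from_bootstrap(old) || !ready())   /* unsupported here: report failure */
        return NULL;
    void *p = real_realloc(old, size);
    if (old != NULL && (p != NULL || size == 0))   /* glibc frees old when size is 0 */
        trace(old, NULL, 0);
    if (p != NULL)
        trace(p, "realloc", size);
    return p;
}

void free(void *p)
{
    if (p == NULL || from_bootstrap(p) || !ready())   /* early blocks are never released */
        return;
    trace(p, NULL, 0);
    real_free(p);
}
```

It would be built and used like libcc.so, adding -ldl on glibc older than 2.34 (where dlsym lives in libdl): gcc -shared -fPIC leaktrace.c -o leaktrace.so -Wall -ldl, then LD_PRELOAD=./leaktrace.so ./my-buggy-program 2>TRACE. A realloc is logged as an "afree" of the old pointer plus a "= realloc" line for the new one, so a counting checker sees one release and one allocation even when the block does not move.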

In the code, you can see that I store the allocated size in front of the returned buffer and use it in free to decrease the "val" variable. The main program can then read "val" and watch how memory usage (through malloc and free) evolves. The code can be simplified if you don't need that. (And if your program reads "val", you need to pass libcc.so at link time too.)
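A minimal sketch of the size-in-front trick, using the hypothetical wrappers counted_malloc and counted_free instead of replacing malloc and free themselves, so it can be tried without LD_PRELOAD. Each block gets a small header holding its size. The union with max_align_t keeps the pointer handed back to the caller aligned for any type, which a bare size_t in front would not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes currently allocated through counted_malloc() and not yet freed. */
size_t val;

/* Hidden header placed in front of every block. */
typedef union {
    size_t size;
    max_align_t align;
} header;

static void *counted_malloc(size_t size)
{
    header *h;
    if (size > SIZE_MAX - sizeof *h)
        return NULL;
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    val += size;
    return h + 1;             /* the caller's buffer starts after the header */
}

static void counted_free(void *p)
{
    header *h;
    if (p == NULL)
        return;
    h = (header *)p - 1;      /* step back to the hidden header */
    assert(val >= h->size);
    val -= h->size;
    free(h);
}
```

Inside a real interposer, calloc and realloc must write and read the same header, and a pointer that did not come from these wrappers (for instance from posix_memalign) must never reach counted_free, or it reads garbage in front of the block.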

There are hacks here and there because there were crashes all around. Some functions called from inside malloc and free themselves call malloc and free, and you end up with a big mess (recursion, crashes), so some care is needed in there. But as it is, yes, it is ugly. You can do it more simply, but the bug died and, well, I'm done with it.

And that's it! No need to bring out the slooooow and bloated (but extremely powerful) valgrind. Just a few lines of code, a bit of shell (grep and sort), and you're done.

Just be careful with the tools you create. Doing it wrong creates more bugs. And those can be very nasty...

Ah, a last note. The call trace looks like:

********************
./libcc.so(malloc+0xa6)[0x7f95e957db89]
/usr/lib/x86_64-linux-gnu/libXrender.so.1(XRenderQueryFormats+0x2bb)[0x7f95e735ddfb]
/usr/lib/x86_64-linux-gnu/libXrender.so.1(XRenderQueryVersion+0x2f)[0x7f95e735e21f]
/usr/lib/x86_64-linux-gnu/libcairo.so.2(+0x91c86)[0x7f95e92f0c86]
/usr/lib/x86_64-linux-gnu/libcairo.so.2(+0x96b95)[0x7f95e92f5b95]
/usr/lib/x86_64-linux-gnu/libcairo.so.2(cairo_xlib_surface_create+0xe2)[0x7f95e92fb002]
gui[0x402ce4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f95e897d995]
gui[0x401399]
********************

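As you see, backtrace does not give us the names of all the functions in libcairo. Why? Most likely because those functions are internal to libcairo (static or with hidden visibility): backtrace can only name functions listed in a library's dynamic symbol table, and for the others it prints an offset from the library's load address. But you can find them using objdump.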

objdump -d /usr/lib/x86_64-linux-gnu/libcairo.so.2 | less

Then search for "91c86" (the offset you see in the trace) with the '/' command in less, and scroll up to the beginning of the function to get its name.
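For an ordinary shared library, the "+0x91c86" offset is counted from the address where the library was loaded, which is why it matches the addresses objdump prints for the file. As a sketch (assuming glibc, where dladdr() is a GNU extension that needs _GNU_SOURCE and, before glibc 2.34, -ldl), the same kind of lookup can be done from your own code. describe_address is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <inttypes.h>
#include <stdio.h>

/* Write "file(symbol+0xoff)" into buf when an exported symbol covers addr,
   otherwise "file(+0xoff)" with the offset counted from the object's load
   address. Returns 0 on success, -1 if no loaded object contains addr. */
static int describe_address(const void *addr, char *buf, size_t len)
{
    Dl_info info;
    uintptr_t a = (uintptr_t)addr;

    assert(buf != NULL && len > 0);
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return -1;
    if (info.dli_sname != NULL && info.dli_saddr != NULL)
        snprintf(buf, len, "%s(%s+0x%" PRIxPTR ")", info.dli_fname,
                 info.dli_sname, a - (uintptr_t)info.dli_saddr);
    else
        snprintf(buf, len, "%s(+0x%" PRIxPTR ")", info.dli_fname,
                 a - (uintptr_t)info.dli_fbase);
    return 0;
}
```

Feeding it one of the raw return addresses collected by backtrace() gives the offset to look for in the objdump -d output.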

Wait... While writing this article and redoing the check, I don't get the function name. So, well, compile your own cairo library, link against it, and you'll get it. Maybe backtrace will then print the function name too. In my case it didn't, so objdump was needed. And I was lucky that the bug still showed up: sometimes you recompile a library and... no more bug.

And to force the link against that new cairo? Some linker magic (-rpath).

gcc -o guilty guilty.c -L/path/to/new/cairo -lcairo -Wl,-rpath,/path/to/new/cairo

Setting LD_LIBRARY_PATH to /path/to/new/cairo at run time does the trick too, since the dynamic linker searches it before the system library directories.

With bash:

export LD_LIBRARY_PATH=/path/to/new/cairo
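To check that the rpath or LD_LIBRARY_PATH trick really worked, ldd guilty answers the question from the shell. From inside the running program, a sketch using glibc's dl_iterate_phdr() can ask the dynamic linker which file it actually mapped; loaded_path and match_object are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <string.h>

struct lookup {
    const char *needle;   /* substring to look for, e.g. "libcairo.so" */
    const char *found;    /* path of the first matching object, or NULL */
};

/* dl_iterate_phdr() callback: stop at the first object whose path contains needle. */
static int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lookup *l = data;
    (void)size;
    if (info->dlpi_name != NULL && strstr(info->dlpi_name, l->needle) != NULL) {
        l->found = info->dlpi_name;
        return 1;         /* non-zero stops the iteration */
    }
    return 0;
}

/* Return the path of the loaded shared object whose name contains needle,
   or NULL if no such object is loaded. */
static const char *loaded_path(const char *needle)
{
    struct lookup l = { needle, NULL };
    assert(needle != NULL);
    dl_iterate_phdr(match_object, &l);
    return l.found;
}
```

In the guilty program, loaded_path("libcairo.so") should return a path under /path/to/new/cairo if the new library was picked up (and NULL if cairo is not loaded at all).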

Et voilà.
(Hell, French is my primary language after all!)

Ah yes, I've got to be honest. It's not really step one, two and three. Everything is a bit intermixed, and some information is learned in this juggling process, like the SHM thing. How did I know cairo was using SHM? I don't remember, but it's certainly not from step one alone. But well, for an article you have to structure things. So in fact it's more "a bit of step one, hum, let's try a bit of step three, no, wait, step two, ah shit, let's go back to step one but a little bit differently". And you don't juggle "step one", "step two" and "step three" but some unnamed categories that, for the purpose of this article, I call "step one" etc. But the picture is pretty accurate: you keep some kind of "list" of things to try in your head and play with that list until the bug is dead.


Contact: sed@free.fr

Created: Tue, 03 Dec 2013 13:22:31 +0100
Last update: Tue, 03 Dec 2013 13:22:31 +0100