Mark_malloc
I hope the reader will excuse my bad English; it is not my native
language.
Introduction
#include <stdio.h>
#include <stdlib.h>

int average(FILE *f)
{
    int *value;
    int nb_numbers;
    int i;
    int avg = 0;

    fread(&nb_numbers, sizeof(int), 1, f);
    value = malloc(nb_numbers * sizeof(int));
    fread(value, sizeof(int), nb_numbers, f);
    for (i = 0; i < nb_numbers; i++)
        avg += value[i];
    return avg / nb_numbers;
}
Let's ignore endianness, the checks on the return values of fread and
malloc, and whether nb_numbers is used correctly.
What do we have here? Obviously, the malloc is never freed, so if the
function average is called all over the program, we run into trouble.
This is a memory leak: some allocated memory is never explicitly freed
by the program.
Here it is obvious, and adding a simple free before the function's
return is the solution. But what about much more complicated memory
management?
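For the simple case, the fix described above looks like this. It is a sketch that keeps the original's simplifications: no checks on fread/malloc, and nb_numbers is assumed to be positive. The accumulator is renamed sum for clarity.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same computation as average() above, with the missing free() added.
   As in the original, fread/malloc results are not checked and
   nb_numbers is assumed to be positive. */
int average(FILE *f)
{
    int *value;
    int nb_numbers;
    int i;
    int sum = 0;

    fread(&nb_numbers, sizeof(int), 1, f);
    value = malloc(nb_numbers * sizeof(int));
    fread(value, sizeof(int), nb_numbers, f);
    for (i = 0; i < nb_numbers; i++)
        sum += value[i];
    free(value);            /* release the buffer before returning */
    return sum / nb_numbers;
}
```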
Most of the time, our programs contain structures whose lifetime we
don't always know. Sometimes a malloc'ed buffer can change its meaning.
For example, we may have a very sophisticated allocation policy, for
speed reasons, where we want to avoid calls to malloc and free as much
as possible, and so we reuse buffers that are no longer needed for
totally different purposes. Or we may keep a list of free buffers:
instead of freeing them all the time, we put them in this list for
later use.
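A free-buffer list of the kind just described might look like the following sketch. The names struct buf, pool_get, pool_put, pool_drain and BUF_SIZE are hypothetical. A buffer sitting on free_list is still allocated from malloc's point of view. Whether it counts as "leaked" therefore depends on whether the program ever drains the list, which is exactly what makes such code hard to check.

```c
#include <stdlib.h>

/* Hypothetical fixed-size buffer pool: released buffers are pushed on
   a free list instead of being passed to free(). */
#define BUF_SIZE 256

struct buf {
    struct buf *next;        /* link used only while on the free list */
    char data[BUF_SIZE];
};

static struct buf *free_list = NULL;

/* Reuse a released buffer if there is one, otherwise call malloc. */
struct buf *pool_get(void)
{
    struct buf *b = free_list;
    if (b != NULL) {
        free_list = b->next;
        return b;
    }
    return malloc(sizeof *b);
}

/* Keep the buffer for later use instead of freeing it. */
void pool_put(struct buf *b)
{
    b->next = free_list;
    free_list = b;
}

/* Really free every buffer on the list, e.g. at program exit. */
void pool_drain(void)
{
    while (free_list != NULL) {
        struct buf *next = free_list->next;
        free(free_list);
        free_list = next;
    }
}
```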
In short, our coding practices lead us to make errors in our
malloc/free handling, even when we are experienced programmers.
Memory leak detection
To avoid memory leaks, we have several options.
- Code perfectly, with no errors.
This is nonsense: we all make mistakes, and malloc/free errors
are among the hardest (the next step up in difficulty is distributed
programming, but that is off-topic).
- Do a static analysis of the program.
This means inspecting the code line by line, watching the mallocs
and frees that are done, and deciding, with the whole code in mind,
whether those mallocs and frees are correct.
Because of the limits of our brains, this solution can't be applied
directly in this way. Several thousand lines of code can't be
remembered by a human being. Even a hundred lines of code is hard, if
not impossible, to remember fully.
So what? The answer could be to reason about programs and abstract away
unnecessary constructs, resulting in a smaller and smaller program that
is still correct for checking the malloc/free handling. Well, this is a
possible direction. I am not aware of practical results (i.e. tools
that can be used to do this) in this area.
Another way is to add some semantic information about the pointers
used in our programs, which helps a static analyzer find bugs
(see lclint for a good example).
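This sketch shows what such semantic information can look like. It assumes LCLint's annotation-comment syntax, where /*@only@*/ marks the sole reference to a block that carries the obligation to release it and /*@null@*/ means the pointer may be NULL. The functions make_table and drop_table are hypothetical. The annotations are ordinary C comments, so any compiler ignores them and only the checker reads them.

```c
#include <stdlib.h>

/* The caller receives the only reference to the table (possibly NULL)
   and becomes responsible for releasing it. */
/*@only@*/ /*@null@*/ int *make_table(size_t n)
{
    return malloc(n * sizeof(int));
}

/* The callee takes over the obligation to release p. */
void drop_table(/*@only@*/ /*@null@*/ int *p)
{
    free(p);
}
```

With these annotations, a checker can be expected to flag a caller that gets a table from make_table() and never passes it to drop_table() or free().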
This is a good idea, but not sufficient. We would like a fully
automatic tool. We would like to run:
check my_program.c
and get an easy-to-understand answer like:
line 10, this malloc is never freed
or something like that
(it should work with several input files).
Currently (October 2001), no such tool exists. And we still need
something to help us with our coding! And we need it today, not
in ten or twenty years. So what?
- Adopt some coding practices.
One solution could be to impose some coding practices on ourselves.
For example, each time a pointer becomes useless somewhere in our
code, we free it systematically, even if we will malloc a buffer of
the same size on the next line (this is the extreme case, as you can
imagine, but you get the point). We could avoid lists of free buffers
entirely. We could always copy the parameters we get from another
module, so that we have private structures, which are much easier to
trace. We could do this, we could avoid that... This is endless.
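The "always copy what another module gives you" practice might look like this sketch. The names struct person, person_new and person_delete are hypothetical. The module never keeps the caller's pointer, so every buffer has exactly one owner and exactly one place where it is freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical module that keeps a private copy of its input. */
struct person {
    char *name;              /* private copy, freed by person_delete() */
};

struct person *person_new(const char *name)
{
    struct person *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);   /* copy; never keep the caller's pointer */
    return p;
}

void person_delete(struct person *p)
{
    if (p == NULL)
        return;
    free(p->name);
    free(p);
}
```

Changing or freeing the caller's string afterwards has no effect on the module's copy. The price is an extra malloc and copy per call, which is the kind of performance cost discussed next.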
The question that arises here is: what is good practice? If one could
answer that, preferably in a formal manner, it would help most of us
with our daily troubles.
Another question is: what about performance? When we take it into
account, some hacks are necessary to speed up the code, and they may
very well break our "good practice".
- Do some runtime tests.
A runtime test consists of tracing, at run time, the values of some
variables of the program and checking that they match what we expect.
We can check function parameters, calls to certain functions, and
(this is what mark_malloc does) the malloc/free operations.
This is a working solution. It is far from perfect, of course, mainly
because it depends on the inputs we give the program. Most of the
time there is an infinite number of possible inputs, so we don't catch
every case, and we can't say anything fully reliable based on this
method alone.
But it works. :-)
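About the simplest runtime test for malloc/free is a counter. The wrappers counted_malloc() and counted_free() and the function report_leaks() in this sketch are hypothetical. A non-zero count at the end of a run says that some block leaked, but not which one or where it came from.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of blocks currently allocated through the wrappers. */
static long live_blocks = 0;

void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

void counted_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Meant to run at exit: complains if some blocks were never freed. */
void report_leaks(void)
{
    if (live_blocks != 0)
        fprintf(stderr, "%ld block(s) never freed\n", live_blocks);
}
```

In a program, you would call atexit(report_leaks) at the start of main() and use the wrappers everywhere instead of malloc() and free().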
See the Related work section below for some pointers to other tools
that exist in this area.
This is the same approach that led to the creation of our beloved
debuggers, which would be useless if we coded well. But we don't,
and I don't think I am taking a big risk in saying we never will.
So this is not such a bad solution, after all.
Mark_malloc
Mark_malloc is my proposal for the dynamic detection of malloc/free
misuse.
It marks malloc'ed buffers and detects, at the end of the program,
those that have not been freed. If it finds any, it displays the
sequence of function calls responsible for each error.
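The marking idea can be sketched as follows. This is not mark_malloc's code. The names tracked_malloc, tracked_free, tracked_report and the MALLOC/FREE macros are hypothetical. Each buffer is marked only with the file and line of its allocation, whereas mark_malloc reports the whole calling sequence.

```c
#include <stdio.h>
#include <stdlib.h>

/* One mark per live buffer: where it was allocated. */
struct mark {
    void *ptr;
    const char *file;
    int line;
    struct mark *next;
};

static struct mark *marks = NULL;

void *tracked_malloc(size_t size, const char *file, int line)
{
    struct mark *m;
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    m = malloc(sizeof *m);
    if (m == NULL) {
        free(p);
        return NULL;
    }
    m->ptr = p;
    m->file = file;
    m->line = line;
    m->next = marks;          /* push the mark on the list */
    marks = m;
    return p;
}

void tracked_free(void *p)
{
    struct mark **link;
    for (link = &marks; *link != NULL; link = &(*link)->next) {
        if ((*link)->ptr == p) {
            struct mark *m = *link;
            *link = m->next;  /* unmark: remove it from the list */
            free(m);
            break;
        }
    }
    free(p);
}

/* Prints every buffer still marked and returns how many there are. */
int tracked_report(void)
{
    int n = 0;
    struct mark *m;
    for (m = marks; m != NULL; m = m->next) {
        fprintf(stderr, "%s:%d: this malloc is never freed\n",
                m->file, m->line);
        n++;
    }
    return n;
}

#define MALLOC(size) tracked_malloc((size), __FILE__, __LINE__)
#define FREE(p) tracked_free(p)
```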
It is written in C and works under Unix (currently on ix86/Linux
and SPARC v8/Solaris).
It interfaces well with C programs, and maybe with others that I
haven't tried. (I would be pleased to know whether it can help C++
programmers. Unfortunately, I don't speak C++, so I can't try it with a
C++ program.)
You can read the detailed documentation for more information about it.
It is released into the public domain, even though I ripped some code
from the GNU binutils package.
This shows a total lack of respect for the GPL, but I don't like
licences, even copyleft ones. You can disagree with my views (I guess
you do, in fact), but well, I won't change. Freedom wants no licence.
By the way, I wrote this to explain my opinion. And yes, I know this is
illegal too. (Anyway, this code is so short, so few people will check
it and even fewer will use it, that it's not a big problem.)
And if there were good documentation about libbfd and how to use it,
I wouldn't have ripped code from addr2line. So I can say that it is
bad design by the binutils team that led to this situation. I know
it's hard to write good documentation, but it is vital.
- Download
- Detailed documentation
Go to the documentation for an in-depth look at mark_malloc, a small
tutorial, and how to use it in your development process.
Related work
Mark_malloc is far from perfect. I find it useful, which is why I am
making it public.
Some other tools exist to help you with your coding.
I have put this file on my site; it comes from there and has lots of
links.
In particular, see Electric Fence, which is very useful for detecting
bad buffer usage (out-of-bounds accesses).
mpr is also very similar to mark_malloc, but provides much more
information. Well, check the page to see the rest.
Finally, a quick word about lclint.
Lclint is a static analyzer. It uses annotations to give some semantic
information to the checker. It is nice, even though I prefer fully
automatic methods. I don't know whether those are possible in practice:
I know some undecidability results (if you want the paper, contact me),
but when I see how the proof was done, I wouldn't say it's impossible
for our common coding styles.
There is a huge research field in this area. Where can you start
looking for information? Well, ResearchIndex is a good start, with a
keyword-based search engine. The ACM Digital Library is very good too,
but you may not be able to access everything if you are not at a
university (I do have full access to the digital library). And your
favorite search engine can be used too, I guess. Try "memory leak
detection", "malloc errors" and the like.
Contact: sed@free.fr
Last update: Fri Oct 5 15:54:28 MET DST 2001
Powered by vi.
Best viewed with your eyes
(or your fingers if you are blind).