Mark_malloc

Please excuse my poor English; it is not my native language. (Thanks go to Mike Schiraldi for some corrections in this file.)

Introduction

int average(FILE *f)
{
  int *value;
  int nb_numbers;
  int i;
  int avg = 0;

  fread(&nb_numbers, sizeof(int), 1, f);

  value = malloc(nb_numbers * sizeof(int));

  fread(value, sizeof(int), nb_numbers, f);

  for (i = 0; i < nb_numbers; i++)
    avg += value[i];

  return avg / nb_numbers;
}

Let's set aside endianness, the missing checks on the return values of fread and malloc, and whether nb_numbers is used correctly.
What do we have here? Obviously, the malloc'ed buffer is never freed, so if the function average is called all over the program, we're in trouble. This is a memory leak: some allocated memory is never explicitly freed by the program.
Here it is obvious, and adding a simple free before the function returns is the solution. But what about more complicated memory management?
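The fix mentioned above can be sketched as follows. The name average_fixed and the convention of returning 0 on any error are made up for this illustration; like the original, the sketch ignores endianness.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same computation as average(), but the buffer is released on every path.
 * average_fixed and "return 0 on error" are choices made for this sketch. */
int average_fixed(FILE *f)
{
  int *value;
  int nb_numbers;
  int i;
  int avg = 0;

  /* Read the count; refuse counts that would divide by zero below. */
  if (fread(&nb_numbers, sizeof(int), 1, f) != 1 || nb_numbers <= 0)
    return 0;

  value = malloc(nb_numbers * sizeof(int));
  if (value == NULL)
    return 0;

  if (fread(value, sizeof(int), nb_numbers, f) != (size_t)nb_numbers) {
    free(value);              /* error path: still release the buffer */
    return 0;
  }

  for (i = 0; i < nb_numbers; i++)
    avg += value[i];

  free(value);                /* the call the original was missing */

  return avg / nb_numbers;
}
```

Note that the early return in the middle needs its own free: every way out of the function has to release the buffer, which is exactly where such bugs tend to hide.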
Most of the time, our programs contain structures whose lifetime we don't always know. Sometimes a malloc'ed buffer changes its meaning. For example, we may have a very sophisticated allocation policy where, for speed reasons, we want to avoid calls to malloc and free as much as possible, and so we reuse buffers that are no longer needed for completely different purposes. Or we may keep a list of free buffers: instead of freeing them all the time, we put them in this list for later use.
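To make the "list of free buffers" idea concrete, here is a minimal sketch of such a recycling scheme. All the names (buf_get, buf_put, buf_release_all, BUF_SIZE) are hypothetical and the buffer size is arbitrary; it only illustrates the kind of policy described above.

```c
#include <stdlib.h>

#define BUF_SIZE 4096   /* arbitrary size chosen for this sketch */

/* An unused buffer doubles as a list node while it waits to be reused. */
union buffer {
  union buffer *next;            /* valid only while on the free list */
  unsigned char data[BUF_SIZE];
};

static union buffer *free_list = NULL;

/* Hand out a recycled buffer if one is available, else call malloc. */
void *buf_get(void)
{
  union buffer *b = free_list;

  if (b != NULL) {
    free_list = b->next;
    return b;
  }
  return malloc(sizeof(union buffer));
}

/* Instead of calling free, keep the buffer for later reuse. */
void buf_put(void *p)
{
  union buffer *b = p;

  if (b == NULL)
    return;
  b->next = free_list;
  free_list = b;
}

/* Really release everything still on the free list. */
void buf_release_all(void)
{
  while (free_list != NULL) {
    union buffer *b = free_list;

    free_list = b->next;
    free(b);
  }
}
```

With a scheme like this, a buffer sitting on the list is neither in use nor freed, so it becomes much harder to tell by eye whether a given malloc is eventually matched by a free.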
In short, our coding practices lead us to make errors in our malloc/free handling, even if we are experienced programmers.

Memory leak detection

To avoid memory leaks, we have several options.

Code perfectly, with no errors. This is nonsense: we all make errors, and malloc/free errors are among the most elusive (distributed programming is even harder, but that's off-topic).
Do a static analysis of the program. This means inspecting the code line by line, watching the mallocs and frees that are done, and deciding, with all the code in mind, whether those mallocs and frees are correct.
Because of the limits of our brains, this solution can't be applied directly. A human being can't remember several thousand lines of code. Even a hundred lines are hard, if not impossible, to remember fully.
So what? The answer could be to reason about programs and abstract away unnecessary constructs, producing a smaller and smaller program that still preserves the malloc/free behavior we want to check. This is a possible direction, but I am not aware of practical results (i.e. tools that can be used to do this) in this area.
Another way is to add some semantic information about the pointers used in our programs, which helps a static analyzer find bugs (see lclint for a good example).
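As an illustration of such semantic information, here is a small sketch using two lclint-style annotations, written as special comments: only (the reference carries the obligation to release the storage) and null (the pointer may be NULL). The functions copy_string and discard_string are invented for this example, and what the checker actually reports depends on the tool and its settings.

```c
#include <stdlib.h>
#include <string.h>

/* The caller receives the only reference to the result and must release it;
 * the result may be NULL if malloc fails. */
/*@only@*/ /*@null@*/ char *copy_string(const char *s)
{
  size_t len = strlen(s) + 1;
  char *copy = malloc(len);

  if (copy != NULL)
    memcpy(copy, s, len);
  return copy;
}

/* This function takes over the obligation to release p. */
void discard_string(/*@only@*/ /*@null@*/ char *p)
{
  free(p);
}
```

For an ordinary compiler the annotations are plain comments, so the code builds as usual. A checker that understands them has enough information to complain about a caller that drops the result of copy_string without ever handing it to something like discard_string.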
This is a good idea, but not sufficient. We would like a fully automatic tool. We would like to run:
```
check my_program.c
```
and get an easy-to-understand answer like:
```
line 10, this malloc is never freed
```
or something like that (it should work with several input files).
As of this writing (October 2001), no such tool exists. And we still need something to help us in our coding! And we need it today, not in ten or twenty years. So what now?
Adopt some coding practices. One solution could be to impose certain coding practices on ourselves. For example, each time a pointer becomes useless somewhere in our code, we free it systematically, even if we malloc a buffer of the same size on the next line (this is the extreme case, as you can imagine, but you get the point). We could avoid lists of free buffers altogether. We could always copy the parameters we get from another module into private structures, which are much easier to trace. We could do this, we could avoid that... This is endless.
The question that arises here is: what is a good practice? If one could answer that, preferably in a formal manner, it would help most of us in our daily troubles.
And another question is: what about performance? When we take it into account, some hacks are necessary to speed up the code, and they may very well break our "good practice".
Do some runtime tests. A runtime test traces, at runtime, the values of some variables of the program and checks that they match what we expect. We can check function parameters, calls to certain functions, and (this is what mark_malloc does) the use of malloc and free.
This is a working solution. It is far from perfect, of course, mainly because it depends on the inputs we provide to the program. Most of the time there is an infinite number of possible inputs, so we can't catch every case, and we can't conclude anything fully reliable from this method alone.
But it works. :-)
See the Related work section below for pointers to other tools of this kind.
This is the same approach that led to the creation of our beloved debuggers, which would be useless if we coded well. But we don't, and I don't think I take a big risk in saying we never will. So, this is a not-so-bad solution after all.
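To give an idea of what a runtime test of malloc/free can look like, here is a minimal sketch. It is not mark_malloc's code: the names (traced_malloc, traced_free, report_leaks, TRACED_MALLOC) and the fixed-size table are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 1024   /* arbitrary limit for this sketch */

struct record {
  void *ptr;               /* NULL means the slot is unused */
  size_t size;
  const char *file;
  int line;
};

static struct record table[MAX_TRACKED];

/* Allocate and remember where the allocation was requested. */
void *traced_malloc(size_t size, const char *file, int line)
{
  void *p = malloc(size);
  int i;

  if (p == NULL)
    return NULL;
  for (i = 0; i < MAX_TRACKED; i++) {
    if (table[i].ptr == NULL) {
      table[i].ptr = p;
      table[i].size = size;
      table[i].file = file;
      table[i].line = line;
      break;
    }
  }
  /* If the table is full, the block is simply not tracked. */
  return p;
}

/* Free the block and forget its record. */
void traced_free(void *p)
{
  int i;

  if (p == NULL)
    return;
  for (i = 0; i < MAX_TRACKED; i++) {
    if (table[i].ptr == p) {
      table[i].ptr = NULL;
      break;
    }
  }
  free(p);
}

/* Print every block still recorded; return how many there are. */
int report_leaks(FILE *out)
{
  int i;
  int count = 0;

  for (i = 0; i < MAX_TRACKED; i++) {
    if (table[i].ptr != NULL) {
      fprintf(out, "%s:%d: %lu bytes never freed\n",
              table[i].file, table[i].line, (unsigned long)table[i].size);
      count++;
    }
  }
  return count;
}

/* Convenience macro: record the source position of the caller. */
#define TRACED_MALLOC(size) traced_malloc((size), __FILE__, __LINE__)
```

A program would call TRACED_MALLOC and traced_free instead of malloc and free, and call report_leaks just before exiting, for instance from a function registered with atexit. As said above, the report only covers the paths that the chosen inputs actually exercised.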

Mark_malloc

Mark_malloc is my contribution to the dynamic detection of malloc/free misuse.

It marks malloc'ed buffers and detects, at the end of the program, the ones that were not freed. If it finds any, it displays the sequence of function calls responsible for this error.
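The idea of remembering the calling sequence at allocation time can be sketched as follows. This is only an illustration, not a description of mark_malloc's own code: it assumes a system providing backtrace() and backtrace_symbols_fd() in <execinfo.h> (glibc does), and the names marked_malloc, print_call_site and struct call_site are invented.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 16   /* arbitrary depth for this sketch */

/* Remembers one block and the call sequence that allocated it. */
struct call_site {
  void *block;
  int nb_frames;
  void *frames[MAX_FRAMES];
};

/* Allocate a block and record the current call sequence in *site. */
void *marked_malloc(size_t size, struct call_site *site)
{
  site->block = malloc(size);
  site->nb_frames = backtrace(site->frames, MAX_FRAMES);
  return site->block;
}

/* Print the recorded call sequence, one frame per line, on stderr. */
void print_call_site(const struct call_site *site)
{
  fprintf(stderr, "block %p allocated from:\n", site->block);
  fflush(stderr);
  /* 2 is the file descriptor of stderr. */
  backtrace_symbols_fd(site->frames, site->nb_frames, 2);
}
```

With glibc, the frames are printed as addresses unless the program is linked with -rdynamic, in which case the names of exported functions appear too. Turning raw addresses into file names and line numbers is the job of a tool such as addr2line.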

It is written in C and works under Unix (see here to get the list of supported systems).

It interfaces well with C programs, and maybe with others that I didn't try. (I would be pleased to know whether it can help C++ coders in their coding. Unfortunately, I don't speak C++, so I can't try it with a C++ program.)

You can read the detailed documentation to get more information about it.

It is released in the public domain, even though I ripped some code from the GNU binutils package.
This is a total lack of respect for the GPL, but I don't like licences, even copylefted ones. You can disagree with my views (I guess you do, in fact), but well, I won't change. Freedom wants no licence. By the way, I wrote this to explain my opinion. And yes, I know this is illegal too. (Anyway, this code is so short, and so few people will check it and even fewer will use it, that it's not a big problem.)
And if there were good documentation about libbfd and how to use it, I wouldn't have ripped code from addr2line. So I can say it is bad design from the binutils team that led to this situation. I know it's hard to write good documentation, but it is vital.

Download
- mark-malloc-2.0.1.tar.gz (2002, May 31st).
  Orlando Bassotto did all this. You now have a script, markm, that makes mark_malloc very easy to use, and autoconf/automake to make installation friendly too. Many ports exist as well. Contact Orlando for more information or for other hosts. The documentation has not been updated, but remains almost the same. I have no time to update it; ask Orlando about any problems (mail below).
- mark_malloc-1.0.2.tar.gz (2002, March 14th).
  Orlando Bassotto ported it to PowerPC and made some changes here and there. He added the use of '_' under bash 2.x, so you no longer need to declare MARK_MALLOC_PROGRAM_FILE under this shell. He added MARK_MALLOC_HEXDUMP_CONTENT to dump the content of the allocated buffers that were not freed (the output can be huge if your code is very unclean). He fixed some code here and there.
- mark_malloc-1.0.1.tar.gz (2001, October 9th).
  Mike Schiraldi corrected some of the grammar and spelling of this web page and proposed some fixes to avoid warnings at compile time. Loïc Lefort proposed the use of __attribute__((__noreturn__)) too.
- mark_malloc-1.0.0.tar.gz (2001, October 3rd).

Detailed documentation

Go to the documentation for an in-depth view of mark_malloc, a small tutorial, and how to use it in your development process.

Related work

Mark_malloc is far from perfect. I find it useful, which is why I make it publicly available.

Some other tools exist, to help you in your coding.

I put a copy of this file, which comes from there and contains lots of links, on my site.
In particular, see Electric Fence, which is very useful for detecting bad usage of buffers (out-of-bounds accesses).
mpr is very similar to mark_malloc too, but provides much more information. Well, check the page to see the rest.

To finish, a word about lclint. Lclint is a static analyzer that uses annotations to give some semantic information to the checker. It is nice, even if I prefer fully automatic methods. I don't know whether those are possible in practice: I know of some undecidability results (if you want the paper, contact me; maybe this link works, maybe not), but given how the proof was done, I wouldn't say it is impossible for our common coding styles.

There is a huge research field in this area. Where could you start to get some information? Well, the ResearchIndex is a good start, with a keyword based search engine. The ACM's digital library is very good too, but maybe you won't be able to access everything if you are not inside a university (I do have full access to the digital library). And your favorite search engine can be used too, I guess. Try "memory leaks detection", "malloc errors" and the like.

Contact: sed@free.fr
Creation time: around 2002 probably. Last update: Wed, 09 Mar 2005 13:47:05 +0100