On Sun, Apr 27, 2008 at 08:29:33PM +0100, Daniel P. Berrange wrote: > After updating the virBuffer APIs to protect against improper usage I have > been thinking about how we might provider safer memory allocation APIs > with protection against common usage errors and compile time validation of > checks for failure. > > The standard C library malloc/realloc/calloc/free APIs have just about the > worst possible design, positively encouraging serious application coding > errors. There are at least 7 common problems I can think of at the moment, > perhaps more.... > > 1. malloc() - pass incorrect number of bytes > 2. malloc() - fail to check the return value > 3. malloc() - use uninitialized memory > 4. free() - double free by not setting pointer to NULL > 5. realloc() - fail to check the return value > 6. realloc() - fail to re-assign the pointer with new address > 7. realloc() - accidental overwrite of original pointer with NULL on > failure, leaking memory > > Many applications / libraries wrap these in their own functions, but typically > fail to address one or more these problems. > > eg glib re-definitions try to address point 2 by having g_malloc() call > abort() on failure, but then re-introduce the problem by adding g_try_malloc() Calling abort() in a library is a major NO-NO and one of the reasons I avoided glib in most of the code I developped. You just can't exit()/abort() from a library. > All 7 of the errors listed early could be avoided and/or checked at compile > time if the APIs used a different calling convention. > > 1. For any given pointer the compiler knows how many bytes need to be > allocated for a single instance of it. ie sizeof(*(ptr)). Since C > has no data type representing a data type, this cannot be done with > a simple function - it needs a (trivial) macro. > > 2. The __attribute__((__warn_unused_result__)) annotation causes the > compiler to warn if an application doesn't do something with a return > value. With the standard malloc() API this doesn't help because you > always assign the return value to a pointer. The core problem here > is that the API encodes the error condition as a special case of > the actual data returned. This can be addressed by separating the > two. The return value should simply be bi-state success or fail code > and the return data passed back via an pointer to a pointer parameter. > > 3. The memory allocated by malloc() is not initialized to zero. For > that a separate calloc() function is provided. This leaves open the > possiblity of an app mistakenly using the wrong variant. Or consider > an app using malloc() and explicitly initializing all struct members. > Some time later a new member is added and now the developer needs to > find all malloc() calls wrt to that struct and explicitly initilize > the new member. It would be safer to always have malloc zero all > memory, even though it has a small performance impact, the net win > in safety is probably worth it. > > 4. Since free() takes the pointer to be freed directly it cannot reset > the caller's original pointer back to NULL. This can be addressed if > a pointer to the pointer is passed instead. The computational cost > of the often redundant assignment to NULL is negligable given the > safety benefit > > 5, 6 & 7. As with problem 2, these problems are all caused by the fact > that the error condition is special case of the return data value. > The impact of these is typically far worse because it often results > in a buffer overflow and the complex rules for calling realloc() are > hard to remember. > > So the core requirements of a safer API for allocation are: > > - Use return values *only* for the success/fail error condition > - Annotate all the return values with __warn_unused_result__ > - Pass a pointer to the pointer into all functions > - Define macros around all functions to automatically do the sizeof(*(ptr)) > > Passing a pointer to a pointer gets a little tedious because of the need > to write '&(foo)' instead of just 'foo', and is actually introduces a > new problem of the caller accidentally passing merely a pointer, instead > of a pointer to a pointer. This is a minor problem though because it will > be identified on the very first run of the code. In addition, since we're > defining macros around every function we can have the macro also convert > from ptr to &(ptr) for us, avoiding this potential error: > > So the primary application usage would be via a set of macros: > > VIR_ALLOC(ptr) > VIR_ALLOC_N(ptr, count) > VIR_REALLOC(ptr) you will need some size here. > VIR_REALLOC_N(ptr, count) > VIR_FREE(ptr) > > These call through to the underlying APIs: > > int virAlloc(void *, size_t bytes) > int virAllocN(void *, size_t bytes, size_t count) > int virRealloc(void *, size_t bytes) > int virReallocN(void *, size_t bytes, size_t count) > int virFree(void *) > > Note that although we're passing a pointer to a pointer into all these, the > first param is still just 'void *' and not 'void **'. This is because 'void *' > is defined to hold any type of pointer, and using 'void **' causes gcc to > complain bitterly about strict aliasing violations. Internally the impls of > these methods will safely cast to 'void **' when deferencing the pointer. One of the problem this introduce is what uses to be a common mistake freeing the a non-pointer object which is normally immediately pointed out by the compiler whicll be lost because your macro will turn this into a void * to the data, and you will get a runtime problem instead. > All 4 of Alloc/Realloc functions will have __warn_unused_result__ annotation > so the caller is forced to check the return value for failure, validated at > compile time generating a warning (or fatal compile error with -Werror). > > So to wire up the macros to the APIs: > > #define VIR_ALLOC(ptr) virAlloc(&(ptr), sizeof(*(ptr))) > #define VIR_ALLOC_N(ptr, count) virAllocN(&(ptr), sizeof(*(ptr)), (count)) > #define VIR_REALLOC(ptr) virRealloc(&(ptr), sizeof(*(ptr))) That i really don't understand. How do you expect to use that realloc without passing a new size. > #define VIR_REALLOC_N(ptr, count) virReallocN(&(ptr), sizeof(*(ptr)), (count)) That I can understand , but the previous one i can't. > #define VIR_FREE(ptr) virFree(&(ptr)) [...] > Much less ugly: > > if (VIR_ALLOC_N(guest->arch.defaultInfo.machines, nmachines) < 0) > return -1; > > if (VIR_REALLOC(migrateTrans, caps->host.nmigrateTrans+1) < 0) > return -1; how does sizeof(*(caps->host.nmigrateTrans+1)) increases the size ? Doesn't make sense to me, you take a pointer, increment it, so basically just pointing to the next element in the array, but the size of the pointed object would be identical and realloc() becomes a noop. The proposal may help clean a lot of things, but VIR_REALLOC I don't understand, what did i missed ? > and produced easier to read code too. The cleaner realloc() API is particularly > appealing. Well ... i don't understand it ! > This does all seem a little bit too good to be true - I'm wondering why other > apps/libraries don't use this style of allocation functions already ? Perhaps > someone can find nasty flaws in this approach, but hopefuly not... I would be tempted to hook into the error reporting directly within memery.c , even if we don't have a good context there, basically if we have a memory shortage even reporting to stderr is sensible , at least once. > +int virAlloc(void *ptrptr, size_t size) > +{ > + if (size == 0) { > + *(void **)ptrptr = NULL; > + return 0; > + } > + > + *(void **)ptrptr = calloc(1, size); > + if (*(void **)ptrptr == NULL) > + return -1; > + return 0; > +} [...] > Index: hash.c > =================================================================== > RCS file: /data/cvs/libvirt/src/hash.c,v > retrieving revision 1.37 > diff -u -p -r1.37 hash.c > --- hash.c 18 Apr 2008 08:33:23 -0000 1.37 > +++ hash.c 27 Apr 2008 19:04:28 -0000 The hash example is interesting, but it doesn't use REALLOC ... Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@xxxxxxxxxx | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/ -- Libvir-list mailing list Libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list