Re: [PATCH] Add unconditional exit to allocate again

Paul Jackson <pj@xxxxxxx> · Mon, 16 Jun 2008 21:43:09 -0500

Andi wrote:
> In this case it's libnuma internally which doesn't check and even if it
> did couldn't do anything because these calls cannot error out.

Ouch.

From a quick scan of the libnuma that is in numactl v2.0.1,
it looks to me like libnuma can just exit, without recourse,
in the event of the following errors:

        printf ("map size mismatch; abort\n");
        printf ("request to allocate mask for %d bits; abort\n", n);
        printf ("request to allocate mask for %d bits; abort\n", n);
        printf ("numa_sched_setaffinity_v2_int() failed; abort\n");

        grumble("unparseable node description `%s'\n", s);
        grumble("node argument %d is out of range\n", arg);
        grumble("missing node argument %s\n", s);
        grumble("node argument %d out of range\n", arg2);
        grumble("unparseable cpu description `%s'\n", s);
        grumble("cpu argument %s is out of range\n", s);
        grumble("missing cpu argument %s\n", s);
        grumble("cpu argument %s out of range\n", s);

To my way of thinking, a library that can just exit is unsuitable for
production use in many environments.  For example, I would not want to
link such a library in with a program that might need to perform other
cleanup activities before exiting, if at all possible.  Granted, such
exit issues can be handled, such as with atexit(3).  But the default
should be not to exit.  It's a landmine waiting to blow someone's foot
off.
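
For illustration, a minimal sketch of the atexit(3) workaround mentioned
above.  The names app_scratch, app_cleanup and app_init are hypothetical
application code, not anything from libnuma.  The cleanup is written to be
idempotent, so it is harmless if it runs both explicitly and again at exit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical application resource that must be released before exit. */
static void *app_scratch;
static int app_cleanup_runs;

/* Idempotent cleanup: safe to call directly and again from exit(3). */
static void app_cleanup(void)
{
    if (app_scratch != NULL) {
        free(app_scratch);
        app_scratch = NULL;
        app_cleanup_runs++;
    }
}

/* Acquire the resource and register the cleanup.  Returns 0 on success. */
int app_init(void)
{
    assert(app_scratch == NULL);    /* init is expected to run only once */
    app_scratch = malloc(4096);
    if (app_scratch == NULL)
        return -1;
    /* atexit() returns 0 on success, nonzero on failure. */
    return atexit(app_cleanup) == 0 ? 0 : -1;
}
```

Handlers registered this way run when a library calls exit(3), but not on
abort(3) or _exit(2), so this covers only part of the problem.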

Production libraries should always return to their caller, as expected
by the API.  The only exceptions would be:
 1) The machine dies - a library is not responsible for such "Acts of God."
 2) The library SEGV's or similar - which is a priority bug.

Part of the problem here, but apparently not all of it, might be that
we have taken an API designed for fixed length bitmasks and converted
it to use with dynamically sized bitmasks.  This perhaps added some
failure cases that the original API has no way to return.

From the above error messages, if indeed that's about the right list,
that is not all the problem, however.

I honestly don't recall ever seeing before now a library that was
offered for general use in production systems that had so many
unexpected exit paths.

There seems to be an external flag, numa_exit_on_error, which can be
set, and if not set (the default - good) will avoid calling exit in the
numa_error() handler.

But numa_error() only seems to be called for errors in four cases:
failures of the system calls mbind, get_mempolicy, and set_mempolicy,
and this new case of a failed allocation.  This numa_error() routine
is, as Andi has noted, WEAK-ly linked, so the application can replace
it as desired.
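
A sketch of what such a replacement might look like on the application
side.  It assumes the handler is declared as void numa_error(char *where),
which is my reading of numa.h; the app_numa_* names and the demo function
are hypothetical.  The block does not need libnuma to compile; when linked
against libnuma, this strong definition would take the place of the weak one.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application-side record of the last libnuma error. */
static int app_numa_errno;
static char app_numa_where[64];

/* Strong definition intended to replace libnuma's weak numa_error().
 * It records the failure and returns to the caller; it never exits. */
void numa_error(char *where)
{
    int saved = errno;              /* preserve errno across stdio calls */

    app_numa_errno = saved;
    snprintf(app_numa_where, sizeof app_numa_where, "%s",
             where != NULL ? where : "?");
    fprintf(stderr, "libnuma: %s: %s\n", app_numa_where, strerror(saved));
    errno = saved;
}

/* Usage sketch: what the application sees after a reported failure. */
void demo_failed_mbind(void)
{
    errno = ENOMEM;                 /* pretend mbind(2) just failed */
    numa_error("mbind");            /* libnuma would call this internally */
    assert(app_numa_errno == ENOMEM);
}
```

Note that this only helps for the four cases that go through numa_error();
the printf and grumble paths listed earlier never reach it.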

I would encourage finding some way to remove these unexpected exits.
One can imagine several ways of doing this.  Perhaps one could:
 1) Establish some per-thread error state.  This might include
	* an errno-like value,
	* a return value,
	* an error string, and
	* a flag indicating whether there was an "unreportable" error.
    By "unreportable" error I mean an error which the API did not support
    passing back inline.
 2) Establish a pluggable routine which will be called just before
    returning.
 3) The default version of that routine might do something like call
    abort(3) [not exit(3)] if there was an unreportable error.
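
To make the three points above concrete, here is a rough sketch.  Every
name in it (the xnuma_ prefix, the struct layout, the hook type) is
invented for illustration; none of it exists in libnuma, and a real design
would need more care than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* 1) Hypothetical per-thread error state. */
struct xnuma_err {
    int err;            /* errno-like value */
    long retval;        /* value the call produced, if any */
    const char *msg;    /* static error string, or NULL */
    int unreportable;   /* set when the API cannot pass the error back inline */
};

static _Thread_local struct xnuma_err xnuma_last;

/* 2) Pluggable routine, called just before each library call returns. */
typedef void (*xnuma_hook_fn)(const struct xnuma_err *);

/* 3) Default: abort(3), not exit(3), on an unreportable error. */
static void xnuma_default_hook(const struct xnuma_err *e)
{
    if (e->unreportable) {
        fprintf(stderr, "xnuma: %s\n", e->msg ? e->msg : "unreportable error");
        abort();
    }
}

/* Alternative hook for callers that inspect the error state themselves. */
void xnuma_hook_record_only(const struct xnuma_err *e)
{
    (void)e;
}

static xnuma_hook_fn xnuma_hook = xnuma_default_hook;

/* Install a hook (NULL restores the default); returns the previous one. */
xnuma_hook_fn xnuma_set_hook(xnuma_hook_fn h)
{
    xnuma_hook_fn old = xnuma_hook;
    xnuma_hook = (h != NULL) ? h : xnuma_default_hook;
    return old;
}

const struct xnuma_err *xnuma_last_error(void)
{
    return &xnuma_last;
}

/* Example of a void API call with no inline way to report failure. */
void xnuma_check_mask_bits(int n)
{
    assert(xnuma_hook != NULL);
    xnuma_last = (struct xnuma_err){0};
    if (n <= 0) {
        xnuma_last.err = EINVAL;
        xnuma_last.retval = -1;
        xnuma_last.msg = "request to allocate mask for invalid bit count";
        xnuma_last.unreportable = 1;
    }
    xnuma_hook(&xnuma_last);        /* last thing done before returning */
}
```

An application that wants to keep running would install
xnuma_hook_record_only (or its own routine) and check xnuma_last_error()
after calls that cannot report errors inline.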

By formalizing the handling of such errors using some specific additions
to the API, we significantly increase the visibility of the error
handling apparatus of libnuma, and thereby make it safer for use in
production environments.

Quite possibly Andi will disagree with me on the above.  To the extent
that Andi and I don't come to quick agreement, I doubt I will spend
much energy advocating for the above.  I have little time for that,
and I doubt Cliff or Andi do either.

What is, is.  What will be, will be.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@xxxxxxx> 1.940.382.4214