Re: (Dropping) OOM Handling in libvirt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/13/19 12:17 PM, Daniel P. Berrangé wrote:
This is a long mail about ENOMEM (OOM) handling in libvirt. The executive
summary is that it is not worth the maint cost because:

   - this code will almost never run on Linux hosts
- if it does run it will likely have bad behaviour silently dropping
     data or crashing the process

   - apps using libvirt often do so via a non-C language that aborts/exits
     the app on OOM regardless, or use other C libraries that abort

   - we can build a system that is more reliable when OOM happens by
     not catching OOM, instead simply letting apps exit, restart and
     carry on where they left off

The long answer follows...

I'm up for dropping OOM handling. Since we're linking with a lot of libraries I don't think we can be confident that we won't abort() even now anyway.



The background
==============

Since the first commit libvirt has attempted to handle out of memory (OOM)
errors in the same way it does for any other problem. Namely, a virError will
be raised, and the method will jump to its cleanup/error label. The error
bubbles up the stack and in theory someone or something will catch this and do
something sensible/useful. This has long been considered "best practice" for
most C libraries, especially those used for system services. This mail makes
the argument that it is in fact /not/ "best practice" to try to handle OOM.

OOM handling is very difficult to get correct because it is something that
developers almost never encounter and thus the code paths are rarely run. We
designed our memory allocation APIs such that we get compile time errors if
code forgets to check the return value for failure. This is good as it
eliminates an entire class of bugs. Our error handling goto label pattern
tries to align OOM handling with other general error handling which is more
commonly tested.

We have code in the allocators which lets us run unit tests simulating the
failure of any allocation that is made during the test. Executing this is
extraordinarily time consuming as some of our unit tests have many 1000's or
even 10's of 1000's of allocations. The running time to simulate OOM for each
allocation is O(N^2) which does not scale well. As a result we've probably
only run these OOM tests a handful of times over the years.

Well, I've tried this and so far, the only problems I've found were in tests where we're ignoring revals of VIR_ALLOC() or its friends (e.g. vir*New()) or we're checking it after it's used already.
BTW: to make it work I had to disable failing from mocks (obviously).


The tests show we generally do remarkly well at OOM handling, but none the
less we have *always* found bugs in our handling where we fail to report the
OOM (as in, continue to run but with missing data), or we crash when cleaning
up from OOM. Our unit tests don't cover anywhere near all of our codebase
though. We have great coverage of XML parsing/formatting, and sporadic coverage
of everything else.

IOW, despite our clever API designs & coding patterns designed to improve our
OOM handling, we can not have confidence that we will actually operate correctly
in an OOM scenario.

I beg to disagree. The fact that our cleanup paths are in 99% the same as error paths means that if there are no mem-leaks (and if transformation to VIR_AUTOFREE is done the chances are low), then we propably do good on OOM too.

<snip/>

The implementation
==================

Assuming a decision to abort on OOM, libvirt can nwo follow QEMU's lead and
make use of the GLib library.

No, please no. Firstly, glib is a new C dialect that only a few of us speak. Secondly, glib adds some padding around its objects => libvirt's memory footprint will grow. Thirdly, it's yet another library to link with (on my system libvirt links with 53 libraries already).

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux