On Tue, Oct 22, 2019 at 10:51:49AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Oct 22, 2019 at 12:34:45PM +0200, Umut Tezduyar Lindskog wrote:
> > I am curious Zbigniew of how you find out if the coredump was on a starved
> > process?
>
> A very common case is systemd-journald which gets SIGABRT when in a
> read() or write() or similar syscall. Another case is when
> systemd-udevd workers get ABRT when doing open() on a device.
>

In the case of journald, is it really in read()/write() syscalls you're
seeing the SIGABRTs?

When I worked on making journald offlining asynchronous, the primary
source of problems was very slow fsync() calls. But we've now moved
those into a thread, so that particular source of aborts should be
removed.

I'd be a little surprised to hear it's now aborting in read()/write()
calls, because journald's storage IO is all done through mmap() windows.

At the time I was working on it, something I latently wanted to explore
was converting journald to a more database-ish Direct-IO + AIO storage
engine, partly to eliminate the nondeterministic, potentially lengthy
stalls a thrashing or otherwise unresponsive backing store could cause
when journald faulted on an affected page in its mappings.

As long as journald is mmap()-based, it will always be susceptible to
long, uncontrolled delays blocking the event loop, as it uses a single
process for both its event loop and the operations that modify
structures in those mappings.

If it used AIO, the IOs would all be submitted asynchronously and
completed as part of the same event loop that responds to watchdog
timeouts. So it'd be far less likely to starve servicing the event
sources, including the watchdog timer, since the event loop would be
its principal blocking point.

I realize these details are a bit orthogonal to the watchdog, but they
might help in appreciating the extent of the non-determinism causing
such false positives.
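
To make the stall concrete, here is a rough sketch in Python rather than
anything resembling journald's actual code. Everything in it is
hypothetical: the names, the TICK and STALL values, and time.sleep()
standing in for a slow fsync() or a page fault on a mapping. It only
shows that a blocking call made from the event loop delays a periodic
keep-alive by the full length of the stall, while the same call handed
to a thread does not.

```python
import asyncio
import time

TICK = 0.05   # hypothetical keep-alive interval, in seconds
STALL = 0.5   # hypothetical storage stall, in seconds


def slow_storage_op():
    """Stand-in for a blocking fsync() or a page fault on a mapping."""
    time.sleep(STALL)


async def keepalive(gaps, stop):
    """Record the time between consecutive keep-alive ticks."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(TICK)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def run(offload):
    gaps = []
    stop = asyncio.Event()
    pinger = asyncio.create_task(keepalive(gaps, stop))
    await asyncio.sleep(TICK * 2)          # let the keep-alive start ticking
    if offload:
        # Run the blocking call in a worker thread; the loop keeps ticking.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_storage_op)
    else:
        # Run the blocking call in the loop itself; nothing else can run.
        slow_storage_op()
    await asyncio.sleep(TICK * 2)          # let the keep-alive tick again
    stop.set()
    await pinger
    return max(gaps)


def worst_gap(offload):
    """Return the longest observed gap between keep-alive ticks."""
    return asyncio.run(run(offload))


if __name__ == "__main__":
    print("blocking in the loop:", worst_gap(offload=False))
    print("offloaded to a thread:", worst_gap(offload=True))
```

With the call made in the loop, the worst gap is at least STALL; with it
offloaded, the gap should stay near TICK on an otherwise idle machine. A
watchdog whose timeout is shorter than the stall would only fire in the
first case.
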
Programs like journald have not been structured carefully to minimize
event loop delays the watchdog perceives as hangs.

Of course, if the journald process itself began faulting in and out
every time it was scheduled on a thrashing system, it could still become
delayed enough for an aggressive watchdog to fire. It would be a lot
easier to pin its pages in memory and insulate it from those situations
if its working set were a relatively small, finite set used for
dispatching a bounded amount of concurrent async IO: it could just
mlock() most of what it ever needs early on. But we're nowhere near that
architecture today, and that's really the kind of approach needed to
make things watchdog-appropriate with a minimum of false positives.

Regards,
Vito Caputo