On Tue, 16.02.21 13:01, Vito Caputo (vcaputo@xxxxxxxxxxx) wrote:

> Hello list,
>
> I'd like to have a conversation about the robustness of the
> ENOMEM-based mmap-cache window reclamation. [0]
>
> Isn't this arguably a partial implementation, when any other
> in-process ENOMEM trigger, outside the purview of mmap-cache, doesn't
> invoke the same reclaim machinery and retry before failing the
> operation?
>
> Shouldn't we, at least within the systemd-journald process, have
> _all_ ENOMEM-capable paths be reclaim-aware? Though it's arguably the
> most likely path, there's no guarantee that exhaustion will always
> occur via mmap_cache_get().
>
> I doubt it would be feasible to achieve anything of this sort on
> behalf of external sd-journal consumers, since C doesn't really
> expose ways of transparently hooking into the allocator. Maybe glibc
> actually has features in this regard I'm unaware of? Even if it does,
> there are still syscalls failing with ENOMEM which might succeed on a
> retry post-reclaim, no?
>
> For 100% systemd-internal code, doesn't it seem like we could be
> doing more here? As in, maybe providing a global runtime hook for a
> memory reclaimer, and having all ENOMEM return paths wrapped in a
> reclaim-and-retry attempt before propagating out the often fatal
> ENOMEM error?

ENOMEM on mmap() usually means that address space is exhausted or that
the maximum number of mappings per process has been reached, both of
which should normally be really far out, at least on 64-bit systems.

I remember I added this back in the day when systems were still
frequently 32-bit, because we actually hit address space exhaustion a
bit too quickly IRL if you had a larger number of journals on a 32-bit
system, since we created a bunch of mappings for each one of them.
I.e. you only had 2G of address space and we used larger mappings than
today, so you reached the end very quickly.

In that case it was our own code which caused this by chomping up
address space, hence it made a ton of sense to have a fallback path in
our code that tried to do something about it by releasing some
mappings.

Of course, the journal code is not the only mmap() user, malloc() is
too. But malloc() uses it much less frequently: typically it will only
use mmap() for large allocations, and for smaller ones it acquires
large chunks of memory at a time, which can then be handed out
piecemeal without any syscalls. Moreover, the journal-using tools
(journald, journalctl) were mostly single-threaded, so we knew for
sure that if our own code didn't do any malloc()s, then no one did in
our own process while we tried to do mmap().

I think the loop is not as relevant as it used to be. On 64-bit the
limits on address space are basically gone, and we enforce limits on
open files and similar resources much more aggressively, so that our
mmaps are usually freed long before they could become a nuisance.

This fallback path is probably not relevant on today's systems
anymore. And one can say: while it might not bring much benefit in all
cases, it might in some, and it's easy to do, so why not leave it in?

Or to turn this around: glibc doesn't have any hooks to my knowledge
that would allow us to release some mmaps if glibc detects address
space exhaustion. If it had, we could certainly hook things up with
that…

Does that make any sense?

Lennart

--
Lennart Poettering, Berlin
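
For illustration, here is a minimal sketch in C of the reclaim-and-retry
pattern discussed above. All names (map_with_reclaim(),
try_release_unused_windows()) are hypothetical placeholders, not
systemd's actual mmap-cache API, and the reclaim hook is a no-op stub
where a real cache would unmap least-recently-used windows:

/*
 * Illustrative only: the function and hook names below are hypothetical
 * and are not systemd's actual mmap-cache API.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reclaim hook: a real cache would unmap
 * least-recently-used windows here and report whether anything was
 * actually freed. This stub never frees anything. */
static bool try_release_unused_windows(void) {
        return false;
}

/* Map a file region; on ENOMEM, ask the cache to release unused
 * windows and retry once. Any other error, or a reclaim pass that
 * freed nothing, fails immediately. */
static void *map_with_reclaim(int fd, size_t size, off_t offset) {
        for (unsigned attempt = 0; attempt < 2; attempt++) {
                void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, offset);
                if (p != MAP_FAILED)
                        return p;

                if (errno != ENOMEM || !try_release_unused_windows())
                        break;
        }

        return MAP_FAILED;
}

int main(void) {
        int fd = open("/dev/zero", O_RDONLY);
        if (fd < 0)
                return 1;

        void *p = map_with_reclaim(fd, 4096, 0);
        if (p != MAP_FAILED)
                munmap(p, 4096);

        close(fd);
        return p == MAP_FAILED;
}

The retry is attempted only when the reclaim hook reports that it
actually released something, so a process that is genuinely out of
address space still fails with ENOMEM promptly instead of looping.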