On Tue, 16.02.21 13:01, Vito Caputo (vcaputo@xxxxxxxxxxx) wrote:

> Hello list,
>
> I'd like to have a conversation about the robustness of the
> ENOMEM-based mmap-cache window reclamation. [0]
>
> Isn't this arguably a partial implementation, when any other
> in-process ENOMEM trigger, outside the purview of mmap-cache, doesn't
> invoke the same reclaim machinery and retry before failing the
> operation?
>
> Shouldn't we, at least within the systemd-journald process, have
> _all_ ENOMEM-capable paths be reclaim-aware? Though it's arguably the
> most likely path, there's no guarantee that exhaustion will always
> occur via mmap_cache_get().
>
> I doubt it would be feasible to achieve anything of this sort on
> behalf of external sd-journal consumers, since C doesn't really
> expose ways of transparently hooking into the allocator. Maybe glibc
> actually has features in this regard I'm unaware of? Even if it does,
> there are still syscalls failing with ENOMEM which might succeed on a
> retry post-reclaim, no?
>
> For 100% systemd-internal code, doesn't it seem like we could be
> doing more here? As in, maybe providing a global runtime hook for a
> memory reclaimer, and having all ENOMEM return paths wrapped in a
> reclaim-and-retry attempt before propagating out the often fatal
> ENOMEM error?

ENOMEM on mmap() usually means that address space is exhausted or that
the maximum number of mappings per process has been reached, both of
which should normally be really far out, at least on 64-bit systems.

I remember I added this back in the day when systems were still
frequently 32-bit, because we actually hit address space exhaustion a
bit too quickly IRL if you had a larger number of journals on a 32-bit
system, since we created a bunch of mappings for each one of them.
I.e. you only had 2G of address space and we used larger mappings than
today, so you reached the end very quickly.

In that case it was our own code which caused this by chomping up
address space, hence it made a ton of sense to have a fallback path in
our code that tried to do something about it by releasing some
mappings.

Of course, the journal code is not the only mmap() user, malloc() is
too. But malloc() uses it much less frequently: typically it will only
use mmap() for large allocations, and for smaller ones it acquires
large chunks of memory at a time, which can then be handed out
piecemeal without any syscalls. Moreover, the journal-using tools
(journald, journalctl) were mostly single-threaded, so we knew for
sure that if our own code didn't do any malloc()s, then no one did in
our own process while we tried to do mmap().

I think the loop is not as relevant as it used to be. On 64-bit the
limits on address space are basically gone, and we enforce limits on
open files and similar resources much more aggressively, so that our
mmaps are usually freed long before they could become a nuisance.

This fallback path is probably not relevant on today's systems
anymore. And one can say: while it might not bring much benefit in all
cases, it might in some, and it's easy to do, so why not leave it in?

Or to turn this around: glibc doesn't have any hooks to my knowledge
that would allow us to release some mmaps if glibc detects address
space exhaustion. If it had, we could certainly hook things up with
that…

Does that make any sense?

Lennart

--
Lennart Poettering, Berlin
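
For illustration, here is a minimal sketch in C of the reclaim-and-retry
pattern discussed above. All names (map_with_reclaim(),
try_release_unused_windows()) are hypothetical placeholders, not
systemd's actual mmap-cache API, and the reclaim hook is a no-op stub
where a real cache would unmap least-recently-used windows:

/*
 * Illustrative only: the function and hook names below are hypothetical
 * and are not systemd's actual mmap-cache API.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reclaim hook: a real cache would unmap
 * least-recently-used windows here and report whether anything was
 * actually freed. This stub never frees anything. */
static bool try_release_unused_windows(void) {
        return false;
}

/* Map a file region; on ENOMEM, ask the cache to release unused
 * windows and retry once. Any other error, or a reclaim pass that
 * freed nothing, fails immediately. */
static void *map_with_reclaim(int fd, size_t size, off_t offset) {
        for (unsigned attempt = 0; attempt < 2; attempt++) {
                void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, offset);
                if (p != MAP_FAILED)
                        return p;

                if (errno != ENOMEM || !try_release_unused_windows())
                        break;
        }

        return MAP_FAILED;
}

int main(void) {
        int fd = open("/dev/zero", O_RDONLY);
        if (fd < 0)
                return 1;

        void *p = map_with_reclaim(fd, 4096, 0);
        if (p != MAP_FAILED)
                munmap(p, 4096);

        close(fd);
        return p == MAP_FAILED;
}

The retry is attempted only when the reclaim hook reports that it
actually released something, so a process that is genuinely out of
address space still fails with ENOMEM promptly instead of looping.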