On Tue, May 7, 2024 at 8:49 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
>
> .. Adding Suren & Willy to the Cc
>
> * Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> [240504 18:14]:
> > On Sat, May 4, 2024 at 8:32 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > > I also did an strace run of both cases. In text-based one the tool did
> > > > 68 read() syscalls, fetching up to 4KB of data in one go.
> > >
> > > Why not fetch more at once?
> > >
> >
> > I didn't expect to be interrogated so much on the performance of the
> > text parsing front, sorry. :) You can probably tune this, but where is
> > the reasonable limit? 64KB? 256KB? 1MB? See below for some more
> > production numbers.
>
> The reason the file reads are limited to 4KB is because this file is
> used for monitoring processes. We have a significant number of
> organisations polling this file so frequently that the mmap lock
> contention becomes an issue. (reading a file is free, right?) People
> also tend to try to figure out why a process is slow by reading this
> file - which amplifies the lock contention.
>
> What happens today is that the lock is yielded after 4KB to allow time
> for mmap writes to happen. This also means your data may be
> inconsistent from one 4KB block to the next (the write may be around
> this boundary).
>
> This new interface also takes the lock in do_procmap_query() and does
> the 4kb blocks as well. Extending this size means more time spent
> blocking mmap writes, but a more consistent view of the world (less
> "tearing" of the addresses).

Hold on. There is no 4KB in the new ioctl-based API I'm adding. It
does a single VMA look up (presumably O(logN) operation) using a
single vma_iter_init(addr) + vma_next() call on vma_iterator.

As for the mmap_read_lock_killable() (is that what we are talking
about?), I'm happy to use anything else available, please give me a
pointer. But I suspect given how fast and small this new API is,
mmap_read_lock_killable() in it is not comparable to holding it for
producing /proc/<pid>/maps contents.

>
> We are working to reduce these issues by switching the /proc/<pid>/maps
> file to use rcu lookup. I would recommend we do not proceed with this
> interface using the old method and instead, implement it using rcu from
> the start - if it fits your use case (or we can make it fit your use
> case).
>
> At least, for most page faults, we can work around the lock contention
> (since v6.6), but not all and not on all archs.
>
> ...
>
> >
> > > > In comparison,
> > > > ioctl-based implementation had to do only 6 ioctl() calls to fetch all
> > > > relevant VMAs.
> > > >
> > > > It is projected that savings from processing big production applications
> > > > would only widen the gap in favor of binary-based querying ioctl API, as
> > > > bigger applications will tend to have even more non-executable VMA
> > > > mappings relative to executable ones.
> > >
> > > Define "bigger applications" please. Is this some "large database
> > > company workload" type of thing, or something else?
> >
> > I don't have a definition. But I had in mind, as one example, an
> > ads-serving service we use internally (it's a pretty large application
> > by pretty much any metric you can come up with). I just randomly
> > picked one of the production hosts, found one instance of that
> > service, and looked at its /proc/<pid>/maps file. Hopefully it will
> > satisfy your need for specifics.
> >
> > # cat /proc/1126243/maps | wc -c
> > 1570178
> > # cat /proc/1126243/maps | wc -l
> > 28875
> > # cat /proc/1126243/maps | grep ' ..x. ' | wc -l
> > 7347
>
> We have distributions increasing the map_count to an insane number to
> allow games to work [1]. It is, unfortunately, only a matter of time until
> this is regularly an issue as it is being normalised and allowed by an
> increased number of distributions (fedora, arch, ubuntu). So, despite
> my email address, I am not talking about large database companies here.
>
> Also, note that applications that use guard VMAs double the number for
> the guards. Fun stuff.
>
> We are really doing a lot in the VMA area to reduce the mmap locking
> contention and it seems you have a use case for a new interface that can
> leverage these changes.
>
> We have at least two talks around this area at LSF if you are attending.

I am attending LSFMM, yes, I'll try to not miss them.

>
> Thanks,
> Liam
>
> [1] https://lore.kernel.org/linux-mm/8f6e2d69-b4df-45f3-aed4-5190966e2dea@xxxxxxxxxxxxxxxxx/
>
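
For anyone who wants to reproduce the kind of counts quoted above (total
bytes, total VMAs, executable VMAs) on another process, here is a minimal
sketch in Python. It is only an illustration, not part of the patch set:
the function name summarize_maps(), the SAMPLE text and its addresses are
made up for the example, and the "min_reads" figure simply assumes each
read() returns at most 4KB, as described in the thread, so it is a lower
bound rather than a measurement.

```python
import math


def summarize_maps(text, chunk_size=4096):
    """Summarize /proc/<pid>/maps-style text (hypothetical helper).

    Roughly mirrors the shell pipelines quoted in the thread:
      wc -c, wc -l, and grep ' ..x. ' | wc -l
    """
    lines = text.splitlines()
    exec_vmas = 0
    for line in lines:
        fields = line.split()
        # fields[1] is the permissions column, e.g. "r-xp";
        # the third character is "x" for executable mappings.
        if len(fields) >= 2 and len(fields[1]) == 4 and fields[1][2] == "x":
            exec_vmas += 1
    size = len(text.encode())
    return {
        "bytes": size,
        "vmas": len(lines),
        "exec_vmas": exec_vmas,
        # Lower bound on read() calls if each returns at most chunk_size bytes.
        "min_reads": math.ceil(size / chunk_size),
    }


# Made-up sample in the /proc/<pid>/maps layout (addresses are illustrative).
SAMPLE = (
    "55d0a1c00000-55d0a1c02000 r--p 00000000 08:01 131 /usr/bin/cat\n"
    "55d0a1c02000-55d0a1c07000 r-xp 00002000 08:01 131 /usr/bin/cat\n"
    "7ffd3a1e0000-7ffd3a201000 rw-p 00000000 00:00 0 [stack]\n"
)

if __name__ == "__main__":
    print(summarize_maps(SAMPLE))
    # To inspect a live process instead (Linux only):
    # with open("/proc/self/maps") as f:
    #     print(summarize_maps(f.read()))
```

Plugging in the numbers quoted above, a 1570178-byte maps file needs at
least math.ceil(1570178 / 4096) = 384 reads at 4KB each, and 7347 of the
28875 VMAs (about a quarter) are executable.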