.. Adding Suren & Willy to the Cc

* Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> [240504 18:14]:
> On Sat, May 4, 2024 at 8:32 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > I also did an strace run of both cases. In text-based one the tool did
> > > 68 read() syscalls, fetching up to 4KB of data in one go.
> >
> > Why not fetch more at once?
> >
>
> I didn't expect to be interrogated so much on the performance of the
> text parsing front, sorry. :) You can probably tune this, but where is
> the reasonable limit? 64KB? 256KB? 1MB? See below for some more
> production numbers.

The reason the file reads are limited to 4KB is because this file is
used for monitoring processes. We have a significant number of
organisations polling this file so frequently that the mmap lock
contention becomes an issue. (reading a file is free, right?) People
also tend to try to figure out why a process is slow by reading this
file - which amplifies the lock contention.

What happens today is that the lock is yielded after 4KB to allow time
for mmap writes to happen. This also means your data may be
inconsistent from one 4KB block to the next (the write may be around
this boundary).

This new interface also takes the lock in do_procmap_query() and does
the 4kb blocks as well. Extending this size means more time spent
blocking mmap writes, but a more consistent view of the world (less
"tearing" of the addresses).

We are working to reduce these issues by switching the /proc/<pid>/maps
file to use rcu lookup. I would recommend we do not proceed with this
interface using the old method and instead, implement it using rcu from
the start - if it fits your use case (or we can make it fit your use
case).

At least, for most page faults, we can work around the lock contention
(since v6.6), but not all and not on all archs.

...
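
To make the chunked read pattern discussed above concrete, here is a
minimal sketch (not taken from the thread) that reads a file with raw
read() calls of a fixed maximum size and counts how many calls it
takes, roughly what strace would show. The helper name count_reads and
its chunk_size parameter are assumptions for illustration only. A
read() may return fewer bytes than requested, so for
/proc/<pid>/maps the real call count can be higher than the file size
divided by the chunk size.

```python
import os


def count_reads(path, chunk_size=4096):
    """Read `path` using read() calls of at most `chunk_size` bytes.

    Returns a tuple: (number of non-empty reads, total bytes read).
    Hypothetical helper, for illustration only.
    """
    reads = 0
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)
            if not data:  # an empty result means end of file
                break
            reads += 1
            total += len(data)
    finally:
        os.close(fd)
    return reads, total
```

On a Linux host, calling count_reads("/proc/self/maps") and then
count_reads("/proc/self/maps", 65536) gives a rough feel for how the
syscall count responds to the buffer size. If the kernel hands back
only about 4KB per read(), as the behaviour described above suggests,
a larger buffer may not reduce the count much.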
>
> > > In comparison,
> > > ioctl-based implementation had to do only 6 ioctl() calls to fetch all
> > > relevant VMAs.
> > >
> > > It is projected that savings from processing big production applications
> > > would only widen the gap in favor of binary-based querying ioctl API, as
> > > bigger applications will tend to have even more non-executable VMA
> > > mappings relative to executable ones.
> >
> > Define "bigger applications" please. Is this some "large database
> > company workload" type of thing, or something else?
>
> I don't have a definition. But I had in mind, as one example, an
> ads-serving service we use internally (it's a pretty large application
> by pretty much any metric you can come up with). I just randomly
> picked one of the production hosts, found one instance of that
> service, and looked at its /proc/<pid>/maps file. Hopefully it will
> satisfy your need for specifics.
>
> # cat /proc/1126243/maps | wc -c
> 1570178
> # cat /proc/1126243/maps | wc -l
> 28875
> # cat /proc/1126243/maps | grep ' ..x. ' | wc -l
> 7347

We have distributions increasing the map_count to an insane number to
allow games to work [1]. It is, unfortunately, only a matter of time
until this is regularly an issue as it is being normalised and allowed
by an increased number of distributions (fedora, arch, ubuntu). So,
despite my email address, I am not talking about large database
companies here.

Also, note that applications that use guard VMAs double the number for
the guards. Fun stuff.

We are really doing a lot in the VMA area to reduce the mmap locking
contention and it seems you have a use case for a new interface that
can leverage these changes.

We have at least two talks around this area at LSF if you are attending.

Thanks,
Liam

[1] https://lore.kernel.org/linux-mm/8f6e2d69-b4df-45f3-aed4-5190966e2dea@xxxxxxxxxxxxxxxxx/