Re: [PATCH 5/5] selftests/bpf: a simple benchmark tool for /proc/<pid>/maps APIs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



.. Adding Suren & Willy to the Cc

* Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> [240504 18:14]:
> On Sat, May 4, 2024 at 8:32 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > I also did an strace run of both cases. In text-based one the tool did
> > > 68 read() syscalls, fetching up to 4KB of data in one go.
> >
> > Why not fetch more at once?
> >
> 
> I didn't expect to be interrogated so much on the performance of the
> text parsing front, sorry. :) You can probably tune this, but where is
> the reasonable limit? 64KB? 256KB? 1MB? See below for some more
> production numbers.

The reason the file reads are limited to 4KB is because this file is
used for monitoring processes.  We have a significant number of
organisations polling this file so frequently that the mmap lock
contention becomes an issue. (reading a file is free, right?)  People
also tend to try to figure out why a process is slow by reading this
file - which amplifies the lock contention.

What happens today is that the lock is yielded after 4KB to allow time
for mmap writes to happen.  This also means your data may be
inconsistent from one 4KB block to the next (the write may be around
this boundary).

This new interface also takes the lock in do_procmap_query() and does
the 4kb blocks as well.  Extending this size means more time spent
blocking mmap writes, but a more consistent view of the world (less
"tearing" of the addresses).

We are working to reduce these issues by switching the /proc/<pid>/maps
file to use rcu lookup.  I would recommend we do not proceed with this
interface using the old method and instead, implement it using rcu from
the start - if it fits your use case (or we can make it fit your use
case).

At least, for most page faults, we can work around the lock contention
(since v6.6), but not all and not on all archs.

...

> 
> > > In comparison,
> > > ioctl-based implementation had to do only 6 ioctl() calls to fetch all
> > > relevant VMAs.
> > >
> > > It is projected that savings from processing big production applications
> > > would only widen the gap in favor of binary-based querying ioctl API, as
> > > bigger applications will tend to have even more non-executable VMA
> > > mappings relative to executable ones.
> >
> > Define "bigger applications" please.  Is this some "large database
> > company workload" type of thing, or something else?
> 
> I don't have a definition. But I had in mind, as one example, an
> ads-serving service we use internally (it's a pretty large application
> by pretty much any metric you can come up with). I just randomly
> picked one of the production hosts, found one instance of that
> service, and looked at its /proc/<pid>/maps file. Hopefully it will
> satisfy your need for specifics.
> 
> # cat /proc/1126243/maps | wc -c
> 1570178
> # cat /proc/1126243/maps | wc -l
> 28875
> # cat /proc/1126243/maps | grep ' ..x. ' | wc -l
> 7347

We have distributions increasing the map_count to an insane number to
allow games to work [1].  It is, unfortunately, only a matter of time until
this is regularly an issue as it is being normalised and allowed by an
increased number of distributions (fedora, arch, ubuntu).  So, despite
my email address, I am not talking about large database companies here.

Also, note that applications that use guard VMAs double the number for
the guards.  Fun stuff.

We are really doing a lot in the VMA area to reduce the mmap locking
contention and it seems you have a use case for a new interface that can
leverage these changes.

We have at least two talks around this area at LSF if you are attending.

Thanks,
Liam

[1] https://lore.kernel.org/linux-mm/8f6e2d69-b4df-45f3-aed4-5190966e2dea@xxxxxxxxxxxxxxxxx/






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux