Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

+cc Paul

On Mon, Feb 26, 2024 at 04:17:19PM -0500, Kent Overstreet wrote:
> On Mon, Feb 26, 2024 at 09:07:51PM +0000, Matthew Wilcox wrote:
> > On Mon, Feb 26, 2024 at 09:17:33AM -0800, Linus Torvalds wrote:
> > > Willy - tangential side note: I looked closer at the issue that you
> > > reported (indirectly) with the small reads during heavy write
> > > activity.
> > > 
> > > Our _reading_ side is very optimized and has none of the write-side
> > > oddities that I can see, and we just have
> > > 
> > >   filemap_read ->
> > >     filemap_get_pages ->
> > >         filemap_get_read_batch ->
> > >           folio_try_get_rcu()
> > > 
> > > and there is no page locking or other locking involved (assuming the
> > > page is cached and marked uptodate etc, of course).
> > > 
> > > So afaik, it really is just that *one* atomic access (and the matching
> > > page ref decrement afterwards).
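
For anyone following along, the lookup Linus describes reduces to roughly
the pattern below - a simplified sketch of what filemap_get_read_batch()
does, with the batching and uptodate checks elided (mapping/index are
whatever the read resolved to):

	XA_STATE(xas, &mapping->i_pages, index);
	struct folio *folio;

	rcu_read_lock();
	for (;;) {
		folio = xas_load(&xas);
		if (xas_retry(&xas, folio))
			continue;
		if (!folio)
			break;			/* not cached: slow path */
		if (!folio_try_get_rcu(folio)) {
			/* refcount hit zero under us: folio is being freed */
			xas_reset(&xas);
			continue;
		}
		if (folio == xas_reload(&xas))
			break;			/* stable reference taken */
		/* raced with truncate/reclaim: drop the ref and retry */
		folio_put(folio);
		xas_reset(&xas);
	}
	rcu_read_unlock();
	/* ... copy to userspace; folio_put() is the matching decrement ... */
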
> > 
> > Yep, that was what the customer reported on their ancient kernel, and
> > we at least didn't make that worse ...
> > 
> > > We could easily do all of this without getting any ref to the page at
> > > all if we did the page cache release with RCU (and the user copy with
> > > "copy_to_user_atomic()").  Honestly, anything else looks like a
> > > complete disaster. For tiny reads, a temporary buffer sounds ok, but
> > > really *only* for tiny reads where we could have that buffer on the
> > > stack.
> > > 
> > > Are tiny reads (handwaving: 100 bytes or less) really worth optimizing
> > > for to that degree?
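
Just to make the ref-free idea concrete: with RCU-freed folios it could
look something like the sketch below. This is hypothetical - the page
cache is not RCU-freed today, copy_to_user_nofault() stands in for the
"copy_to_user_atomic()" above, and the fallback (take a real ref, do a
normal faulting copy) is elided:

	rcu_read_lock();
	folio = xas_load(&xas);
	ret = -EAGAIN;
	if (folio && folio_test_uptodate(folio)) {
		/*
		 * No reference taken: RCU would guarantee the folio's
		 * memory cannot be freed and reused while we're here.
		 */
		kaddr = kmap_local_folio(folio, offset_in_folio(folio, pos));
		ret = copy_to_user_nofault(buf, kaddr, len);
		kunmap_local(kaddr);
		/* re-check: did we race with truncate/replacement? */
		if (folio != xas_reload(&xas))
			ret = -EAGAIN;
	}
	rcu_read_unlock();
	if (ret)
		/* slow path: folio_try_get_rcu() + faulting copy */
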
> > > 
> > > In contrast, the RCU-delaying of the page cache might be a good idea
> > > in general. We've had other situations where that would have been
> > > nice. The main worry would be low-memory situations, I suspect.
> > > 
> > > The "tiny read" optimization smells like a benchmark thing to me. Even
> > > with the cacheline possibly bouncing, the system call overhead for
> > > tiny reads (particularly with all the mitigations) should be orders of
> > > magnitude higher than two atomic accesses.
> > 
> > Ah, good point about the $%^&^*^ mitigations.  This was pre-mitigations.
> > I suspect that this customer would simply disable them; afaik the machine
> > is an appliance and one interacts with it purely by sending transactions
> > to it (it's not even an SQL system, much less a "run arbitrary javascript"
> > kind of system).  But that makes it even more special case, inapplicable
> > to the majority of workloads and closer to smelling like a benchmark.
> > 
> > I've thought about and rejected RCU delaying of the page cache in the
> > past.  With the majority of memory being anon & file pages, it just
> > feels too risky to have so much memory waiting to be reused.  We could
> > also improve gup-fast if we could rely on RCU freeing of anon memory.
> > Not sure what workloads might benefit from that, though.
> 
> Allocating and freeing memory via RCU can already be fairly significant
> depending on the workload, and I'd expect that to grow - we really just
> need a way for reclaim to kick RCU when needed (and probably add a percpu
> counter for "amount of memory stranded until the next RCU grace
> period").