Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Sun, 25 Feb 2024 12:32:36 -0500

On Sun, Feb 25, 2024 at 01:10:39PM +0000, Matthew Wilcox wrote:
> On Sun, Feb 25, 2024 at 12:18:23AM -0500, Kent Overstreet wrote:
> > Before large folios, we had people very much bottlenecked by 4k page
> > overhead on sequential IO; my customer/sponsor was one of them.
> > 
> > Factor of 2 or 3, IIRC; it was _bad_. And when you looked at the
> > profiles and looked at the filemap.c code it wasn't hard to see why;
> > we'd walk a radix tree, do an atomic op (get the page), then do a 4k
> > usercopy... hence the work I did to break up
> > generic_file_buffered_read() and vectorize it, which was a huge
> > improvement.
> 
> There's also the small random 64 byte read case that we haven't optimised
> for yet.  That also bottlenecks on the page refcount atomic op.
> 
> The proposed solution to that was double-copy; look up the page without
> bumping its refcount, copy to a buffer, look up the page again to be
> sure it's still there, copy from the buffer to userspace.
> 
> Except that can go wrong under really unlikely circumstances.  Look up the
> page, page gets freed, page gets reallocated to slab, we copy sensitive
> data from it, page gets freed again, page gets reallocated to the same
> spot in the file (!), lookup says "yup the same page is there".
> We'd need a seqcount or something to be sure the page hasn't moved.

yes, generation numbers are the standard solution to ABA...
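
To make that concrete, here is a minimal userspace sketch of the
generation check wrapped around the double-copy read described above.
It is not the kernel's seqcount_t API and not anything proposed in this
thread: struct slot, slot_reuse(), slot_read_begin(), slot_read_retry()
and slot_read() are all hypothetical names, and the slot is just a
stand-in for "whatever currently backs this file offset". It only shows
the check itself; a real lockless reader would also have to deal with
racing on the data bytes, which this glosses over.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64

/* Hypothetical stand-in for a page cache slot; not a kernel structure. */
struct slot {
	atomic_uint gen;		/* even: stable, odd: reuse in progress */
	unsigned char data[SLOT_SIZE];
};

/* Writer side: every free/reallocate cycle bumps the generation twice. */
static void slot_reuse(struct slot *s, const void *src, size_t len)
{
	assert(len <= SLOT_SIZE);
	atomic_fetch_add(&s->gen, 1);	/* now odd: readers must not trust data */
	memset(s->data, 0, SLOT_SIZE);
	memcpy(s->data, src, len);
	atomic_fetch_add(&s->gen, 1);	/* even again, but a new generation */
}

/* Reader side: sample the generation before touching the data. */
static unsigned int slot_read_begin(struct slot *s)
{
	return atomic_load(&s->gen);
}

/* True if the slot was (or is being) reused since slot_read_begin(). */
static bool slot_read_retry(struct slot *s, unsigned int start)
{
	return (start & 1) || atomic_load(&s->gen) != start;
}

/*
 * Double-copy read: copy into a bounce buffer without taking a reference,
 * validate the generation, and only then copy out to the caller.
 * Returns false if the caller should retry or fall back to a slow path.
 */
static bool slot_read(struct slot *s, void *dst, size_t len)
{
	unsigned char bounce[SLOT_SIZE];
	unsigned int start;

	assert(len <= SLOT_SIZE);
	start = slot_read_begin(s);
	memcpy(bounce, s->data, len);
	if (slot_read_retry(s, start))
		return false;		/* possibly stale or foreign data: discard */
	memcpy(dst, bounce, len);
	return true;
}
```

The point is the failure case from the quoted scenario: if the slot is
reused for something else and then reused again with identical contents
between slot_read_begin() and slot_read_retry(), comparing the contents
(or the pointer) says nothing changed, but the generation has moved on
and the read gets thrown away.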