Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

On Sun, Feb 25, 2024 at 01:10:39PM +0000, Matthew Wilcox wrote:
> On Sun, Feb 25, 2024 at 12:18:23AM -0500, Kent Overstreet wrote:
> > Before large folios, we had people very much bottlenecked by 4k page
> > overhead on sequential IO; my customer/sponsor was one of them.
> > 
> > Factor of 2 or 3, IIRC; it was _bad_. And when you looked at the
> > profiles and looked at the filemap.c code it wasn't hard to see why;
> > we'd walk a radix tree, do an atomic op (get the page), then do a 4k
> > usercopy... hence the work I did to break up
> > generic_file_buffered_read() and vectorize it, which was a huge
> > improvement.
> 
> There's also the small random 64 byte read case that we haven't optimised
> for yet.  That also bottlenecks on the page refcount atomic op.
> 
> The proposed solution to that was double-copy; look up the page without
> bumping its refcount, copy to a buffer, look up the page again to be
> sure it's still there, copy from the buffer to userspace.
> 
> Except that can go wrong under really unlikely circumstances.  Look up the
> page, page gets freed, page gets reallocated to slab, we copy sensitive
> data from it, page gets freed again, page gets reallocated to the same
> spot in the file (!), lookup says "yup the same page is there".
> We'd need a seqcount or something to be sure the page hasn't moved.

Yes, generation numbers are the standard solution to ABA...
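
To make the generation-number idea concrete, here is a rough userspace sketch in C11. It is not kernel code and not anyone's proposed patch: struct slot, slot_replace(), slot_read_begin(), slot_read_retry() and slot_copy_out() are hypothetical names, loosely modelled on the read_seqcount_begin()/read_seqcount_retry() pattern, and it assumes a single writer per slot. The reader copies into a bounce buffer without taking a reference, then trusts the copy only if the generation did not change in between. A pointer or index comparison alone would miss the free/reuse/reinstate sequence described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64

/* Hypothetical stand-in for a page cache slot; not a kernel structure. */
struct slot {
	atomic_uint gen;                        /* odd while a replacement is in progress */
	_Atomic unsigned char data[SLOT_SIZE];  /* contents readers copy without a refcount */
};

/* Writer side (single writer assumed): bump gen around every replacement. */
void slot_replace(struct slot *s, const unsigned char *src, size_t len)
{
	assert(len <= SLOT_SIZE);
	unsigned g = atomic_load_explicit(&s->gen, memory_order_relaxed);

	atomic_store_explicit(&s->gen, g + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (size_t i = 0; i < len; i++)
		atomic_store_explicit(&s->data[i], src[i], memory_order_relaxed);
	atomic_store_explicit(&s->gen, g + 2, memory_order_release);
}

/* Reader side: sample the generation before touching the data. */
unsigned slot_read_begin(struct slot *s)
{
	return atomic_load_explicit(&s->gen, memory_order_acquire);
}

/* Returns true if the data read since slot_read_begin() must be discarded. */
bool slot_read_retry(struct slot *s, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	unsigned now = atomic_load_explicit(&s->gen, memory_order_relaxed);

	return (start & 1) || now != start;
}

/*
 * Double copy: slot -> bounce buffer -> destination. The destination is
 * written only after the generation check passes, so bytes from a slot
 * that was replaced mid-copy never reach the caller.
 */
bool slot_copy_out(struct slot *s, unsigned char *dst, size_t len)
{
	unsigned char bounce[SLOT_SIZE];

	assert(len <= SLOT_SIZE);
	unsigned start = slot_read_begin(s);

	for (size_t i = 0; i < len; i++)
		bounce[i] = atomic_load_explicit(&s->data[i], memory_order_relaxed);
	if (slot_read_retry(s, start))
		return false;   /* caller retries or falls back to the refcounted path */

	memcpy(dst, bounce, len);
	return true;
}
```

The point of the counter is that replacing A with B and then with A again still advances gen twice, so a reader that started before the first replacement is told to retry even though the slot once more holds identical bytes. The cost moves from an atomic read-modify-write on the refcount to two loads of gen plus the extra copy.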
