On Sun, Feb 25, 2024 at 01:10:39PM +0000, Matthew Wilcox wrote:
> On Sun, Feb 25, 2024 at 12:18:23AM -0500, Kent Overstreet wrote:
> > Before large folios, we had people very much bottlenecked by 4k page
> > overhead on sequential IO; my customer/sponsor was one of them.
> >
> > Factor of 2 or 3, IIRC; it was _bad_. And when you looked at the
> > profiles and looked at the filemap.c code it wasn't hard to see why;
> > we'd walk a radix tree, do an atomic op (get the page), then do a 4k
> > usercopy... hence the work I did to break up
> > generic_file_buffered_read() and vectorize it, which was a huge
> > improvement.
>
> There's also the small random 64 byte read case that we haven't optimised
> for yet. That also bottlenecks on the page refcount atomic op.
>
> The proposed solution to that was double-copy; look up the page without
> bumping its refcount, copy to a buffer, look up the page again to be
> sure it's still there, copy from the buffer to userspace.
>
> Except that can go wrong under really unlikely circumstances. Look up the
> page, page gets freed, page gets reallocated to slab, we copy sensitive
> data from it, page gets freed again, page gets reallocated to the same
> spot in the file (!), lookup says "yup the same page is there".
> We'd need a seqcount or something to be sure the page hasn't moved.

yes, generation numbers are the standard solution to ABA...
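
For illustration, a minimal userspace sketch of that generation-counter
scheme. This is not the kernel's seqcount_t API; the struct and helper
names are made up, it uses C11 atomics with default seq_cst ordering
where the kernel would use carefully chosen barriers, and the 64-byte
payload stands in for the small-read case above:

	/* Hypothetical stand-in for a page-cache slot. */
	#include <stdatomic.h>
	#include <stdint.h>
	#include <string.h>
	#include <stdio.h>

	struct slot {
		atomic_uint_fast64_t gen;   /* generation: odd = writer active */
		char data[64];              /* the small-read payload */
	};

	/* Writer: bump to odd, modify, bump to even (a new generation). */
	static void slot_update(struct slot *s, const char *src)
	{
		atomic_fetch_add(&s->gen, 1);          /* gen becomes odd */
		memcpy(s->data, src, sizeof(s->data));
		atomic_fetch_add(&s->gen, 1);          /* gen becomes even */
	}

	/* Reader: the double-copy, with no refcount taken on the slot. */
	static void slot_read(struct slot *s, char *buf)
	{
		uint_fast64_t g1, g2;

		do {
			g1 = atomic_load(&s->gen);
			if (g1 & 1)
				continue;               /* update in flight, retry */
			memcpy(buf, s->data, sizeof(s->data));
			g2 = atomic_load(&s->gen);
			/*
			 * If gen moved while we copied, the slot may have
			 * been freed and reused mid-copy: the ABA window
			 * described above. Discard the copy and retry.
			 */
		} while (g1 != g2);
	}

	int main(void)
	{
		struct slot s = { 0 };
		char buf[sizeof(s.data)];

		slot_update(&s, "generation-checked read");
		slot_read(&s, buf);
		puts(buf);
		return 0;
	}

The point is that the reader never exposes data copied during a window
in which the generation changed, so "same page pointer, different
contents" can't slip through; the cost is a retry loop under write
contention instead of a refcount atomic on every read.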