On Sun, Feb 25, 2024 at 12:18:23AM -0500, Kent Overstreet wrote:
> Before large folios, we had people very much bottlenecked by 4k page
> overhead on sequential IO; my customer/sponsor was one of them.
>
> Factor of 2 or 3, IIRC; it was _bad_.  And when you looked at the
> profiles and looked at the filemap.c code it wasn't hard to see why;
> we'd walk a radix tree, do an atomic op (get the page), then do a 4k
> usercopy... hence the work I did to break up
> generic_file_buffered_read() and vectorize it, which was a huge
> improvement.

There's also the small random 64 byte read case that we haven't
optimised for yet.  That also bottlenecks on the page refcount atomic
op.  The proposed solution to that was double-copy: look up the page
without bumping its refcount, copy to a buffer, look up the page again
to be sure it's still there, then copy from the buffer to userspace.

Except that can go wrong under really unlikely circumstances.  Look up
the page, page gets freed, page gets reallocated to slab, we copy
sensitive data from it, page gets freed again, page gets reallocated to
the same spot in the file (!), lookup says "yup, the same page is
there".  We'd need a seqcount or something to be sure the page hasn't
moved.