Hi,

On 2025-01-27 17:09:57 +0100, David Hildenbrand wrote:
> > Andres shared some gists, but I don't want to send those to a
> > mailing list without permission.  Here's the kernel part of the
> > perf report:
> >
> >     14.04%  postgres  [kernel.kallsyms]  [k] try_grab_folio_fast
> >             |
> >             --14.04%--try_grab_folio_fast
> >                       gup_fast_fallback
> >                       |
> >                        --13.85%--iov_iter_extract_pages
> >                                  bio_iov_iter_get_pages
> >                                  iomap_dio_bio_iter
> >                                  __iomap_dio_rw
> >                                  iomap_dio_rw
> >                                  xfs_file_dio_read
> >                                  xfs_file_read_iter
> >                                  __io_read
> >                                  io_read
> >                                  io_issue_sqe
> >                                  io_submit_sqes
> >                                  __do_sys_io_uring_enter
> >                                  do_syscall_64
> >
> > Now, since postgres is using io_uring, perhaps there could be a path
> > which registers the memory with the iouring (doing the refcount/pincount
> > dance once), and then use that pinned memory for each I/O.  Maybe that
> > already exists; I'm not keeping up with io_uring development and I can't
> > seem to find any documentation on what things like io_provide_buffers()
> > actually do.

Worth noting that we'll not always use io_uring. Partially for portability to
other platforms, partially because it turns out that io_uring is disabled in
enough environments that we can't rely on it. The generic fallback
implementation is a pool of worker processes connected via shared memory. The
worker process approach did run into this issue, fwiw.

That's not to say that a legit answer to this scalability issue can't be "use
fixed bufs with io_uring", just wanted to give context.

> That's precisely what io-uring fixed buffers do :)

I looked at using them at some point - unfortunately it seems that there is
just {READ,WRITE}_FIXED not {READV,WRITEV}_FIXED. It's *exceedingly* common
for us to do reads/writes where source/target buffers aren't wholly
contiguous. Thus - unless I am misunderstanding something, entirely plausible
- using fixed buffers would unfortunately increase the number of IOs
noticeably.

Should have sent an email about that...

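To make the splitting cost concrete, here is a rough sketch of how one
vectored read would have to be broken up if only non-vectored fixed-buffer
reads are available. It is Python purely for illustration; the function name
split_for_fixed and the (address, length) tuple layout are made up for this
example and are not taken from postgres or liburing. It also ignores the
requirement that each sub-IO lie within a single registered buffer, which
could only increase the count.

```python
from typing import List, Tuple

# Hypothetical layouts, for illustration only:
#   segment = (memory_address, length)
#   sub-IO  = (file_offset, memory_address, length)
Segment = Tuple[int, int]
SubIO = Tuple[int, int, int]


def split_for_fixed(file_offset: int, segments: List[Segment]) -> List[SubIO]:
    """Split one vectored IO into the contiguous sub-IOs a non-vectored
    fixed-buffer read/write would need.

    The vectored IO covers one consecutive file range, filling the
    segments in order. Neighbouring segments that are adjacent in memory
    are merged; every gap in memory starts a new sub-IO.
    """
    subios: List[SubIO] = []
    for addr, length in segments:
        if length == 0:
            continue  # empty segments transfer nothing
        if subios and subios[-1][1] + subios[-1][2] == addr:
            # Starts exactly where the previous run ends: extend that run.
            off, start, run = subios[-1]
            subios[-1] = (off, start, run + length)
        else:
            # Not contiguous in memory: this needs its own IO.
            subios.append((file_offset, addr, length))
        file_offset += length
    return subios


# Four 8 KiB target buffers; only the first two are adjacent in memory.
segments = [(0x10000, 8192), (0x12000, 8192), (0x20000, 8192), (0x30000, 8192)]
for off, addr, length in split_for_fixed(0, segments):
    print(f"file offset {off:6d} -> addr {addr:#x}, {length} bytes")
```

In this example a single vectored read turns into three separate IOs. The
length of the returned list is the number any "use _FIXED only if it does not
split too much" decision would have to look at.
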
I guess we could add some heuristic to use _FIXED if it doesn't require
splitting an IO into too many sub-ios. But that seems pretty gnarly.

I dimly recall that I also ran into some issues around using fixed buffers as
a non-root user. It might just be the accounting of registered buffers as
mlocked memory and the difficulty of configuring that across
distributions. But I unfortunately don't remember any details anymore.

Greetings,

Andres Freund