If the workload is doing a lot of single-page try_grab_folio_fast() calls,
could it do so on a larger area (multiple pages at once -> single refcount
update)?
Not really. This is memory that's being used as the buffer cache, so
every thread in your database is hammering on it and pulling in exactly
the data that it needs for the SQL query that it's processing.
Ouch.
Is there a link to the report that you could share? Thanks.
Andres shared some gists, but I don't want to send those to a
mailing list without permission. Here's the kernel part of the
perf report:
    14.04%  postgres  [kernel.kallsyms]  [k] try_grab_folio_fast
            |
            --14.04%--try_grab_folio_fast
                      gup_fast_fallback
                      |
                      --13.85%--iov_iter_extract_pages
                                bio_iov_iter_get_pages
                                iomap_dio_bio_iter
                                __iomap_dio_rw
                                iomap_dio_rw
                                xfs_file_dio_read
                                xfs_file_read_iter
                                __io_read
                                io_read
                                io_issue_sqe
                                io_submit_sqes
                                __do_sys_io_uring_enter
                                do_syscall_64
Now, since postgres is using io_uring, perhaps there could be a path
which registers the memory with io_uring (doing the refcount/pincount
dance once), and then uses that pinned memory for each I/O. Maybe that
already exists; I'm not keeping up with io_uring development and I can't
seem to find any documentation on what things like io_provide_buffers()
actually do.
That's precisely what io_uring fixed buffers do :)
--
Cheers,
David / dhildenb