On 27.01.25 17:09, David Hildenbrand wrote:
If the workload is doing a lot of single-page try_grab_folio_fast(), could it
do so on a larger area (multiple pages at once -> single refcount update)?
Not really. This is memory that's being used as the buffer cache, so
every thread in your database is hammering on it and pulling in exactly
the data that it needs for the SQL query that it's processing.
Ouch.
Maybe there is a link to the report you could share, thanks.
Andres shared some gists, but I don't want to send those to a
mailing list without permission. Here's the kernel part of the
perf report:
    14.04%  postgres  [kernel.kallsyms]  [k] try_grab_folio_fast
            |
             --14.04%--try_grab_folio_fast
                       gup_fast_fallback
                       |
                        --13.85%--iov_iter_extract_pages
                                  bio_iov_iter_get_pages
                                  iomap_dio_bio_iter
                                  __iomap_dio_rw
                                  iomap_dio_rw
                                  xfs_file_dio_read
                                  xfs_file_read_iter
                                  __io_read
                                  io_read
                                  io_issue_sqe
                                  io_submit_sqes
                                  __do_sys_io_uring_enter
                                  do_syscall_64
BTW, two things that come to mind:
(1) We always fall back to GUP-fast; I wonder why. GUP-fast would go via
try_grab_folio_fast().
(2) During GUP slow, we must take the PT lock of the PUD table. So the
folio refcount/pincount/whatever is actually sync'ed by the ... PT lock
here?
See assert_spin_locked(pud_lockptr(mm, pudp)); in follow_huge_pud().
Note that that PUD table lock is likely a per-MM lock ... and yes, it
indeed is. We don't have split PUD locks.
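
To make the per-MM point concrete, here is a minimal userspace sketch. It is
not kernel code: spinlock_t, pud_t and struct mm_struct below are simplified
stand-ins, and puds_share_lock() is a hypothetical helper added only for
illustration. It models the non-split case, where the lock lookup ignores
which PUD entry is passed in and always hands back the one lock of the mm.

```c
/* Simplified userspace model, NOT kernel code. All types are stand-ins. */
#include <assert.h>
#include <stdbool.h>

typedef struct { int locked; } spinlock_t;      /* stand-in for a spinlock */
typedef struct { unsigned long val; } pud_t;    /* stand-in for a PUD entry */

struct mm_struct {
	spinlock_t page_table_lock;             /* one lock per address space */
};

/* Non-split case: the PUD entry does not select the lock, the mm does. */
static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pudp)
{
	(void)pudp;
	return &mm->page_table_lock;
}

/* Hypothetical helper: true if two PUD entries are guarded by the same lock. */
static inline bool puds_share_lock(struct mm_struct *mm, pud_t *a, pud_t *b)
{
	return pud_lockptr(mm, a) == pud_lockptr(mm, b);
}
```

In this model any two PUD entries of the same mm contend on the same lock,
while entries of different mms do not, which is why taking it on a hot path
serializes all threads of one process.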
--
Cheers,
David / dhildenb