Re: Direct I/O performance problems with 1GB pages

David Hildenbrand <david@xxxxxxxxxx> · Mon, 27 Jan 2025 17:20:25 +0100

On 27.01.25 17:09, David Hildenbrand wrote:

If the workload is doing a lot of single-page try_grab_folio_fast(), could
it do so on a larger area (multiple pages at once -> single refcount update)?

Not really.  This is memory that's being used as the buffer cache, so
every thread in your database is hammering on it and pulling in exactly
the data that it needs for the SQL query that it's processing.
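For readers less familiar with the batching idea asked about above, here is a minimal userspace sketch in C11. It is not kernel code: struct demo_folio and both demo_grab_*() helpers are hypothetical names made up for illustration, and the atomic counter only stands in for a folio's shared refcount. It shows why grabbing N pages of one large folio in a single call costs one atomic update instead of N.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a large folio: one refcount shared by all
 * of its pages. Not the kernel's struct folio. */
struct demo_folio {
    atomic_int refcount;
};

/* Grab nr_pages references one page at a time: one atomic
 * read-modify-write per page. Returns the number of atomic ops issued. */
static int demo_grab_per_page(struct demo_folio *f, int nr_pages)
{
    int ops = 0;

    for (int i = 0; i < nr_pages; i++) {
        atomic_fetch_add(&f->refcount, 1);
        ops++;
    }
    return ops;
}

/* Grab nr_pages references in one go: a single atomic add of nr_pages.
 * Returns the number of atomic ops issued (always 1). */
static int demo_grab_batched(struct demo_folio *f, int nr_pages)
{
    assert(nr_pages > 0);
    atomic_fetch_add(&f->refcount, nr_pages);
    return 1;
}
```

Both helpers leave the counter at the same value; only the number of contended atomic operations on the shared cache line differs. As the reply above notes, this only helps when one request covers several pages, which is not the access pattern of this workload.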

Ouch.

Maybe there is a link to the report you could share, thanks.

Andres shared some gists, but I don't want to send those to a
mailing list without permission.  Here's the kernel part of the
perf report:

      14.04%  postgres         [kernel.kallsyms]          [k] try_grab_folio_fast
              |
               --14.04%--try_grab_folio_fast
                         gup_fast_fallback
                         |
                          --13.85%--iov_iter_extract_pages
                                    bio_iov_iter_get_pages
                                    iomap_dio_bio_iter
                                    __iomap_dio_rw
                                    iomap_dio_rw
                                    xfs_file_dio_read
                                    xfs_file_read_iter
                                    __io_read
                                    io_read
                                    io_issue_sqe
                                    io_submit_sqes
                                    __do_sys_io_uring_enter
                                    do_syscall_64

BTW, two things that come to mind:

(1) We always fall back to GUP-fast, I wonder why. GUP-fast would go via
try_grab_folio_fast().

(2) During GUP slow, we must take the PT lock of the PUD table. So the 
folio refcount/pincount/whatever is actually sync'ed by the ... PT lock 
here?

See assert_spin_locked(pud_lockptr(mm, pudp)); in follow_huge_pud().

Note that the PUD table lock is likely a per-MM lock ... and yes, it
indeed is. We don't have split PUD locks.
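
To illustrate what "per-MM lock" means for contention, here is a small userspace model in C11. It is only a sketch under stated assumptions: struct demo_mm, demo_pud_t and the demo_pud_*() helpers are hypothetical stand-ins, with an atomic_flag playing the role of the spinlock. The one thing it mirrors is that the lock lookup ignores which PUD entry is passed in and always returns the single lock in the mm.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an address space with one page table lock. */
struct demo_mm {
    atomic_flag page_table_lock;
};

/* Hypothetical stand-in for a PUD entry. */
typedef unsigned long demo_pud_t;

/* No split PUD locks: the entry pointer is ignored and every PUD entry
 * of this mm maps to the same lock. */
static atomic_flag *demo_pud_lockptr(struct demo_mm *mm, demo_pud_t *pudp)
{
    (void)pudp;
    return &mm->page_table_lock;
}

/* Try to take the lock covering pudp; true if it was acquired. */
static bool demo_pud_trylock(struct demo_mm *mm, demo_pud_t *pudp)
{
    return !atomic_flag_test_and_set(demo_pud_lockptr(mm, pudp));
}

/* Release the lock covering pudp. */
static void demo_pud_unlock(struct demo_mm *mm, demo_pud_t *pudp)
{
    atomic_flag_clear(demo_pud_lockptr(mm, pudp));
}
```

In this model, holding the lock for one PUD entry makes a trylock on any other PUD entry of the same mm fail, while a different mm is unaffected. That is the serialization concern raised in (2): all threads of one process would contend on the same lock.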

--
Cheers,

David / dhildenb