Re: Direct I/O performance problems with 1GB pages

On 28.01.25 06:56, Christoph Hellwig wrote:
> I read through this and unfortunately have nothing useful to contribute
> to the actual lock sharding.  Just two semi-related bits:

> On Sun, Jan 26, 2025 at 12:46:45AM +0000, Matthew Wilcox wrote:
>> Postgres are experimenting with doing direct I/O to 1GB hugetlb pages.
>> Andres has gathered some performance data showing significantly worse
>> performance with 1GB pages compared to 2MB pages.  I sent a patch
>> recently which improves matters [1], but problems remain.

>> The primary problem we've identified is contention on folio->_refcount,
>> with strong secondary contention on folio->_pincount.  This is coming
>> from the call chain:
>>
>> iov_iter_extract_pages ->
>> gup_fast_fallback ->
>> try_grab_folio_fast

> Eww, gup_fast_fallback sent me down the wrong path, as it suggests that
> it is the fallback slow path, but it's not.  It got renamed from
> internal_get_user_pages_fast in commit 23babe1934d7 ("mm/gup:
> consistently name GUP-fast functions").  While the old name wasn't all
> that great, the new one including "fallback" is just horrible.  Can we
> please fix this up?

Yes, I did that renaming as part of that series, after the name was
suggested during review.  I got confused myself reading this report.

	internal_get_user_pages_fast() -> gup_fast_fallback()

Was certainly an improvement. Naming is hard; we want to express "try fast, but fall back to slow if it fails".

Maybe "gup_fast_with_fallback" might be clearer, not sure.


> The other thing is that the whole GUP machinery takes a reference
> per page fragment it touches and not just per folio fragment.

Right, that's what the calling code expects: GUP hands out pages, one reference per page.

> I'm not sure how fair access to the atomics is, but I suspect the
> multiple calls to increment/decrement them per operation probably
> don't make the lock contention any better.

The "advanced" logic in __get_user_pages() makes sure that if you GUP
multiple pages covered by a single PMD/PUD, we adjust the refcount only
twice (once deep down in the page table walker, then again in
__get_user_pages()).

There is certainly a lot of room for improvement in these page table walkers ...

--
Cheers,

David / dhildenb




