On 28.01.25 06:56, Christoph Hellwig wrote:
I read through this and unfortunately have nothing useful to contribute
to the actual lock sharding. Just two semi-related bits:
On Sun, Jan 26, 2025 at 12:46:45AM +0000, Matthew Wilcox wrote:
Postgres are experimenting with doing direct I/O to 1GB hugetlb pages.
Andres has gathered some performance data showing significantly worse
performance with 1GB pages compared to 2MB pages. I sent a patch
recently which improves matters [1], but problems remain.
The primary problem we've identified is contention of folio->_refcount
with a strong secondary contention on folio->_pincount. This is coming
from the call chain:
iov_iter_extract_pages ->
gup_fast_fallback ->
try_grab_folio_fast
Eww, gup_fast_fallback sent me down the wrong path, as it suggests that
is is the fallback slow path, but it's not. It got renamed from
internal_get_user_pages_fast in commit 23babe1934d7 ("mm/gup:
consistently name GUP-fast functions"). While old name wasn't all
that great the new one including fallback is just horrible. Can we
please fix this up?
Yes, I did that renaming as part of that series after the name was
suggested during review. I got confused myself reading this report.
internal_get_user_pages_fast() -> gup_fast_fallback()
Was certainly an improvement. Naming is hard, we want to express "try
fast, but fallback to slow if it fails".
Maybe "gup_fast_with_fallback" might be clearer, not sure.
The other thing is that the whole GUP machinery takes a reference
per page fragment it touches and not just per folio fragment.
Right, that's what calling code expects: hands out pages, one ref per page.
I'm not sure how fair access to the atomics is, but I suspect the
multiple calls to increment/decrement them per operation probably
don't make the lock contention any better.
The "advanced" logic in __get_user_pages() makes sure that if you GUP
multiple pages covered by a single PMD/PUD, that we will adjust the
refcount only twice (once deep down in the page table walker, then in
__get_user_pages()).
There is certainly a lot of room for improvement in these page table
walkers ...
--
Cheers,
David / dhildenb