On 04.07.24 16:07, Kirill A. Shutemov wrote:
On Thu, Jul 04, 2024 at 02:39:49PM +0100, Ryan Roberts wrote:
On 04/07/2024 12:41, Kirill A. Shutemov wrote:
On Wed, Jul 03, 2024 at 06:37:48PM +0100, Ryan Roberts wrote:
Hi Kirill, Hugh, Mel,
We recently had a problem reported at [1]: because aarch64 requires atomic RMW
instructions to raise a read fault followed by a write fault, a huge zero page
is faulted in during the read fault, and the write fault then shatters it,
installing small zero pages for every PTE in the PMD region except the
faulting address, which gets a writable private page.
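
For readers who want to poke at the access pattern described above, here is a minimal userspace sketch, not the reporter's test case. It assumes Linux, a 2 MiB PMD (4K base pages) and THP enabled for the mapping; map_pmd_region() and first_touch_atomic() are hypothetical helper names made up for illustration. Whether the atomic really takes a read fault before the write fault depends on the architecture, as described above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: one PMD covers 2 MiB (true for 4K base pages). */
#define PMD_REGION_SIZE (2UL << 20)

/*
 * Map an anonymous private region and return a PMD-aligned pointer into it,
 * so that a single PMD entry covers the returned range. Returns NULL if the
 * mapping fails. The over-allocated slack is left mapped for simplicity.
 */
static void *map_pmd_region(void)
{
    size_t len = 2 * PMD_REGION_SIZE;
    char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t aligned = ((uintptr_t)raw + PMD_REGION_SIZE - 1) &
                        ~(uintptr_t)(PMD_REGION_SIZE - 1);
    assert((aligned & (PMD_REGION_SIZE - 1)) == 0);

#ifdef MADV_HUGEPAGE
    /* Best effort: the call fails harmlessly if THP is not available. */
    (void)madvise((void *)aligned, PMD_REGION_SIZE, MADV_HUGEPAGE);
#endif
    return (void *)aligned;
}

/*
 * First touch of the region is an atomic read-modify-write. Anonymous
 * memory starts zero-filled, so the new value is 1 on the first call.
 */
static long first_touch_atomic(void *region)
{
    _Atomic long *counter = region;
    return atomic_fetch_add(counter, 1) + 1;
}
```

Calling first_touch_atomic(map_pmd_region()) from a small main() and then looking at AnonHugePages for the mapping in /proc/self/smaps should show whether the region ended up backed by a THP or by small pages.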
A number of ways were discussed to solve that problem. But it got me wondering
why we have this behaviour in general for huge zero page? This seems like odd
behaviour to me. Surely it would be less effort and more aligned with the app's
expectations to notice the huge zero page in the PMD, remove it, and install a
THP, as would have been done if pmd_none() was true? Or if there is a reason to
shatter on write, why not do away with the huge zero page and save some memory,
and just install a PMD's worth of small zero pages on fault?
Perhaps replacing the huge zero page with a THP on write fault would have
been better behaviour at the time, but perhaps changing that behaviour now
risks a memory bloat regression in some workloads?
Yeah, I agree that a WP fault on the huge zero page should give a THP. I think
treating the huge zero page as a none PMD on write page fault should be safe
and reasonable.
So you're not concerned about potential for memory bloat regressions in apps
that are deployed today? I'm a bit nervous to make the change without a bunch of
testing...
No, I am not concerned. It is silly to expect a different result depending on
which comes first, a read or a write fault to the page. It should be consistent.
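
The inconsistency can be made concrete with a toy state machine. This is only a sketch of the behaviour as described in this thread, not kernel code: the enum names, handle_fault() and the two helpers are all invented for illustration, and it assumes the huge zero page is enabled and THP allocation always succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* What a single PMD entry maps, in this toy model. */
enum pmd_state { PMD_NONE, PMD_HUGE_ZERO, PMD_THP, PMD_PTE_TABLE };

/* How a write fault on the huge zero page is handled. */
enum wp_policy {
    WP_SPLIT,         /* behaviour described in this thread: shatter into PTEs */
    WP_TREAT_AS_NONE  /* proposal: handle it as if pmd_none() were true */
};

static enum pmd_state handle_fault(enum pmd_state pmd, bool write,
                                   enum wp_policy policy)
{
    switch (pmd) {
    case PMD_NONE:
        /* Read fault maps the huge zero page, write fault allocates a THP. */
        return write ? PMD_THP : PMD_HUGE_ZERO;
    case PMD_HUGE_ZERO:
        if (!write)
            return PMD_HUGE_ZERO;
        return policy == WP_SPLIT ? PMD_PTE_TABLE : PMD_THP;
    case PMD_THP:
    case PMD_PTE_TABLE:
        return pmd; /* already populated, nothing changes in this model */
    }
    assert(!"unknown pmd state");
    return pmd;
}

/* Read fault first, then a write fault to the same PMD region. */
static enum pmd_state read_then_write(enum wp_policy policy)
{
    enum pmd_state pmd = handle_fault(PMD_NONE, false, policy);
    return handle_fault(pmd, true, policy);
}

/* Write fault is the first access to the PMD region. */
static enum pmd_state write_first(enum wp_policy policy)
{
    return handle_fault(PMD_NONE, true, policy);
}
```

With WP_SPLIT, read_then_write() ends in PMD_PTE_TABLE while write_first() ends in PMD_THP. With WP_TREAT_AS_NONE both orders end in PMD_THP, which is the consistency being argued for.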
On a related note, I think we need to drop __split_huge_zero_page_pmd(). We
can just unmap it on split and let caller populate it as needed as we do
for file mappings.
We'll likely have to take care of UFFD-WP being set on the PMD.
--
Cheers,
David / dhildenb