Hi Matthew, On 3/17/2023 11:44 AM, Matthew Wilcox wrote: > On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote: >> >> >> On 3/17/2023 1:52 AM, Matthew Wilcox wrote: >>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote: >>>> On 16/03/2023 16:23, Yin, Fengwei wrote: >>>>>> I think you are changing behavior here - is this intentional? Previously this >>>>>> would be evaluated per page, now its evaluated once for the whole range. The >>>>>> intention below is that directly faulted pages are mapped young and prefaulted >>>>>> pages are mapped old. But now a whole range will be mapped the same. >>>>> >>>>> Yes. You are right here. >>>>> >>>>> Look at the prefault and cpu_has_hw_af for ARM64, it looks like we >>>>> can avoid to handle vmf->address == addr specially. It's OK to >>>>> drop prefault and change the logic here a little bit to: >>>>> if (arch_wants_old_prefaulted_pte()) >>>>> entry = pte_mkold(entry); >>>>> else >>>>> entry = pte_sw_mkyong(entry); >>>>> >>>>> It's not necessary to use pte_sw_mkyong for vmf->address == addr >>>>> because HW will set the ACCESS bit in page table entry. >>>>> >>>>> Add Will Deacon in case I missed something here. Thanks. >>>> >>>> I'll defer to Will's response, but not all arm HW supports HW access flag >>>> management. In that case it's done by SW, so I would imagine that by setting >>>> this to old initially, we will get a second fault to set the access bit, which >>>> will slow things down. I wonder if you will need to split this into (up to) 3 >>>> calls to set_ptes()? >>> >>> I don't think we should do that. The limited information I have from >>> various microarchitectures is that the PTEs must differ only in their >>> PFN bits in order to use larger TLB entries. That includes the Accessed >>> bit (or equivalent). So we should mkyoung all the PTEs in the same >>> folio, at least initially. >>> >>> That said, we should still do this conditionally. We'll prefault some >>> other folios too. So I think this should be: >>> >>> bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address); >>> >> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if hardware access >> flag is supported on ARM64, there is benefit if prefault PTEs is set as "old". >> If we change prefault like above, the PTEs is set as "yong" which loose benefit >> on ARM64 with hardware access flag. >> >> ITOH, if from "old" to "yong" is cheap, why not leave all PTEs of folio as "old" >> and let hardware to update it to "yong"? > > Because we're tracking the entire folio as a single entity. So we're > better off avoiding the extra pagefaults to update the accessed bit, > which won't actually give us any information (vmscan needs to know "were > any of the accessed bits set", not "how many of them were set"). > > Anyway, hopefully Ryan can test this and let us know if it fixes the > regression he sees. Thanks a lot to Ryan for helping to test the debug patch I made. Ryan confirmed that the following change could fix the kernel build regression: diff --git a/mm/filemap.c b/mm/filemap.c index db86e459dde6..343d6ff36b2c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3557,7 +3557,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, ret |= filemap_map_folio_range(vmf, folio, xas.xa_index - folio->index, addr, nr_pages); - xas.xa_index += nr_pages; + xas.xa_index += folio_test_large(folio) ? nr_pages : 0; folio_unlock(folio); folio_put(folio); I will make upstream-able change as "xas.xa_index += nr_pages - 1;" Ryan and I also identify some other changes needed. I am not sure how to integrate those changes to this series. Maybe an add-on patch after this series? Thanks. Regards Yin, Fengwei