On 01/23/2020 01:25 PM, Xuefeng Wang wrote:
> On the KunPeng920 board, when changing the permissions of a large
> region, pmdp_invalidate() accounts for about 65% of the profile (with
> hugepages) in a JIT tool. The kernel flushes the TLB twice: the first
> flush happens in pmdp_invalidate(), and the second at the end of
> change_protect_range(). The first pmdp_invalidate() is unnecessary if
> the hardware supports atomic pmd modification: atomically changing the
> pmd to zero prevents the hardware from updating the entry
> asynchronously. So restructure the code and remove the first
> pmdp_invalidate(); the second TLB flush ensures the new TLB entry is
> valid.
>
> This patch series first adds a pmdp_modify_prot transaction
> abstraction. It then adds pmdp_modify_prot_start() on arm64, which uses
> pmdp_huge_get_and_clear() to atomically fetch the pmd and zero the
> entry.

There is a comment block in change_huge_pmd() which details how clearing
the PMD entry there (in the prot_numa case) can potentially race with a
concurrent madvise(MADV_DONTNEED, ..) call. Here is the comment block
for reference.

	/*
	 * In case prot_numa, we are under down_read(mmap_sem). It's critical
	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
	 * which is also under down_read(mmap_sem):
	 *
	 *	CPU0:				CPU1:
	 *				change_huge_pmd(prot_numa=1)
	 *				 pmdp_huge_get_and_clear_notify()
	 * madvise_dontneed()
	 *  zap_pmd_range()
	 *   pmd_trans_huge(*pmd) == 0 (without ptl)
	 *   // skip the pmd
	 *				 set_pmd_at();
	 *				 // pmd is re-established
	 *
	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
	 * which may break userspace.
	 *
	 * pmdp_invalidate() is required to make sure we don't miss
	 * dirty/young flags set by hardware.
	 */

By defining the new override with pmdp_huge_get_and_clear(), aren't we
now exposed to the above race condition?

- Anshuman
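For context, the arm64 override under discussion would presumably look
roughly like the sketch below. This is an assumption reconstructed from
the cover letter's description, not the submitted patch itself; the
signature follows the existing ptep_modify_prot_start() pattern.

	/*
	 * Sketch only (assumed from the patch description). The pmd is
	 * fetched and zeroed in one atomic step, so the hardware walker
	 * cannot set dirty/young bits in a stale entry while the
	 * permission change is in progress, and the separate
	 * pmdp_invalidate() + flush before writing the new entry is
	 * dropped.
	 */
	static inline pmd_t pmdp_modify_prot_start(struct vm_area_struct *vma,
						   unsigned long addr,
						   pmd_t *pmdp)
	{
		return pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp);
	}

Note that between this clear and the later set_pmd_at(), the entry reads
as pmd_none(), which is exactly the window a lockless pmd_trans_huge()
check in zap_pmd_range() can observe.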