On Tue, Nov 23, 2021 at 5:58 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Tue, 23 Nov 2021 at 10:13, Huangzhaoyang <huangzhaoyang@xxxxxxxxx> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> > Since there is no PTE_CONT when rodata_full in ARM64, introducing a
> > hook function to apply PTE_CONT on the proper page blocks.
> >
>
> Given the discussion around your previous patch, I would expect a
> meticulous explanation here why it is guaranteed to be safe to
> manipulate the PTE_CONT attribute like this, and how the proposed
> logic is correct for all supported page sizes.
>
> Without using an intermediate invalid mapping for the entire range,
> this is never going to work reliably (this is the break-before-make
> requirement). And given that marking the entire block invalid will
> create intermediate states that are not permitted (a valid PTE_CONT
> mapping and an invalid ~PTE_CONT mapping covering the same VA), the
> only way to apply changes like these is to temporarily switch all CPUs
> to a different translation via TTBR1. And this is not going to happen.

As there is no safe way to modify PTE_CONT on a live mapping, please
disregard all previous patches except the current one.

> Also, you never replied to my question regarding the use case and the
> performance gain.

In our Android-based system, the multimedia-related use cases, which
allocate high-order page blocks quite a lot, suffer the most from the
small PTE granularity. The performance gap is even visible.

> In summary, NAK to this patch or any of the previous ones regarding
> PTE_CONT. If you do insist on pursuing this further, please provide an
> elaborate and rock solid explanation why your approach is 100% valid
> and correct (for all page sizes). And make sure you include an
> explanation how your changes comply with the architectural
> break-before-make requirements around PTE_CONT attributes.
IMHO, it is safe to modify the page block's PTEs inside
arch_alloc_page()/arch_free_page(), as no one else is aware of those
pages at that point. Furthermore, I do think TLB flushes and barriers
are needed for synchronization. With regard to the page-size issue, I
think replacing the hard-coded constants with the CONT_PTE_* macros
could absorb the difference and make it correct:

-       if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
-               order -= 4;
+       if ((order >= CONT_PTE_SHIFT) && (addr & ~CONT_PTE_MASK) == 0) {
+               order -= CONT_PTE_SHIFT;
                do {
                        cont_pte_low_bound = addr & CONT_PTE_MASK;
                        __change_memory_common(cont_pte_low_bound,
                                (~CONT_PTE_MASK + 1), __pgprot(PTE_CONT), __pgprot(0));
                        addr = (u64)page_address(page);
-                       page += 4;
+                       page += CONT_PTES;
                        order--;
                } while (order >= 0);

> >
> > ---
> >  arch/arm64/include/asm/page.h |  5 +++++
> >  arch/arm64/mm/pageattr.c      | 45 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > index f98c91b..53cdd09 100644
> > --- a/arch/arm64/include/asm/page.h
> > +++ b/arch/arm64/include/asm/page.h
> > @@ -46,6 +46,11 @@ struct page *alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
> >
> >  #include <asm/memory.h>
> >
> > +#define HAVE_ARCH_ALLOC_PAGE
> > +#define HAVE_ARCH_FREE_PAGE
> > +
> > +extern void arch_alloc_page(struct page *page, int order);
> > +extern void arch_free_page(struct page *page, int order);
> >  #endif /* !__ASSEMBLY__ */
> >
> >  #define VM_DATA_DEFAULT_FLAGS  (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)
> > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> > index a3bacd7..815a06d 100644
> > --- a/arch/arm64/mm/pageattr.c
> > +++ b/arch/arm64/mm/pageattr.c
> > @@ -239,3 +239,48 @@ bool kernel_page_present(struct page *page)
> >         ptep = pte_offset_kernel(pmdp, addr);
> >         return pte_valid(READ_ONCE(*ptep));
> >  }
> > +
> > +void arch_alloc_page(struct page *page, int order)
> > +{
> > +       unsigned long addr;
> > +       unsigned long cont_pte_low_bound;
> > +
> > +       if (!rodata_full)
> > +               return;
> > +
> > +       addr = (u64)page_address(page);
> > +       if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
> > +               order -= 4;
> > +               do {
> > +                       cont_pte_low_bound = addr & CONT_PTE_MASK;
> > +                       __change_memory_common(cont_pte_low_bound,
> > +                               (~CONT_PTE_MASK + 1), __pgprot(PTE_CONT), __pgprot(0));
> > +                       addr = (u64)page_address(page);
> > +                       page += 4;
> > +                       order--;
> > +               } while (order >= 0);
> > +       }
> > +}
> > +
> > +void arch_free_page(struct page *page, int order)
> > +{
> > +       unsigned long addr;
> > +       unsigned long cont_pte_low_bound;
> > +
> > +       if (!rodata_full)
> > +               return;
> > +
> > +       addr = (u64)page_address(page);
> > +       if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
> > +               order -= 4;
> > +               do {
> > +                       cont_pte_low_bound = addr & CONT_PTE_MASK;
> > +                       __change_memory_common(cont_pte_low_bound,
> > +                               (~CONT_PTE_MASK + 1), __pgprot(0), __pgprot(PTE_CONT));
> > +                       addr = (u64)page_address(page);
> > +                       page += 4;
> > +                       order--;
> > +               } while (order >= 0);
> > +       }
> > +}
> > +
> > --
> > 1.9.1