On 29/05/2024 at 10:49, Oscar Salvador wrote:
> On Mon, May 27, 2024 at 03:30:11PM +0200, Christophe Leroy wrote:
>> e500 supports many page sizes among which the following sizes are
>> implemented in the kernel at the time being: 4M, 16M, 64M, 256M, 1G.
>>
>> On e500, TLB miss for hugepages is exclusively handled by SW even
>> on e6500 which has HW assistance for 4k pages, so there are no
>> constraints like on the 8xx.
>>
>> On e500/32, all are at PGD/PMD level and can be handled as
>> cont-PMD.
>>
>> On e500/64, smaller ones are on PMD while bigger ones are on PUD.
>> Again, they can easily be handled as cont-PMD and cont-PUD instead
>> of hugepd.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
>
> ...
>
>> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
>> index 90d6a0943b35..f7421d1a1693 100644
>> --- a/arch/powerpc/include/asm/nohash/pgtable.h
>> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
>> @@ -52,11 +52,36 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
>>  {
>>  	pte_basic_t old = pte_val(*p);
>>  	pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
>> +	unsigned long sz;
>> +	unsigned long pdsize;
>> +	int i;
>>
>>  	if (new == old)
>>  		return old;
>>
>> -	*p = __pte(new);
>> +#ifdef CONFIG_PPC_E500
>> +	if (huge)
>> +		sz = 1UL << (((old & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);
>> +	else
>
> I think this will not compile when CONFIG_PPC_85xx && !CONFIG_PTE_64BIT.

Yes, I got feedback on this from the robots.

>
> You have declared _PAGE_HSIZE_MSK and _PAGE_HSIZE_SHIFT in
> arch/powerpc/include/asm/nohash/hugetlb-e500.h.
>
> But hugetlb-e500.h is only included if CONFIG_PPC_85xx && CONFIG_PTE_64BIT
> (see arch/powerpc/include/asm/nohash/32/pgtable.h).
>
>> +#endif
>> +		sz = PAGE_SIZE;
>> +
>> +	if (!huge || sz < PMD_SIZE)
>> +		pdsize = PAGE_SIZE;
>> +	else if (sz < PUD_SIZE)
>> +		pdsize = PMD_SIZE;
>> +	else if (sz < P4D_SIZE)
>> +		pdsize = PUD_SIZE;
>> +	else if (sz < PGDIR_SIZE)
>> +		pdsize = P4D_SIZE;
>> +	else
>> +		pdsize = PGDIR_SIZE;
>> +
>> +	for (i = 0; i < sz / pdsize; i++, p++) {
>> +		*p = __pte(new);
>> +		if (new)
>> +			new += (unsigned long long)(pdsize / PAGE_SIZE) << PTE_RPN_SHIFT;
>
> I guess 'new' can be 0 if pte_update() is called on behalf of clearing the pte?

It is exactly that, and without that check I had pmd_bad() reporting bad
pmds after freeing page tables.

>
>> +static inline unsigned long pmd_leaf_size(pmd_t pmd)
>> +{
>> +	return 1UL << (((pmd_val(pmd) & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);
>
> Can we have the '20' somewhere defined with a comment on top explaining
> what is so it is not a magic number?
> Otherwise people might come look at this and wonder why 20.

Yes, I now have:

+#define _PAGE_HSIZE_MSK (_PAGE_U0 | _PAGE_U1 | _PAGE_U2 | _PAGE_U3)
+#define _PAGE_HSIZE_SHIFT		14
+#define _PAGE_HSIZE_SHIFT_OFFSET	20

and have added a helper to avoid doing the calculation in several places:

+static inline unsigned long pte_huge_size(pte_t pte)
+{
+	pte_basic_t val = pte_val(pte);
+
+	return 1UL << (((val & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + _PAGE_HSIZE_SHIFT_OFFSET);
+}

>
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -331,6 +331,37 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>>  		__set_huge_pte_at(pmdp, ptep, pte_val(pte));
>>  	}
>>  }
>> +#elif defined(CONFIG_PPC_E500)
>> +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>> +		     pte_t pte, unsigned long sz)
>> +{
>> +	unsigned long pdsize;
>> +	int i;
>> +
>> +	pte = set_pte_filter(pte, addr);
>> +
>> +	/*
>> +	 * Make sure hardware valid bit is not set. We don't do
>> +	 * tlb flush for this update.
>> +	 */
>> +	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>> +
>> +	if (sz < PMD_SIZE)
>> +		pdsize = PAGE_SIZE;
>> +	else if (sz < PUD_SIZE)
>> +		pdsize = PMD_SIZE;
>> +	else if (sz < P4D_SIZE)
>> +		pdsize = PUD_SIZE;
>> +	else if (sz < PGDIR_SIZE)
>> +		pdsize = P4D_SIZE;
>> +	else
>> +		pdsize = PGDIR_SIZE;
>> +
>> +	for (i = 0; i < sz / pdsize; i++, ptep++, addr += pdsize) {
>> +		__set_pte_at(mm, addr, ptep, pte, 0);
>> +		pte = __pte(pte_val(pte) + ((unsigned long long)pdsize / PAGE_SIZE << PFN_PTE_SHIFT));
>
> You can use pte_advance_pfn() here? Just have
>
> nr = (unsigned long long)pdsize / PAGE_SIZE << PFN_PTE_SHIFT)
> pte_advance_pfn(pte, nr)

That's what I did at first, but it didn't work. The problem is that
pte_advance_pfn() takes an unsigned long, not an unsigned long long:

static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
{
	return __pte(pte_val(pte) + (nr << PFN_PTE_SHIFT));
}

And when I called it with nr = PMD_SIZE / PAGE_SIZE = 2M / 4k = 512, as
we have PFN_PTE_SHIFT = 24, I got 512 << 24 = 0, because the result
(2^33) does not fit in a 32-bit unsigned long.

>
> Which 'sz's can we have here? You mentioned that e500 support:
>
> 4M, 16M, 64M, 256M, 1G.
>
> which of these ones can be huge?

All are huge.

>
>
> --
> Oscar Salvador
> SUSE Labs
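
For illustration only, the truncation described above for
pte_advance_pfn() can be reproduced outside the kernel. This is a
minimal user-space sketch, not kernel code: uint32_t stands in for a
32-bit unsigned long, uint64_t for a 64-bit pte_basic_t, and the names
advance_narrow(), advance_wide() and overflow_demo() are hypothetical.
Only the value PFN_PTE_SHIFT = 24 is taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in this thread for e500 with 64-bit PTEs. */
#define PFN_PTE_SHIFT 24

/*
 * Hypothetical model of the pte_advance_pfn() arithmetic on a 32-bit
 * build: nr is shifted in 32-bit arithmetic, then added to the PTE.
 */
static uint64_t advance_narrow(uint64_t pte, uint32_t nr)
{
	/* Bits shifted past bit 31 are discarded. */
	uint32_t delta = (uint32_t)(nr << PFN_PTE_SHIFT);

	return pte + delta;
}

/*
 * Same computation, but nr is widened to 64 bits before the shift,
 * as the open-coded expression in the patch does with its
 * (unsigned long long) cast.
 */
static uint64_t advance_wide(uint64_t pte, uint32_t nr)
{
	return pte + ((uint64_t)nr << PFN_PTE_SHIFT);
}

/* Checks both variants with nr = PMD_SIZE / PAGE_SIZE = 512. */
static void overflow_demo(void)
{
	uint32_t nr = 512;

	/* 512 << 24 is 2^33, which truncates to 0 in 32 bits. */
	assert(advance_narrow(0, nr) == 0);

	/* With 64-bit arithmetic the increment is preserved. */
	assert(advance_wide(0, nr) == (UINT64_C(1) << 33));
}
```

Calling overflow_demo() from any main() exercises both cases. The narrow
variant leaves the PTE unchanged for nr = 512, which matches the
behaviour reported above; the wide variant advances it by 2^33.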