On 28/02/2019 18:55, Paul Burton wrote: > Hi Steven, > > On Thu, Feb 28, 2019 at 12:11:24PM +0000, Steven Price wrote: >> On 28/02/2019 02:15, Paul Burton wrote: >>> On Wed, Feb 27, 2019 at 05:05:45PM +0000, Steven Price wrote: >>>> For mips, we don't support large pages on 32 bit so add stubs returning 0. >>> >>> So far so good :) >>> >>>> For 64 bit look for _PAGE_HUGE flag being set. This means exposing the >>>> flag when !CONFIG_MIPS_HUGE_TLB_SUPPORT. >>> >>> Here I have to ask why? We could just return 0 like the mips32 case when >>> CONFIG_MIPS_HUGE_TLB_SUPPORT=n, let the compiler optimize the whole >>> thing out and avoid redundant work at runtime. >>> >>> This could be unified too in asm/pgtable.h - checking for >>> CONFIG_MIPS_HUGE_TLB_SUPPORT should be sufficient to cover the mips32 >>> case along with the subset of mips64 configurations without huge pages. >> >> The intention here is to define a new set of macros/functions which will >> always tell us whether we're at the leaf of a page table walk, whether >> or not huge pages are compiled into the kernel. Basically this allows >> the page walking code to be used on page tables other than user space, >> for instance the kernel page tables (which e.g. might use a large >> mapping for linear memory even if huge pages are not compiled in) or >> page tables from firmware (e.g. EFI on arm64). >> >> I'm not familiar enough with mips to know how it handles things like the >> linear map so I don't know how relevant that is, but I'm trying to >> introduce a new set of functions which differ from the existing >> p?d_huge() macros by not depending on whether these mappings could exist >> for a user space VMA (i.e. not depending on HUGETLB support and existing >> for all levels that architecturally they can occur at). > > Thanks for the explanation - the background helps. > > Right now for MIPS, with one exception, there'll be no difference > between a page being huge or large. So for the vast majority of kernels > with CONFIG_MIPS_HUGE_TLB_SUPPORT=n we should just return 0. > > The one exception I mentioned is old SGI IP27 support, which allows the > kernel to be mapped through the TLB & does that using 2x 16MB pages when > CONFIG_MAPPED_KERNEL=y. However even there your patch as-is won't pick > up on that for 2 reasons: > > 1) The pages in question don't appear to actually be recorded in the > page tables - they're just written straight into the TLB as wired > entries (ie. entries that will never be evicted). > > 2) Even if they were in the page tables the _PAGE_HUGE bit isn't set. > > Since those pages aren't recorded in the page tables anyway we'd either > need to: > > a) Add them to the page tables, and set the _PAGE_HUGE bit. > > b) Ignore them if the code you're working on won't be operating on the > memory mapping the kernel. > > For other platforms the kernel is run from unmapped memory, and for all > cases including IP27 the kernel will use unmapped memory to access > lowmem or peripherals when possible. That is, MIPS has virtual address > regions ((c)kseg[01] or xkphys) which are architecturally defined as > linear maps to physical memory & so VA->PA translation doesn't use the > TLB at all. > > So my thought would be that for almost everything we could just do: > > #define pmd_large(pmd) pmd_huge(pmd) > #define pud_large(pmd) pud_huge(pmd) > > And whether we need to do anything about IP27 depends on whether a) or > b) is chosen above. > > Or alternatively you could do something like: > > #ifdef _PAGE_HUGE > > static inline int pmd_large(pmd_t pmd) > { > return (pmd_val(pmd) & _PAGE_HUGE) != 0; > } > > static inline int pud_large(pud_t pud) > { > return (pud_val(pud) & _PAGE_HUGE) != 0; > } > > #else > # define pmd_large(pmd) 0 > # define pud_large(pud) 0 > #endif > > That would cover everything except for the IP27, but would make it pick > up the IP27 kernel pages automatically if someone later defines > _PAGE_HUGE for IP27 CONFIG_MAPPED_KERNEL=y & makes use of it for those > pages. Thanks for the detailed explanation. I think my preference is your change above (#ifdef _PAGE_HUGE) because I'm trying to stop people thinking p?d_large==p?d_huge. MIPS is a little different from other architectures in that the hardware doesn't walk the page tables, so there isn't a definitive answer as to whether there is a 'huge' bit in the tables or not - it actually does depend on the kernel configuration. For the IP27 case I think the current situation is probably fine - the intention is to walk the page tables, so even though the TLBs (and therefore the actual translations) might not match, at least p?d_large will accurately tell when the leaf of the page table tree has been reached. And as you say, using _PAGE_HUGE as the #ifdef means that should support be added the code should automatically make use of it. Thanks for your help, Steve