On Wed, Sep 12, 2018 at 05:39:14PM +0100, Will Deacon wrote: > Hi Sean, > > Thanks for looking at the patch. > > On Wed, Sep 12, 2018 at 08:09:39AM -0700, Sean Christopherson wrote: > > On Wed, Sep 12, 2018 at 11:26:13AM +0100, Will Deacon wrote: > > > The current ioremap() code uses a phys_addr variable at each level of > > > page table, which is confusingly offset by subtracting the base virtual > > > address being mapped so that adding the current virtual address back on > > > when iterating through the page table entries gives back the corresponding > > > physical address. > > > > > > This is fairly confusing and results in all users of phys_addr having to > > > add the current virtual address back on. Instead, this patch just updates > > > phys_addr when iterating over the page table entries, ensuring that it's > > > always up-to-date and doesn't require explicit offsetting. > > > > > > Cc: Chintan Pandya <cpandya@xxxxxxxxxxxxxx> > > > Cc: Toshi Kani <toshi.kani@xxxxxxx> > > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > > Cc: Michal Hocko <mhocko@xxxxxxxx> > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > Signed-off-by: Will Deacon <will.deacon@xxxxxxx> > > > --- > > > lib/ioremap.c | 28 ++++++++++++---------------- > > > 1 file changed, 12 insertions(+), 16 deletions(-) > > > > > > diff --git a/lib/ioremap.c b/lib/ioremap.c > > > index 6c72764af19c..fc834a59c90c 100644 > > > --- a/lib/ioremap.c > > > +++ b/lib/ioremap.c > > > @@ -101,19 +101,18 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr, > > > pmd_t *pmd; > > > unsigned long next; > > > > > > - phys_addr -= addr; > > > pmd = pmd_alloc(&init_mm, pud, addr); > > > if (!pmd) > > > return -ENOMEM; > > > do { > > > next = pmd_addr_end(addr, end); > > > > > > - if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot)) > > > + if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) > > > continue; > > > > > > - if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot)) > > > + if (ioremap_pte_range(pmd, addr, next, phys_addr, prot)) > > > return -ENOMEM; > > > - } while (pmd++, addr = next, addr != end); > > > + } while (pmd++, addr = next, phys_addr += PMD_SIZE, addr != end); > > > > I think bumping phys_addr by PXX_SIZE is wrong if phys_addr and addr > > start unaligned with respect to PXX_SIZE. The addresses must be > > PAGE_ALIGNED, which lets ioremap_pte_range() do a simple calculation, > > but that doesn't hold true for the upper levels, i.e. phys_addr needs > > to be adjusted using an algorithm similar to pxx_addr_end(). > > > > Using a 2mb page as an example (lower 32 bits only): > > > > pxx_size = 0x00020000 > > pxx_mask = 0xfffe0000 > > addr = 0x1000 > > end = 0x00040000 > > phys_addr = 0x1000 > > > > Loop 1: > > addr = 0x1000 > > phys = 0x1000 > > > > Loop 2: > > addr = 0x20000 > > phys = 0x21000 > > Yes, I think you're completely right, however I also don't think this > can happen with the current code (and I've failed to trigger it in my > testing). The virtual addresses allocated for VM_IOREMAP allocations > are aligned to the order of the allocation, which means that the virtual > address at the start of the mapping is aligned such that when we hit the > end of a pXd, we know we've mapped the previous PXD_SIZE bytes. > > Having said that, this is clearly a change from the current code and I > haven't audited architectures other than arm64 (where IOREMAP_MAX_ORDER > corresponds to the maximum size of our huge mappings), so it would be > much better not to introduce this funny behaviour in a patch that aims > to reduce confusion in the first place! > > Fixing this using the pxx_addr_end() macros is a bit strange, since we > don't have a physical end variable (nor do we need one), so perhaps > something like changing the while condition to be: > > do { > ... > } while (pmd++, phys_addr += (next - addr), addr = next, addr != end); > > would do the trick. What do you reckon? LGTM. I like that there isn't a separate calculation for phys_addr's offset.