Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

Nicolas Boichat <drinkcat@xxxxxxxxxxxx> · Thu, 22 Nov 2018 09:05:30 +0800

On Thu, Nov 22, 2018 at 6:27 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>
> On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> > On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
> >> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> >>
> >>> This is a follow-up to the discussion in [1], to make sure that the page
> >>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> >>> physical address space.
> >>
> >> Page tables? This means you need a page frame? Why go through the slab
> >> allocators?
> >
> > Because this particular architecture has sub-page-size PMD page tables.
> > We desperately need to hoist page table allocation out of the architectures;
> > there're a bunch of different implementations and they're mostly bad,
> > one way or another.
>
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to
> be entirely independent of the p*d types and arch-specific MM code (this
> Armv7 short-descriptor format is already "non-native" when used by
> drivers in an arm64 kernel).
>
> There are various efficiency reasons for using regular kernel memory
> instead of coherent DMA allocations - for the most part it works well,
> we just have the odd corner case like this one where the 32-bit format
> gets used on 64-bit systems such that the tables themselves still need
> to be allocated below 4GB (although the final output address can point
> at higher memory by virtue of the IOMMU in question not implementing
> permissions and repurposing some of those PTE fields as extra address bits).
>
> TBH, if this DMA32 stuff is going to be contentious we could possibly
> just rip out the offending kmem_cache - it seemed like good practice for
> the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied
> upon to give the same 1KB alignment and chance of succeeding as the
> equivalent kmem_cache_alloc(), then we could quite easily make do with
> that instead.

Yes, but if we want to use kzalloc, we'll need to create
kmalloc_caches for DMA32, which seems wasteful as there are no other
users (see my comment here:
https://patchwork.kernel.org/patch/10677525/#22332697).

Thanks,

> Thanks,
> Robin.
>
> > For each level of page table we generally have three cases:
> >
> > 1. single page
> > 2. sub-page, naturally aligned
> > 3. multiple pages, naturally aligned
> >
> > for 1 and 3, the page allocator will do just fine.
> > for 2, we should have a per-MM page_frag allocator.  s390 already has
> > something like this, although it's more complicated.  ppc also has
> > something a little more complex for the cases when it's configured with
> > a 64k page size but wants to use a 4k page table entry.
> >
> > I'd like x86 to be able to simply do:
> >
> > #define pte_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> > #define pmd_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> > #define pud_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> > #define p4d_alloc_one(mm, addr)       page_alloc_table(mm, addr, 0)
> >
> > An architecture with 4k page size and needing a 16k PMD would do:
> >
> > #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
> >
> > while an architecture with a 64k page size needing a 4k PTE would do:
> >
> > #define ARCH_PAGE_TABLE_FRAG
> > #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
> >
> > I haven't had time to work on this, but perhaps someone with a problem
> > that needs fixing would like to, instead of burying yet another awful
> > implementation away in arch/ somewhere.
> >