On Wed, 17 May 2023 10:41:19 +0100
Catalin Marinas <catalin.marinas@xxxxxxx> wrote:

> On Wed, May 17, 2023 at 08:56:53AM +0200, Christoph Hellwig wrote:
> > Just thinking out loud:
> >
> >  - what if we always way overallocate the swiotlb buffer
> >  - and then mark the second half / two thirds / <pull some number out
> >    of the thin air> slots as used, and make that region available
> >    through a special CMA mechanism as ZONE_MOVABLE (but not allowing
> >    other CMA allocations to dip into it).
> >
> > This allows us to have a single slot management for the entire
> > area, but allow reclaiming from it. We'd probably also need to make
> > this CMA variant irq safe.
>
> I think this could work. It doesn't need to be ZONE_MOVABLE (and we
> actually need this buffer in ZONE_DMA). But we can introduce a new
> migrate type, MIGRATE_SWIOTLB, and movable page allocations can use
> this range. The CMA allocations go to free_list[MIGRATE_CMA], so they
> won't overlap.
>
> One of the downsides is that migrating movable pages still needs a
> sleep-able context.

Pages can be migrated by a separate worker thread when the number of
free slots reaches a low watermark.

> Another potential confusion is is_swiotlb_buffer() for pages in this
> range allocated through the normal page allocator. We may need to
> check the slots as well rather than just the buffer boundaries.

Ah, yes, I forgot about this part; thanks for the reminder. Indeed,
movable pages can be used for the page cache, and drivers do DMA
to/from buffers in the page cache.

Let me recap:

- Allocated chunks must still be tracked with this approach.
- The pool of available slots cannot be grown from interrupt context.

So, what exactly is the advantage compared to allocating additional
swiotlb chunks from CMA?

> (we are actually looking at a MIGRATE_METADATA type for the arm64
> memory tagging extension which uses a 3% carveout of the RAM for
> storing the tags and people want that reused somehow; we have some
> WIP patches but we'll post them later this summer)
>
> > This could still be combined with more aggressive use of per-device
> > swiotlb area, which is probably a good idea based on some hints.
> > E.g. device could hint an amount of inflight DMA to the DMA layer,
> > and if there are addressing limitations and the amount is large
> > enough that could cause the allocation of a per-device swiotlb area.
>
> If we go for one large-ish per-device buffer for specific cases, maybe
> something similar to the rmem_swiotlb_setup() but which can be
> dynamically allocated at run-time and may live alongside the default
> swiotlb. The advantage is that it uses a similar slot tracking to the
> default swiotlb, no need to invent another. This per-device buffer
> could also be allocated from the MIGRATE_SWIOTLB range if we make it
> large enough at boot. It would be seen as just a local accelerator
> for devices that use bouncing frequently or from irq context.

A per-device pool could also be used for small buffers. IIRC somebody
was interested in that.

Petr T
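
For illustration only, not something posted in the thread: a minimal
sketch of the is_swiotlb_buffer() concern raised above. It keeps the
boundary test the existing helper already does and adds a per-slot
lookup; swiotlb_slot_is_bounce() is a hypothetical helper standing in
for "find the slot covering @paddr and check that it is an active
bounce buffer rather than a page handed out to the page allocator".

/*
 * Sketch only: boundary check as in include/linux/swiotlb.h, extended
 * with a per-slot lookup. swiotlb_slot_is_bounce() is made up for this
 * example; it would consult the slot tracking, not just the range.
 */
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	if (!mem || paddr < mem->start || paddr >= mem->end)
		return false;

	return swiotlb_slot_is_bounce(mem, paddr);
}

Whatever form the lookup takes, it sits on the DMA unmap/sync fast
path, so it would have to stay cheap (e.g. a per-slot flag rather than
a search).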