On Wed, 17 May 2023 10:41:19 +0100
Catalin Marinas <catalin.marinas@xxxxxxx> wrote:

> On Wed, May 17, 2023 at 08:56:53AM +0200, Christoph Hellwig wrote:
> > Just thinking out loud:
> >
> >  - what if we always way overallocate the swiotlb buffer
> >  - and then mark the second half / two thirds / <pull some number out
> >    of the thin air> slots as used, and make that region available
> >    through a special CMA mechanism as ZONE_MOVABLE (but not allowing
> >    other CMA allocations to dip into it).
> >
> > This allows us to have a single slot management for the entire
> > area, but allow reclaiming from it. We'd probably also need to make
> > this CMA variant irq safe.
>
> I think this could work. It doesn't need to be ZONE_MOVABLE (and we
> actually need this buffer in ZONE_DMA). But we can introduce a new
> migrate type, MIGRATE_SWIOTLB, and movable page allocations can use
> this range. The CMA allocations go to free_list[MIGRATE_CMA], so they
> won't overlap.
>
> One of the downsides is that migrating movable pages still needs a
> sleep-able context.

Pages can be migrated by a separate worker thread when the number of
free slots reaches a low watermark.

> Another potential confusion is is_swiotlb_buffer() for pages in this
> range allocated through the normal page allocator. We may need to
> check the slots as well rather than just the buffer boundaries.

Ah, yes, I forgot about this part; thanks for the reminder. Indeed,
movable pages can be used for the page cache, and drivers do DMA
to/from buffers in the page cache.

Let me recap:

- Allocated chunks must still be tracked with this approach.
- The pool of available slots cannot be grown from interrupt context.

So, what exactly is the advantage compared to allocating additional
swiotlb chunks from CMA?

> (we are actually looking at a MIGRATE_METADATA type for the arm64
> memory tagging extension which uses a 3% carveout of the RAM for
> storing the tags and people want that reused somehow; we have some
> WIP patches but we'll post them later this summer)
>
> > This could still be combined with more aggressive use of per-device
> > swiotlb area, which is probably a good idea based on some hints.
> > E.g. device could hint an amount of inflight DMA to the DMA layer,
> > and if there are addressing limitations and the amount is large
> > enough that could cause the allocation of a per-device swiotlb area.
>
> If we go for one large-ish per-device buffer for specific cases, maybe
> something similar to the rmem_swiotlb_setup() but which can be
> dynamically allocated at run-time and may live alongside the default
> swiotlb. The advantage is that it uses a similar slot tracking to the
> default swiotlb, no need to invent another. This per-device buffer
> could also be allocated from the MIGRATE_SWIOTLB range if we make it
> large enough at boot. It would be seen as just a local accelerator
> for devices that use bouncing frequently or from irq context.

A per-device pool could also be used for small buffers. IIRC somebody
was interested in that.

Petr T
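
For illustration only, not something posted in the thread: a minimal
sketch of the is_swiotlb_buffer() concern raised above. It keeps the
boundary test the existing helper already does and adds a per-slot
lookup; swiotlb_slot_is_bounce() is a hypothetical helper standing in
for "find the slot covering @paddr and check that it is an active
bounce buffer rather than a page handed out to the page allocator".

/*
 * Sketch only: boundary check as in include/linux/swiotlb.h, extended
 * with a per-slot lookup. swiotlb_slot_is_bounce() is made up for this
 * example; it would consult the slot tracking, not just the range.
 */
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	if (!mem || paddr < mem->start || paddr >= mem->end)
		return false;

	return swiotlb_slot_is_bounce(mem, paddr);
}

Whatever form the lookup takes, it sits on the DMA unmap/sync fast
path, so it would have to stay cheap (e.g. a per-slot flag rather than
a search).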