Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers

Petr Tesařík <petr@xxxxxxxxxxx> · Wed, 17 May 2023 09:32:26 +0200

Hi Christoph,

On Wed, 17 May 2023 08:56:53 +0200
Christoph Hellwig <hch@xxxxxx> wrote:

> Just thinking out loud:
> 
>  - what if we always way overallocate the swiotlb buffer
>  - and then mark the second half / two thirds / <pull some number out
>    of the thin air> slots as used, and make that region available
>    through a special CMA mechanism as ZONE_MOVABLE (but not allowing
>    other CMA allocations to dip into it).

This approach has also been considered internally at Huawei, and it
looked like a viable option, just more complex. We decided to send the
simple approach first to get some feedback and find out who else might
be interested in the dynamic sizing of swiotlb (if anyone).

> This allows us to have a single slot management for the entire
> area, but allow reclaiming from it.  We'd probably also need to make
> this CMA variant irq safe.

Let me recap my internal analysis.

On the pro side:

- no performance penalty for devices that do not use swiotlb
- all alignment and boundary constraints can be met
- efficient use of memory for buffers smaller than 1 page

On the con side:

- ZONE_MOVABLE cannot be used for most kernel allocations
- competition with CMA over precious physical address space
  (How much should be reserved for CMA and how much for SWIOTLB?)

To quote from Memory hotplug documentation:

Usually, MOVABLE:KERNEL ratios of up to 3:1 or even 4:1 are fine. [...]
Actual safe zone ratios depend on the workload. Extreme cases, like
excessive long-term pinning of pages, might not be able to deal with
ZONE_MOVABLE at all.

This should be no big issue on bare metal (where the motivation is
addressing limitations), but the size of SWIOTLB in CoCo VMs probably
needs some consideration.

> This could still be combined with more aggressive use of per-device
> swiotlb area, which is probably a good idea based on some hints.
> E.g. device could hint an amount of inflight DMA to the DMA layer,
> and if there are addressing limitations and the amout is large enough
> that could cause the allocation of a per-device swiotlb area.

I would not rely on device hints, because it probably depends on
workload rather than type of device. I'd rather implement some logic
based on the actual runtime usage pattern. I have some ideas already.

Petr T