On Wed, 28 Aug 2024 16:30:04 +0000
Michael Kelley <mhklinux@xxxxxxxxxxx> wrote:

> From: Petr Tesařík <petr@xxxxxxxxxxx> Sent: Wednesday, August 28, 2024 6:04 AM
> >
> > On Wed, 28 Aug 2024 13:02:31 +0100
> > Robin Murphy <robin.murphy@xxxxxxx> wrote:
> >
> > > On 2024-08-22 7:37 pm, mhkelley58@xxxxxxxxx wrote:
> > > > From: Michael Kelley <mhklinux@xxxxxxxxxxx>
> > > >
> > > > Background
> > > > ==========
> > > > Linux device drivers may make DMA map/unmap calls in contexts that
> > > > cannot block, such as in an interrupt handler. Consequently, when a
> > > > DMA map call must use a bounce buffer, the allocation of swiotlb
> > > > memory must always succeed immediately. If swiotlb memory is
> > > > exhausted, the DMA map call cannot wait for memory to be released. The
> > > > call fails, which usually results in an I/O error.
> > > >
> > > > Bounce buffers are usually used infrequently for a few corner cases,
> > > > so the default swiotlb memory allocation of 64 MiB is more than
> > > > sufficient to avoid running out and causing errors. However, recently
> > > > introduced Confidential Computing (CoCo) VMs must use bounce buffers
> > > > for all DMA I/O because the VM's memory is encrypted. In CoCo VMs
> > > > a new heuristic allocates ~6% of the VM's memory, up to 1 GiB, for
> > > > swiotlb memory. This large allocation reduces the likelihood of a
> > > > spike in usage causing DMA map failures. Unfortunately for most
> > > > workloads, this insurance against spikes comes at the cost of
> > > > potentially "wasting" hundreds of MiB's of the VM's memory, as swiotlb
> > > > memory can't be used for other purposes.
> > > >
> > > > Approach
> > > > ========
> > > > The goal is to significantly reduce the amount of memory reserved as
> > > > swiotlb memory in CoCo VMs, while not unduly increasing the risk of
> > > > DMA map failures due to memory exhaustion.
> > >
> > > Isn't that fundamentally the same thing that SWIOTLB_DYNAMIC was already
> > > meant to address? Of course the implementation of that is still young
> > > and has plenty of scope to be made more effective, and some of the ideas
> > > here could very much help with that, but I'm struggling a little to see
> > > what's really beneficial about having a completely disjoint mechanism
> > > for sitting around doing nothing in the precise circumstances where it
> > > would seem most possible to allocate a transient buffer and get on with it.
> >
> > This question can be probably best answered by Michael, but let me give
> > my understanding of the differences. First the similarity: Yes, one
> > of the key new concepts is that swiotlb allocation may block, and I
> > introduced a similar attribute in one of my dynamic SWIOTLB patches; it
> > was later dropped, but dynamic SWIOTLB would still benefit from it.
> >
> > More importantly, dynamic SWIOTLB may deplete memory following an I/O
> > spike. I do have some ideas how memory could be returned back to the
> > allocator, but the code is not ready (unlike this patch series).
> > Moreover, it may still be a better idea to throttle the devices
> > instead, because returning DMA'able memory is not always cheap. In a
> > CoCo VM, this memory must be re-encrypted, and that requires a
> > hypercall that I'm told is expensive.
> >
> > In short, IIUC it is faster in a CoCo VM to delay some requests a bit
> > than to grow the swiotlb.
> >
> > Michael, please add your insights.
> >
> > Petr T
> >
>
> The other limitation of SWIOTLB_DYNAMIC is that growing swiotlb
> memory requires large chunks of physically contiguous memory,
> which may be impossible to get after a system has been running a
> while. With a major rework of swiotlb memory allocation code, it might
> be possible to get by with a piecewise assembly of smaller contiguous
> memory chunks, but getting many smaller chunks could also be
> challenging.
>
> Growing swiotlb memory also must be done as a background async
> operation if the DMA map operation can't block. So transient buffers
> are needed, which must be encrypted and decrypted on every round
> trip in a CoCo VM. The transient buffer memory comes from the
> atomic pool, which typically isn't that large and could itself become
> exhausted. So we're somewhat playing whack-a-mole on the memory
> allocation problem.

Note that this situation can be somewhat improved with the
SWIOTLB_ATTR_MAY_BLOCK flag, because a new SWIOTLB chunk can then be
allocated immediately, removing the need to allocate a transient pool
from the atomic pool.

> We discussed the limitations of SWIOTLB_DYNAMIC in large CoCo VMs
> at the time SWIOTLB_DYNAMIC was being developed, and I think there
> was general agreement that throttling would be better for the CoCo
> VM scenario.
>
> Broadly, throttling DMA map requests seems like a fundamentally more
> robust approach than growing swiotlb memory. And starting down
> the path of allowing designated DMA map requests to block might have
> broader benefits as well, perhaps on the IOMMU path.
>
> These points are all arguable, and your point about having two somewhat
> overlapping mechanisms is valid. Between the two, my personal viewpoint
> is that throttling is the better approach, but I'm probably biased by my
> background in the CoCo VM world. Petr and others may see the tradeoffs
> differently.

For CoCo VMs, throttling indeed seems to be better. Embedded devices
seem to benefit more from growing the swiotlb on demand. As usual,
YMMV.

Petr T