On Thu, 09 Nov 2023 13:24:48 +0100
Niklas Schnelle <schnelle@xxxxxxxxxxxxx> wrote:

> On Wed, 2023-11-08 at 13:21 +0100, Petr Tesařík wrote:
> > On Wed, 8 Nov 2023 12:12:49 +0100
> > Petr Tesarik <petrtesarik@xxxxxxxxxxxxxxx> wrote:
> >
> > > From: Petr Tesarik <petr.tesarik1@xxxxxxxxxxxxxxxxxxx>
> > >
> > > Limit the free list length to the size of the IO TLB. Transient pool can be
> > > smaller than IO_TLB_SEGSIZE, but the free list is initialized with the
> > > assumption that the total number of slots is a multiple of IO_TLB_SEGSIZE.
> > > As a result, swiotlb_area_find_slots() may allocate slots past the end of
> > > a transient IO TLB buffer.
> >
> > Just to make it clear, this patch addresses only the memory corruption
> > reported by Niklas, without addressing the underlying issues. Where
> > corruption happened before, allocations will fail with this patch.
> >
> > I am still looking into improving the allocation strategy itself.
> >
> > Petr T
>
> I know this has already been applied but for what its worth I did
> finally manage to test this with my reproducer and the allocation
> overrun is fixed by this change. I also confirmed that at least my
> ConnectX VF TCP/IP test case seems to handle the DMA error gracefully
> enough.

Thank you for testing! Indeed, the failed request is often retried at a
later time. For example, I tested with a SCSI driver, and by the time the
SCSI layer retried the request, a new standard pool was already
available. But this situation is not ideal. If nothing else, it incurs an
unnecessary delay.

Petr T