On Fri, 03 Nov 2023 16:13:03 +0100
Niklas Schnelle <schnelle@xxxxxxxxxxxxx> wrote:

> The reason for 1) is a bit more convoluted and not entirely understood
> by us. We are certain though that the function swiotlb_find_slots()
> allocates a pool with nr_slots(alloc_size), where this alloc_size is
> the alloc_size from swiotlb_tbl_map_single() + swiotlb_align_offset(),
> but for alignment reasons some slots may get "skipped over" in
> swiotlb_area_find_slots() causing the picked slots to overrun the
> allocation.
>
> Not sure how to properly fix this as the different alignment
> requirements get pretty complex quickly. So would appreciate your
> input.

Let me post a more detailed analysis of why we observe
swiotlb_area_find_slots() considering the slot with index 0 invalid in
our particular case, and how that relates to the whole "alignment"
complex.

Currently there are three distinct mechanisms that dictate the
"alignment":
a) min_align_mask (introduced by 36950f2da1ea ("driver core: add a
   min_align_mask field to struct device_dma_parameters"))
b) alloc_size >= PAGE_SIZE, which requires "page alignment"
c) alloc_align_mask.

In our case min_align_mask == 0, so a) is not applicable; because of
b) and c) we end up with iotlb_align_mask == 0x800. And because
orig_addr & 0x800 == 0x800 but pool->start & 0x800 == 0, the slot at
index 0 is skipped over. More generally, since pool->start (and thus
the address of slot 0) is page aligned, slot 0 is skipped over
whenever !!(orig_addr & 0x800), that is, whenever bit 11 of orig_addr
is set. (The first sketch at the end of this mail walks through the
arithmetic.)

Let us note that with the current implementation min_align_mask, that
is mechanism a), also controls the tlb_addr within the first slot so
that:

  tlb_addr & min_align_mask == orig_addr & min_align_mask

In that sense a) is very unlike b) and c). For example, if
!min_align_mask, then tlb_addr & (IO_TLB_SIZE - 1) is always 0, even
if alloc_size >= PAGE_SIZE or alloc_align_mask is non-zero. (The
second sketch below illustrates this.)

If with b) and c) the goal is that the swiotlb buffer shall not
stretch over more pages, or more address-space blocks of the size
dictated by alloc_align_mask, than necessary, then that goal is
accomplished. If, however, the goal is to preserve the offsets modulo
some power of 2 dictated either by PAGE_SHIFT or by alloc_align_mask,
then that goal is not reached.

But there is more to it! In the beginning there was b), or more
precisely, in the olden days for mapping_size >= PAGE_SIZE we used to
allocate properly page aligned bounce buffers. That is,
tlb_addr & ~PAGE_MASK == 0 regardless of what
orig_addr & ~PAGE_MASK & (IO_TLB_SIZE - 1) is. That first got borked
by commit 1f221a0d0dbf ("swiotlb: respect min_align_mask") and then it
did not get fixed by commit 0eee5ae10256 ("swiotlb: fix slot alignment
checks").

Let us also note that if more than one of the above mechanisms is
applicable, then for the slot selection the idea is apparently to go
with the strictest "alignment requirement", while for the offset
within the slot only a) matters (if applicable, i.e.
min_align_mask != 0), which may appear strange if not thoroughly
documented.

In our opinion the first step towards getting this right is to figure
out what the different kinds of alignment are really supposed to mean.
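To make the slot-skip arithmetic concrete, here is a small user-space
sketch. It is a model of the mask computation as we read the current
swiotlb_area_find_slots(), not the kernel code itself; the pool and
buffer addresses are made up, we assume 4K pages and
IO_TLB_SHIFT == 11, and the alloc_align_mask value of 0xfff is an
assumption (with b) in play, any alloc_align_mask below 0x1000 yields
the same 0x800):

#include <stdio.h>

#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE - 1))
#define IO_TLB_SHIFT    11
#define IO_TLB_SIZE     (1UL << IO_TLB_SHIFT)

/* our reading of the mask computation in swiotlb_area_find_slots() */
static unsigned long iotlb_align_mask(unsigned long min_align_mask,
                                      unsigned long alloc_align_mask,
                                      unsigned long alloc_size)
{
        unsigned long mask = min_align_mask | alloc_align_mask;

        if (alloc_size >= PAGE_SIZE)            /* mechanism b) */
                mask |= ~PAGE_MASK;
        return mask & ~(IO_TLB_SIZE - 1);       /* in-slot bits are dropped */
}

int main(void)
{
        /* a) == 0 as in our case; c) == 0xfff is assumed */
        unsigned long mask = iotlb_align_mask(0x0, 0xfff, PAGE_SIZE);
        unsigned long pool_start = 0x1000000;   /* made up, page aligned */
        unsigned long orig_addr = 0x2345800;    /* made up, bit 11 set */

        printf("iotlb_align_mask == %#lx\n", mask);     /* prints 0x800 */
        /* the comparison that makes slot 0 (at pool_start) get skipped */
        printf("slot 0 is %s\n",
               (pool_start & mask) != (orig_addr & mask) ?
               "skipped" : "usable");
        return 0;
}

Note that the final & ~(IO_TLB_SIZE - 1) drops bits 0-10, so b) alone
already yields 0x800 here; c) would only change the slot selection once
alloc_align_mask has bits at or above bit 11.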
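And to illustrate how a) differs from b) and c): per our reading of
swiotlb_align_offset() (referred to in the quote above), only
min_align_mask ever contributes to the offset of tlb_addr within the
picked slot. The addresses below are again made up:

#include <stdio.h>

#define IO_TLB_SHIFT    11
#define IO_TLB_SIZE     (1UL << IO_TLB_SHIFT)

/* our reading of swiotlb_align_offset() */
static unsigned long align_offset(unsigned long min_align_mask,
                                  unsigned long orig_addr)
{
        return orig_addr & min_align_mask & (IO_TLB_SIZE - 1);
}

int main(void)
{
        unsigned long slot_start = 0x1000800;   /* made-up slot address */
        unsigned long orig_addr = 0x2345abc;    /* made-up original address */

        /* min_align_mask == 0: prints 0x1000800, tlb_addr is slot aligned */
        printf("tlb_addr == %#lx\n",
               slot_start + align_offset(0x0, orig_addr));
        /*
         * min_align_mask == 0xfff: prints 0x1000abc, so
         * tlb_addr & 0xfff == orig_addr & 0xfff (the slot selection
         * already made bit 11 of slot_start match orig_addr)
         */
        printf("tlb_addr == %#lx\n",
               slot_start + align_offset(0xfff, orig_addr));
        return 0;
}

So with min_align_mask == 0 the original in-page offset is lost even
when b) and/or c) participated in picking the slot, which is exactly
the offset-preservation question raised above.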
For each of the mechanisms we need to understand and document whether
it is about making sure that the bounce buffer does not stretch over
more units of memory (pages, the IOVA granule, or whatever) than
necessary, or about preserving the offset within a certain unit of
memory, and if so, to what extent (the least significant n bits of
orig_addr dictated by the respective mask, or something different).

Thank you in advance for your help!

Regards,
Halil and Niklas