On Tue, May 09, 2023 at 11:18:19AM +0200, Petr Tesarik wrote:
> diff --git a/include/linux/device.h b/include/linux/device.h
> index d1d2b8557b30..e340e0f06dce 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -516,6 +516,9 @@ struct device_physical_location {
>   * @dma_io_tlb_dyn_slots:
>   *		Dynamically allocated bounce buffers for this device.
>   *		Not for driver use.
> + * @dma_io_tlb_have_dyn:
> + *		Does this device have any dynamically allocated bounce
> + *		buffers? Not for driver use.
>   * @archdata:	For arch-specific additions.
>   * @of_node:	Associated device tree node.
>   * @fwnode:	Associated device node supplied by platform firmware.
> @@ -623,6 +626,7 @@ struct device {
>  	struct io_tlb_mem *dma_io_tlb_mem;
>  	spinlock_t dma_io_tlb_dyn_lock;
>  	struct list_head dma_io_tlb_dyn_slots;
> +	bool dma_io_tlb_have_dyn;
>  #endif
>  	/* arch specific additions */
>  	struct dev_archdata archdata;
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index daa2064f2ede..8cbb0bebb0bc 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -152,7 +152,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  
>  	return mem &&
>  		(is_swiotlb_fixed(mem, paddr) ||
> -		 (mem->allow_dyn && is_swiotlb_dyn(dev, paddr)));
> +		 /* Pairs with smp_store_release() in swiotlb_dyn_map()
> +		  * and swiotlb_dyn_unmap().
> +		  */
> +		 (smp_load_acquire(&dev->dma_io_tlb_have_dyn) &&
> +		  is_swiotlb_dyn(dev, paddr)));
>  }
>  
>  static inline bool is_swiotlb_force_bounce(struct device *dev)
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 81eab1c72c50..e8be3ee50f18 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -642,6 +642,9 @@ static phys_addr_t swiotlb_dyn_map(struct device *dev, phys_addr_t orig_addr,
>  
>  	spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
>  	list_add(&slot->node, &dev->dma_io_tlb_dyn_slots);
> +	if (!dev->dma_io_tlb_have_dyn)
> +		/* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> +		smp_store_release(&dev->dma_io_tlb_have_dyn, true);
>  	spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);

I'm not sure this works. What this seems to do is that, if the caller of
is_swiotlb_buffer() sees the flag set, it is guaranteed that something
was added to the dma_io_tlb_dyn_slots list. But the reverse is not
necessarily true: if something was added to the list, there is a brief
window during which the dma_io_tlb_have_dyn flag is still false. In the
general case, I doubt any ordering between list_add() and the flag
setting changes anything, since neither of them may be visible to
another CPU yet.

What you need is for a 'paddr' added to the dynamic list to be correctly
identified by another CPU as a dynamic swiotlb buffer. That other CPU
doesn't check random addresses, only those returned by the DMA API. Such
values would normally be passed through a memory location (e.g. driver
local structures), and that's what you want to order against. What I
mean is that a 'dev->blah = paddr' assignment needs to be ordered
_after_ your flag setting. So you need either the 'blah = paddr'
assignment to have release semantics or the flag setting to be an
smp_store_acquire(), but no such primitive exists. You'd have to use an
explicit smp_wmb() barrier after the flag setting (it can be outside the
lock).
The spin_unlock() is not sufficient since it only has release semantics.
I also don't think the ordering between list_add() and the flag setting
matters, since the smp_wmb() would ensure that both are visible by the
time the 'paddr' value makes it to the other CPU. Similarly, on the
is_swiotlb_buffer() side you want the flag read to be ordered after the
'blah = paddr' is observed; here the smp_load_acquire() is sufficient.

>  	return page_to_phys(slot->page);
> @@ -668,6 +671,9 @@ static void swiotlb_dyn_unmap(struct device *dev, phys_addr_t tlb_addr,
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
> +	if (list_is_singular(&dev->dma_io_tlb_dyn_slots))
> +		/* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> +		smp_store_release(&dev->dma_io_tlb_have_dyn, false);
>  	slot = lookup_dyn_slot_locked(dev, tlb_addr);
>  	list_del(&slot->node);
>  	spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);

As with the map case, I don't think the ordering between list_del() and
the flag setting matters. If you unmap the last dynamic buffer, the
worst that can happen is that an is_swiotlb_buffer() call attempts a
read of the list, but the flag will eventually become visible. There
shouldn't be another caller trying to unmap the same paddr (in
well-behaved drivers).

Now, thinking about the list_head access and the flag ordering: since it
doesn't matter, you might as well not bother with the flag at all and
rely on the ordering of list_add() and list_empty() against the
hypothetical 'blah' access. Both of these use READ_ONCE()/WRITE_ONCE()
on dma_io_tlb_dyn_slots.next. You only need an smp_wmb() after the
list_add() and an smp_rmb() before a list_empty() check in
is_swiotlb_buffer(), and no dma_io_tlb_have_dyn variable at all.

That's my reasoning, but to be absolutely sure, you can pass it through
some formal modelling.

-- 
Catalin