Hi Catalin,

On Sun, 14 May 2023 19:54:27 +0100
Catalin Marinas <catalin.marinas@xxxxxxx> wrote:

> On Tue, May 09, 2023 at 11:18:19AM +0200, Petr Tesarik wrote:
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index d1d2b8557b30..e340e0f06dce 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -516,6 +516,9 @@ struct device_physical_location {
> >   * @dma_io_tlb_dyn_slots:
> >   *		Dynamically allocated bounce buffers for this device.
> >   *		Not for driver use.
> > + * @dma_io_tlb_have_dyn:
> > + *		Does this device have any dynamically allocated bounce
> > + *		buffers? Not for driver use.
> >   * @archdata:	For arch-specific additions.
> >   * @of_node:	Associated device tree node.
> >   * @fwnode:	Associated device node supplied by platform firmware.
> > @@ -623,6 +626,7 @@ struct device {
> >  	struct io_tlb_mem *dma_io_tlb_mem;
> >  	spinlock_t dma_io_tlb_dyn_lock;
> >  	struct list_head dma_io_tlb_dyn_slots;
> > +	bool dma_io_tlb_have_dyn;
> >  #endif
> >  	/* arch specific additions */
> >  	struct dev_archdata	archdata;
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index daa2064f2ede..8cbb0bebb0bc 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -152,7 +152,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
> >  
> >  	return mem &&
> >  		(is_swiotlb_fixed(mem, paddr) ||
> > -		 (mem->allow_dyn && is_swiotlb_dyn(dev, paddr)));
> > +		 /* Pairs with smp_store_release() in swiotlb_dyn_map()
> > +		  * and swiotlb_dyn_unmap().
> > +		  */
> > +		 (smp_load_acquire(&dev->dma_io_tlb_have_dyn) &&
> > +		  is_swiotlb_dyn(dev, paddr)));
> >  }
> >  
> >  static inline bool is_swiotlb_force_bounce(struct device *dev)
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 81eab1c72c50..e8be3ee50f18 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -642,6 +642,9 @@ static phys_addr_t swiotlb_dyn_map(struct device *dev, phys_addr_t orig_addr,
> >  
> >  	spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
> >  	list_add(&slot->node, &dev->dma_io_tlb_dyn_slots);
> > +	if (!dev->dma_io_tlb_have_dyn)
> > +		/* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> > +		smp_store_release(&dev->dma_io_tlb_have_dyn, true);
> >  	spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
> 
> I'm not sure this works. What this seems to do is that if the caller of
> is_swiotlb_buffer() sees the flag set, it's guaranteed that something
> was added to the dma_io_tlb_dyn_slots list. But the reverse is not
> necessarily true. IOW, if something was added to the list, there is a
> brief window where the dma_io_tlb_have_dyn flag is still false. In the
> general case, I doubt any ordering between list_add() and the flag
> setting changes anything since neither of them may be visible to
> another CPU.

Thank you for the review! This patch probably needs a bit more
explanation. The goal is to avoid taking a spin lock in the most common
case, i.e. when the dynamic list is empty.

The only required invariant is: when the flag is clear, it is safe to
skip the list. It's not a bug to walk an empty list; it's merely less
efficient. Such a race window would be acceptable. OTOH that's not your
concern, if I understand you correctly.

> What you need is for a 'paddr' added to the dynamic list to be
> correctly identified by another CPU as dynamic swiotlb. That other CPU
> doesn't check random addresses but only those returned by the DMA API.

Yes, that's correct.
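To make sure we are both looking at the same scenario, this is roughly
how I picture it (struct my_drv and its dma_handle member are made up
for the example; the sync path below is modeled on what dma-direct
does):

	struct my_drv {
		dma_addr_t dma_handle;	/* hypothetical driver field */
	};

	/* CPU 0: driver maps a buffer and publishes the handle. */
	drv->dma_handle = dma_map_single(dev, buf, size, DMA_FROM_DEVICE);

	/* CPU 1: completion path reads the handle and syncs... */
	dma_sync_single_for_cpu(dev, drv->dma_handle, size,
				DMA_FROM_DEVICE);

	/* ...which for dma-direct boils down to something like: */
	phys_addr_t paddr = dma_to_phys(dev, drv->dma_handle);
	if (is_swiotlb_buffer(dev, paddr))	/* must see the new flag */
		swiotlb_sync_single_for_cpu(dev, paddr, size,
					    DMA_FROM_DEVICE);

The question is then what orders the load of drv->dma_handle on CPU 1
against the flag update made on CPU 0.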
> Such values would be normally passed through a memory location (e.g.
> driver local structures) and that's what you want to order against.

This would certainly work, but I'm not sure I need it. My only goal is
that when the flag is set, the new value is observed by all CPUs on the
next call to is_swiotlb_buffer().

> What I mean is that a 'dev->blah = paddr' needs to be ordered _after_
> your flag setting. So you need either the 'blah = paddr' assignment to
> have release semantics, or the flag setting to be an
> smp_store_acquire() (but we don't have such a thing). You'd have to
> use an explicit smp_wmb() barrier after the flag setting (it can be
> outside the lock). The spin_unlock() is not sufficient since it only
> has release semantics.

Understood. The spinlock is supposed to order changes to the list, not
to the flag.

> I also don't think the ordering between list_add() and the flag
> setting matters since the smp_wmb() would ensure that both are visible
> when the 'paddr' value made it to the other CPU.

If the flag makes it before the list, then the other CPU will walk the
list only after acquiring dma_io_tlb_dyn_lock, and that's what the list
is ordered against. I don't think there is any concern if the list
makes it before the flag, as long as the new value of the flag is
observed on the next call to is_swiotlb_buffer() (on any CPU).

> Similarly on the is_swiotlb_buffer() side, you want the flag reading
> to be ordered after the 'blah = paddr' is observed. Here the
> smp_load_acquire() is sufficient.
> 
> >  	return page_to_phys(slot->page);
> > @@ -668,6 +671,9 @@ static void swiotlb_dyn_unmap(struct device *dev, phys_addr_t tlb_addr,
> >  	unsigned long flags;
> >  
> >  	spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
> > +	if (list_is_singular(&dev->dma_io_tlb_dyn_slots))
> > +		/* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> > +		smp_store_release(&dev->dma_io_tlb_have_dyn, false);
> >  	slot = lookup_dyn_slot_locked(dev, tlb_addr);
> >  	list_del(&slot->node);
> >  	spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
> 
> As with the map case, I don't think the ordering between list_del()
> and the flag setting matters. If you unmap the last dynamic buffer,
> the worst that can happen is that an is_swiotlb_buffer() call attempts
> a read of the list but the flag will eventually become visible. There
> shouldn't be another caller trying to unmap the same paddr (in well
> behaved drivers).
> 
> Now, thinking about the list_head access and the flag ordering, since
> it doesn't matter, you might as well not bother with the flag at all
> and rely on list_add() and list_empty() ordering vs the hypothetical
> 'blah' access. Both of these use READ/WRITE_ONCE() for setting
> dma_io_tlb_dyn_slots.next. You only need an smp_wmb() after the
> list_add() and an smp_rmb() before a list_empty() check in
> is_swiotlb_buffer(), no dma_io_tlb_have_dyn variable.

Wait, let me check that I understand you right. Do you suggest that I
convert dma_io_tlb_dyn_slots to a lockless list and get rid of the
spinlock?

I'm sure it can be done for list_add() and list_del(). I'll have to
think about list_move().

> That's my reasoning, but to be absolutely sure, you can pass that
> through some formal modelling.

Good idea. Especially if I try to get rid of the lock.

Petr T
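P.S. Just to have it written down, this is how I understand your
flag-less suggestion (a rough sketch on top of my patch, not even
compile-tested):

	/* tail of swiotlb_dyn_map() */
	spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
	list_add(&slot->node, &dev->dma_io_tlb_dyn_slots);
	spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);
	/* Order list_add() before the driver publishing the address. */
	smp_wmb();
	return page_to_phys(slot->page);

and in is_swiotlb_buffer():

	/* Order the caller's address load before the list check. */
	smp_rmb();
	return mem &&
	       (is_swiotlb_fixed(mem, paddr) ||
		(!list_empty(&dev->dma_io_tlb_dyn_slots) &&
		 is_swiotlb_dyn(dev, paddr)));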