On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote: > > > For detach I think yes: > > > > > > Inv CPU Detach CPU > > > > > > write io_pte Update device descriptor > > > rcu_read_lock > > > list_for_each > > > <make invalidation command> <make description inv cmd> > > > dma_wmb() dma_wmb() > > > <doorbell> <cmd doorbell> > > > rcu_read_unlock > > > list_del_rcu() > > > <wipe ASID> > > > > > > In this case I think we never miss an invalidation, the list_del is > > > always after the HW has been fully fenced, so I don't think we can > > > have any issue. Maybe a suprious invalidation if the ASID gets > > > re-used, but who cares. > > > > > > Attach is different.. > > > > > > Inv CPU Attach CPU > > > > > > write io_pte > > > rcu_read_lock > > > list_for_each // empty > > > list_add_rcu() > > > Update device descriptor > > > <make description inv cmd> > > > dma_wmb() > > > <cmd doorbell> > > > rcu_read_unlock > > > > > > As above shows we can "miss" an invalidation. The issue is narrow, the > > > io_pte could still be sitting in write buffers in "Inv CPU" and not > > > yet globally visiable. "Attach CPU" could get the device descriptor > > > installed in the IOMMU and the IOMMU could walk an io_pte that is in > > > the old state. Effectively this is because there is no release/acquire > > > barrier passing the io_pte store from the Inv CPU to the Attach CPU to the > > > IOMMU. > > > > > > It seems like it should be solvable somehow: > > > 1) Inv CPU releases all the io ptes > > > 2) Attach CPU acquires the io ptes before updating the DDT > > > 3) Inv CPU acquires the RCU list in such a way that either attach > > > CPU will acquire the io_pte or inv CPU will acquire the RCU list. > > > 4) Either invalidation works or we release the new iopte to the SMMU > > > and don't need it. > > > > > > But #3 is a really weird statement. smb_mb() on both sides may do the > > > job?? > > > > > > > Actual attach sequence is slightly different. > > > > Inv CPU Attach CPU > > > > write io_pte > > rcu_read_lock > > list_for_each // empty > > list_add_rcu() > > IOTLB.INVAL(PSCID) > > <make description inv cmd> > > dma_wmb() > > <cmd doorbell> > > rcu_read_unlock > > > > I've tried to cover this case with riscv_iommu_iotlb_inval() called > > before the attached domain is visible to the device. > > That invalidation shouldn't do anything. If this is the first attach > of a PSCID then the PSCID had better already be empty, it won't become > non-empty until the DDT entry is installed. > > And if it is the second attach then the Inv CPU is already taking care > of things, no need to invalidate at all. > > Regardless, there is still a theortical race that the IOPTEs haven't > been made visible yet because there is still no synchronization with > the CPU writing them. > > So, I don't think this solves any problem. I belive you need the > appropriate kind of CPU barrier here instead of an invalidation. > Yes. There was a small, but still plausible race w/ IOPTEs visibility to the IOMMU. For v5 I'm adding two barriers to the inval/detach flow, I believe should cover it. 1) In riscv_iommu_iotlb_inval() unconditional dma_wmb() to make any pending writes to PTEs visible to the IOMMU device. This should cover the case when list_add_rcu() update is not yet visible in the _iotlb_inval() sequence, for the first time the domain is attached to the IOMMU. Inv CPU Attach CPU write io_pte dma_wmb (1) rcu_read_lock list_for_each // empty list_add_rcu() smp_wmb (2) Update device descriptor <make description inv cmd> // PTEs are visible to the HW (*1) dma_wmb() <cmd doorbell> rcu_read_unlock 2) In riscv_iommu_bond_link() write memory barrier to ensure list update is visible before IOMMU descriptor update. If stale data has been fetched by the HW, inval CPU will run iotlb-invalidation sequence. There is a possibility that IOMMU will fetch correct PTEs and will receive unnecessary IOTLB inval, but I don't think anyone would care. Inv CPU Attach CPU write io_pte list_add_rcu() smp_wmb (2) Update device descriptor <make description inv cmd> // HW might fetch stale PTEs dma_wmb() <cmd doorbell> dma_wmb (1) rcu_read_lock list_for_each // non-empty (*2) <make iotlb inval cmd> dma_wmb() <cmd doorbell> rcu_read_unlock 3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache on the last domain unlink from the IOMMU. Thank you for pointing this out. Let me know if that makes sense. Best, - Tomasz > Jason