Re: [PATCH v3 7/7] iommu/riscv: Paging domain support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 7, 2024 at 9:51 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Mon, May 06, 2024 at 07:22:07PM -0700, Tomasz Jeznach wrote:
> > On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote:
> > > > > For detach I think yes:
> > > > >
> > > > >    Inv CPU                                   Detach CPU
> > > > >
> > > > >   write io_pte                               Update device descriptor
> > > > >   rcu_read_lock
> > > > >     list_for_each
> > > > >       <make invalidation command>            <make description inv cmd>
> > > > >       dma_wmb()                              dma_wmb()
> > > > >       <doorbell>                             <cmd doorbell>
> > > > >   rcu_read_unlock
> > > > >                                              list_del_rcu()
> > > > >                                              <wipe ASID>
> > > > >
> > > > > In this case I think we never miss an invalidation, the list_del is
> > > > > always after the HW has been fully fenced, so I don't think we can
> > > > > have any issue. Maybe a suprious invalidation if the ASID gets
> > > > > re-used, but who cares.
> > > > >
> > > > > Attach is different..
> > > > >
> > > > >    Inv CPU                                   Attach CPU
> > > > >
> > > > >   write io_pte
> > > > >   rcu_read_lock
> > > > >     list_for_each // empty
> > > > >                                              list_add_rcu()
> > > > >                                              Update device descriptor
> > > > >                                              <make description inv cmd>
> > > > >                                              dma_wmb()
> > > > >                                              <cmd doorbell>
> > > > >   rcu_read_unlock
> > > > >
> > > > > As above shows we can "miss" an invalidation. The issue is narrow, the
> > > > > io_pte could still be sitting in write buffers in "Inv CPU" and not
> > > > > yet globally visiable. "Attach CPU" could get the device descriptor
> > > > > installed in the IOMMU and the IOMMU could walk an io_pte that is in
> > > > > the old state. Effectively this is because there is no release/acquire
> > > > > barrier passing the io_pte store from the Inv CPU to the Attach CPU to the
> > > > > IOMMU.
> > > > >
> > > > > It seems like it should be solvable somehow:
> > > > >  1) Inv CPU releases all the io ptes
> > > > >  2) Attach CPU acquires the io ptes before updating the DDT
> > > > >  3) Inv CPU acquires the RCU list in such a way that either attach
> > > > >     CPU will acquire the io_pte or inv CPU will acquire the RCU list.
> > > > >  4) Either invalidation works or we release the new iopte to the SMMU
> > > > >     and don't need it.
> > > > >
> > > > > But #3 is a really weird statement. smb_mb() on both sides may do the
> > > > > job??
> > > > >
> > > >
> > > > Actual attach sequence is slightly different.
> > > >
> > > >  Inv CPU                            Attach CPU
> > > >
> > > >  write io_pte
> > > >   rcu_read_lock
> > > >     list_for_each // empty
> > > >                                     list_add_rcu()
> > > >                                     IOTLB.INVAL(PSCID)
> > > >                                     <make description inv cmd>
> > > >                                     dma_wmb()
> > > >                                     <cmd doorbell>
> > > >  rcu_read_unlock
> > > >
> > > > I've tried to cover this case with riscv_iommu_iotlb_inval() called
> > > > before the attached domain is visible to the device.
> > >
> > > That invalidation shouldn't do anything. If this is the first attach
> > > of a PSCID then the PSCID had better already be empty, it won't become
> > > non-empty until the DDT entry is installed.
> > >
> > > And if it is the second attach then the Inv CPU is already taking care
> > > of things, no need to invalidate at all.
> > >
> > > Regardless, there is still a theortical race that the IOPTEs haven't
> > > been made visible yet because there is still no synchronization with
> > > the CPU writing them.
> > >
> > > So, I don't think this solves any problem. I belive you need the
> > > appropriate kind of CPU barrier here instead of an invalidation.
> > >
> >
> > Yes. There was a small, but still plausible race w/ IOPTEs visibility
> > to the IOMMU.
> > For v5 I'm adding two barriers to the inval/detach flow, I believe
> > should cover it.
> >
> > 1) In riscv_iommu_iotlb_inval() unconditional dma_wmb() to make any
> > pending writes to PTEs visible to the IOMMU device. This should cover
> > the case when list_add_rcu() update is not yet visible in the
> > _iotlb_inval() sequence, for the first time the domain is attached to
> > the IOMMU.
> >
> >   Inv CPU                                    Attach CPU
> >   write io_pte
> >   dma_wmb (1)
> >   rcu_read_lock
> >     list_for_each // empty                   list_add_rcu()
> >                                              smp_wmb (2)
> >                                              Update device descriptor
> >                                              <make description inv cmd>
> >                                              // PTEs are visible to the HW (*1)
> >                                              dma_wmb()
> >                                              <cmd doorbell>
> >   rcu_read_unlock
> >
> > 2) In riscv_iommu_bond_link() write memory barrier to ensure list
> > update is visible before IOMMU descriptor update. If stale data has
> > been fetched by the HW, inval CPU will run iotlb-invalidation
> > sequence. There is a possibility that IOMMU will fetch correct PTEs
> > and will receive unnecessary IOTLB inval, but I don't think anyone
> > would care.
> >
> >   Inv CPU                                    Attach CPU
> >   write io_pte                               list_add_rcu()
> >                                              smp_wmb (2)
> >                                              Update device descriptor
> >                                              <make description inv cmd>
> >                                              // HW might fetch stale PTEs
> >                                              dma_wmb()
> >                                              <cmd doorbell>
> >   dma_wmb (1)
> >   rcu_read_lock
> >     list_for_each // non-empty (*2)
> >       <make iotlb inval cmd>
> >       dma_wmb()
> >       <cmd doorbell>
> >   rcu_read_unlock
> >
> > 3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache
> > on the last domain unlink from the IOMMU.
> >
> > Thank you for pointing this out. Let me know if that makes sense.
>
> I'm not an expert in barriers, but I think you need the more expensive
> "mb" in both cases.
>
> The inv side is both releasing the write and acquiring the list
> read. IIRC READ_ONCE is not a full acquire?
>
> The Attach side is both releasing the list_add_rcu() and acquiring the
> iopte.
>
> rcu is still a benefit, there is no cache line sharing and there is
> only one full barrier, not two, like a spinlock.
>
> And a big fat comment in both sides explaining this :)
>

I'm not an expert in barriers as well, but I've checked offline with one ;)

And the conclusion is that we need FENCE W,W (or stronger) on the
attach side, and FENCE RW,RW in the invalidation sequence.  Hardware
access to PTE/DC is sequential, with implied FENCE R,R barriers.

As 'attach' sequence is a rare event anyway, I'll go on "mb" on both
sides, as suggested.
Unless there are other opinions on that I'll update barriers to match
this conclusion and try to capture in long comment for bond / inval /
attach synchronization assumptions.

Jason, thanks again for pointing this out.


> Jason

Best,
- Tomasz





[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux