On Wed, 2022-07-20 at 17:04 -0600, Alex Williamson wrote: > On Wed, 20 Jul 2022 17:08:29 -0300 > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > On Wed, Jul 20, 2022 at 01:41:13PM -0600, Alex Williamson wrote: > > > > > ie. we don't need the gfn, we only need the iova. > > > > Right, that makes sense > > > > > However then I start to wonder why we're passing in 1 for the > > > number of > > > pages because this previously notifier, now callback is called > > > for the > > > entire vfio_dma range when we find any pinned pages. > > > > Well, it is doing this because it only ever pins one page. > > Of course that page is not necessarily the page it unpins given the > contract misunderstanding below. > > > The drivers are confused about what the contract is. vfio is > > calling > > the notifier with the entire IOVA range that is being unmapped and > > the > > drivers are expecting to receive notifications only for the IOVA > > they > > have actually pinned. > > > > > Should ap and ccw implementations of .dma_unmap just be replaced > > > with a > > > BUG_ON(1)? > > > > The point of these callbacks is to halt concurrent DMA, and ccw > > does > > that today. > > ccw essentially only checks whether the starting iova of the unmap is > currently mapped. If not it does nothing, if it is it tries to reset > the device and unpin everything. Chances are the first iova is not > the > one pinned, so we don't end up removing the pinned page and type1 > will > eventually BUG_ON after a few tries. > > > It looks like AP is missing a call to ap_aqic(), so it is > > probably double wrong. > > Thankfully the type1 unpinning path can't be tricked into unpinning > something that wasn't pinned, so chances are the unpin call does > nothing, with a small risk that it unpins another driver's pinned > page, > which might not yet have been notified and could still be using the > page. In the end, if ap did have a page pinned in the range, we'll > hit > the same BUG_ON as above. > > > What I'd suggest is adding a WARN_ON that the dma->pfn_list is not > > empty and leave these functions alone. > > The BUG_ON still exists in type1. > > Eric, Matt, Tony, Halil, JasonH, any quick fixes here? ccw looks > like > it would be pretty straightforward to test against a range rather > than > a single iova. Agreed, ccw looks pretty easy. Should I send something to go before this series to make stable easier? (It's a trivial change in either direction, so either is fine to me.) Eric > > > Most likely AP should be fixed to call vfio_ap_irq_disable() and to > > check the q->saved_pfn against the IOVA. > > Right, the q->saved_iova, perhaps calling vfio_ap_irq_disable() on > finding a matching queue. > > > But I'm inclined to leave this as-is for this series given we are > > at > > rc7. > > On the grounds that it's no worse, maybe, but given the changes > around this code hopefully we can submit fixes patches to stable if > the > backport isn't obvious and the BUG_ON in type1 is reachable. Thanks, > > Alex >