Re: [PATCH v5 08/14] iommufd/viommu: Add iommufd_viommu_report_event helper

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 21 Jan 2025 17:14:04 -0400

On Tue, Jan 21, 2025 at 01:02:16PM -0800, Nicolin Chen wrote:
> On Tue, Jan 21, 2025 at 04:09:24PM -0400, Jason Gunthorpe wrote:
> > On Tue, Jan 21, 2025 at 11:55:16AM -0800, Nicolin Chen wrote:
> > > > > > IOMMU_VEVENTQ_STATE_OVERFLOW with a 0 length event is seen if events
> > > > > > have been lost and no subsequent events are present. It exists to
> > > > > > ensure timely delivery of the overflow event to userspace. counter
> > > > > > will be the sequence number of the next successful event.
> > > > > 
> > > > > So userspace should first read the header to decide whether or not
> > > > > to read a vEVENT. If header is overflows, it should skip the vEVENT
> > > > > struct and read the next header?
> > > > 
> > > > Yes, but there won't be a next header. overflow would always be the
> > > > last thing in a read() response. If there is another event then
> > > > overflow is indicated by non-monotonic count.
> > > 
> > > I am not 100% sure why "overflow would always be the last thing
> > > in a read() response". I thought that kernel should immediately
> > > report an overflow to user space when the vEVENTQ is overflowed.
> > 
> > As below, if you observe overflow then it was at the end of the kernel
> > queue and there is no further events after it. So it should always end
> > up last.
> > 
> > Perhaps we could enforce this directly in the kernel's read by making
> > it the only, first and last, response to read.
> 
> Hmm, since the overflow node is the last node in the list, isn't
> it already an enforcement it's the only/first/last node to read?
> (Assuming we choose to delete the overflow node from the list if
>  new event can be inserted.)

Since we don't hold the spinlock the whole time there is a race where
we could pull the overflow off and then another entry could be pushed
while we do the copy_to_user.

> > > Yet, thinking about this once again: user space actually has its
> > > own queue. There's probably no point in letting it know about an
> > > overflow immediately when the kernel vEVENTQ overflows until its
> > > own user queue overflows after it reads the entire vEVENTQ so it
> > > can trigger a vHW event/irq to the VM?
> > 
> > The kernel has no idea what userspace is doing, the kernel's job
> > should be to ensure timely delivery of all events, if an event is lost
> > it should ensure timely delivery of the lost event notification. There
> > is little else it can do.
> 
> Yet, "timely" means still having an entire-queue-size-long delay
> since the overflow node is at the end of the queue, right?

Yes, but also in this case the vIOMMU isn't experiancing an overflow
so it doesn't need to know about it. 

The main point here is if we somehow loose an event the vIOMMU driver
may need to do something to recover from the lost event. It doesn't
become relavent until the lost event is present in the virtual HW
queue.

There is also the minor detail of what happens if the hypervisor HW
queue overflows - I don't know the answer here. It is security
concerning since the VM can spam DMA errors at high rate. :|

Jason