On Tue, Jan 21, 2025 at 05:14:04PM -0400, Jason Gunthorpe wrote:
> On Tue, Jan 21, 2025 at 01:02:16PM -0800, Nicolin Chen wrote:
> > On Tue, Jan 21, 2025 at 04:09:24PM -0400, Jason Gunthorpe wrote:
> > > On Tue, Jan 21, 2025 at 11:55:16AM -0800, Nicolin Chen wrote:
> > > > > > > IOMMU_VEVENTQ_STATE_OVERFLOW with a 0 length event is seen if events
> > > > > > > have been lost and no subsequent events are present. It exists to
> > > > > > > ensure timely delivery of the overflow event to userspace. counter
> > > > > > > will be the sequence number of the next successful event.
> > > > > >
> > > > > > So userspace should first read the header to decide whether or not
> > > > > > to read a vEVENT. If the header indicates an overflow, it should skip
> > > > > > the vEVENT struct and read the next header?
> > > > >
> > > > > Yes, but there won't be a next header. overflow would always be the
> > > > > last thing in a read() response. If there is another event then
> > > > > overflow is indicated by non-monotonic count.
> > > >
> > > > I am not 100% sure why "overflow would always be the last thing
> > > > in a read() response". I thought that the kernel should immediately
> > > > report an overflow to user space when the vEVENTQ overflows.
> > >
> > > As below, if you observe overflow then it was at the end of the kernel
> > > queue and there are no further events after it. So it should always end
> > > up last.
> > >
> > > Perhaps we could enforce this directly in the kernel's read by making
> > > it the only, first and last, response to read.
> >
> > Hmm, since the overflow node is the last node in the list, isn't
> > that already an enforcement that it is the only/first/last node to
> > read? (Assuming we choose to delete the overflow node from the list
> > when a new event can be inserted.)
>
> Since we don't hold the spinlock the whole time there is a race where
> we could pull the overflow off and then another entry could be pushed
> while we do the copy_to_user.

I see. I'll be careful around that. I can imagine that one tricky part
will be restoring the overflow node to the list when a copy_to_user
fails (I am appending a rough sketch of that read() path at the end of
this mail).

> > > > Yet, thinking about this once again: user space actually has its
> > > > own queue. There's probably no point in letting it know about an
> > > > overflow immediately when the kernel vEVENTQ overflows until its
> > > > own user queue overflows after it reads the entire vEVENTQ, so it
> > > > can trigger a vHW event/irq to the VM?
> > >
> > > The kernel has no idea what userspace is doing. The kernel's job
> > > should be to ensure timely delivery of all events; if an event is
> > > lost it should ensure timely delivery of the lost-event notification.
> > > There is little else it can do.
> >
> > Yet, "timely" still means an entire-queue-size-long delay, since the
> > overflow node is at the end of the queue, right?
>
> Yes, but also in this case the vIOMMU isn't experiencing an overflow
> so it doesn't need to know about it.
>
> The main point here is that if we somehow lose an event, the vIOMMU
> driver may need to do something to recover from the lost event. It
> doesn't become relevant until the lost event is present in the virtual
> HW queue.

Ack.
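Just to double-check my understanding of the userspace side, something
like the sketch below is what I picture the VMM doing when it drains the
vEVENTQ fd and decides when to raise an overflow in the virtual HW
queue. The header layout, the flag name, and the two helper hooks are
all made up for illustration here; they are not the real uAPI:

/*
 * Userspace-side sketch only: the header layout, flag name and the two
 * helper hooks below are invented for illustration, not the real uAPI.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct vevent_header {
	uint32_t flags;		/* hypothetical: carries an OVERFLOW bit */
	uint32_t len;		/* payload length, 0 for the overflow entry */
	uint64_t counter;	/* sequence number of the next good event */
};

#define VEVENT_FLAG_OVERFLOW	(1u << 0)	/* hypothetical flag */

/* Placeholder hooks into the VMM's vIOMMU model (not real APIs). */
static void push_to_viommu_queue(const void *ev, uint32_t len)
{
	fprintf(stderr, "forward %u-byte event to the virtual HW queue\n", len);
}

static void report_overflow_to_viommu(uint64_t next_seq)
{
	fprintf(stderr, "events were lost, next sequence is %llu\n",
		(unsigned long long)next_seq);
}

static uint64_t next_seq;
static int have_seq;

void drain_veventq(int veventq_fd)
{
	char buf[4096];
	ssize_t n = read(veventq_fd, buf, sizeof(buf));
	char *cur = buf, *end = buf + (n > 0 ? n : 0);

	while (cur + sizeof(struct vevent_header) <= end) {
		struct vevent_header hdr;

		memcpy(&hdr, cur, sizeof(hdr));
		cur += sizeof(hdr);

		if (hdr.flags & VEVENT_FLAG_OVERFLOW) {
			/*
			 * 0-length overflow entry: events were lost and this
			 * is the last thing in the read(). Let the vIOMMU
			 * model raise the overflow in the virtual HW queue.
			 */
			report_overflow_to_viommu(hdr.counter);
			next_seq = hdr.counter;
			have_seq = 1;
			break;
		}

		/* A gap in the sequence numbers also means lost events. */
		if (have_seq && hdr.counter != next_seq)
			report_overflow_to_viommu(hdr.counter);
		next_seq = hdr.counter + 1;
		have_seq = 1;

		if (hdr.len > (size_t)(end - cur))
			break;	/* truncated read, stop parsing */
		/* Forward the payload into the virtual HW event queue. */
		push_to_viommu_queue(cur, hdr.len);
		cur += hdr.len;
	}
}

Does that roughly match what you have in mind for the non-monotonic
count check?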
> There is also the minor detail of what happens if the hypervisor HW
> queue overflows - I don't know the answer here. It is a security
> concern since the VM can spam DMA errors at a high rate.

:|

In my view, the hypervisor queue is the vHW queue for the VM, so it
should act like HW, which means it is up to the guest kernel driver to
handle the high rate of DMA errors..

The whole thing is complicated since we have a double buffer: a kernel
queue and a hypervisor queue. I am not very sure whether both queues
should have the same depth, or whether the kernel one should be
slightly bigger so it can buffer for the hypervisor one.

If only the kernel could directly add events to the hypervisor queue
and also set the overflow flag in the hypervisor..

Thanks
Nicolin
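As for the copy_to_user failure path mentioned earlier, here is the very
rough shape of the read() side I currently have in mind. The struct
layout and all of the names are invented for the sake of the example
(this is not meant to be the actual code), and re-queueing the overflow
node after a failed copy is exactly the part that still needs more care:

/* Kernel-side sketch; names and layout are invented, not the real code. */
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/minmax.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct vevent {
	struct list_head node;
	size_t len;		/* size of the fully formed record */
	void *data;		/* record (header + payload) to copy out */
};

struct veventq {
	spinlock_t lock;	/* a real version likely needs _irqsave */
	struct list_head list;	/* pending events; overflow node is last */
	struct vevent overflow;	/* preallocated node for the overflow record */
};

static ssize_t veventq_read_one(struct veventq *q, char __user *buf, size_t len)
{
	struct vevent *ev;
	bool is_overflow;
	size_t n;

	spin_lock(&q->lock);
	if (list_empty(&q->list)) {
		spin_unlock(&q->lock);
		return 0;
	}
	ev = list_first_entry(&q->list, struct vevent, node);
	is_overflow = (ev == &q->overflow);
	list_del_init(&ev->node);
	/* Drop the lock across copy_to_user so producers are not blocked. */
	spin_unlock(&q->lock);

	n = min_t(size_t, len, ev->len);
	if (copy_to_user(buf, ev->data, n)) {
		/*
		 * copy_to_user failed, so the event must go back. A normal
		 * event returns to the head to keep ordering. The overflow
		 * node should return to the tail, unless a producer already
		 * re-queued it while we were unlocked -- and if new events
		 * were pushed meanwhile, simply appending it may not give
		 * the right semantics. This is the tricky part.
		 */
		spin_lock(&q->lock);
		if (is_overflow) {
			if (list_empty(&ev->node))
				list_add_tail(&ev->node, &q->list);
		} else {
			list_add(&ev->node, &q->list);
		}
		spin_unlock(&q->lock);
		return -EFAULT;
	}

	if (!is_overflow) {
		kfree(ev->data);
		kfree(ev);
	}
	return n;
}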