> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, January 6, 2022 11:42 PM
>
> On Thu, Jan 06, 2022 at 06:32:57AM +0000, Tian, Kevin wrote:
>
> > Putting PRI aside the time to drain in-fly requests is undefined. It depends
> > on how many pending requests to be waited for before completing the
> > draining command on the device. This is IP specific (e.g. whether supports
> > preemption) and also guest specific (e.g. whether it's actively submitting
> > workload).
>
> You are assuming a model where NDMA has to be implemented by pushing a
> command, but I would say that is very poor IP design.

I was not assuming a single model. I just wanted to figure out how this
model can be supported in this design, given that I have seen many
examples of it.

>
> A device is fully in self-control of its own DMA and it should simply
> stop it quickly when doing NDMA.

Simple on some classes, but definitely not so simple on others.

>
> Devices that are poorly designed here will have very long migration
> downtime latencies and people simply won't want to use them.

Different usages have different latency requirements. Do we just want
people to decide by measurement whether to manage state for a device?
There is always a difference between an experimental environment and
the final production environment. A timeout mechanism is more robust as
a last resort than breaking the SLA in case of any surprise in the
production environment.

>
> > > > Whether the said DOS is a real concern and how severe it is are usage
> > > > specific things. Why would we want to hardcode such restriction on
> > > > an uAPI? Just give the choice to the admin (as long as this restriction is
> > > > clearly communicated to userspace clearly)...
> > >
> > > IMHO it is not just DOS, PRI can become dependent on IO which requires
> > > DMA to complete.
> > >
> > > You could quickly get yourself into a deadlock situation where the
> > > hypervisor has disabled DMA activities of other devices and the vPRI
> > > simply cannot be completed.
> >
> > How is it related to PRI which is only about address translation?
>
> In something like SVA PRI can request a page which is not present and
> the OS has to do DMA to load the page back from storage to make it
> present and respond to the translation request.
>
> The DMA is not related to the device doing the PRI in the first place,
> but if the hypervisor has blocked the DMA already for some other
> reason (perhaps that device is also doing PRI) then it all will
> deadlock.

Yes, but with a timeout the NDMA path doesn't care whether a PRI goes
unanswered (due to a hostile VM or such a blocked-DMA case). It simply
fails the state transition request when the timeout is triggered.

Thanks
Kevin
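
P.S. To make the timeout idea concrete, here is a minimal sketch in plain C. Everything in it is hypothetical and for illustration only: `struct fake_dev`, `try_enter_ndma()` and the simulated millisecond clock are not part of any existing VFIO uAPI or driver. It only shows the behaviour argued for above: the NDMA transition either completes once in-flight requests have drained, or it fails with a timeout and leaves the device in its previous state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model, not a real VFIO structure. */
struct fake_dev {
	unsigned int inflight;     /* requests still pending in the device */
	unsigned int drain_per_ms; /* requests retired per simulated millisecond */
	int dma_stopped;           /* set to 1 once the device is in NDMA */
};

/*
 * Try to enter NDMA within timeout_ms of simulated time.
 *
 * Returns 0 on success, or -ETIMEDOUT if the in-flight work did not
 * drain in time. On timeout dma_stopped is left untouched, i.e. the
 * state transition request is simply failed.
 */
static int try_enter_ndma(struct fake_dev *dev, unsigned int timeout_ms)
{
	unsigned int elapsed_ms = 0;

	assert(dev != NULL);

	while (dev->inflight > 0) {
		if (elapsed_ms >= timeout_ms)
			return -ETIMEDOUT; /* fail instead of blocking forever */

		/* One simulated millisecond of draining. */
		if (dev->inflight > dev->drain_per_ms)
			dev->inflight -= dev->drain_per_ms;
		else
			dev->inflight = 0;
		elapsed_ms++;
	}

	dev->dma_stopped = 1;
	return 0;
}
```

A device with `drain_per_ms == 0` models the stuck case (for example, requests waiting on a PRI response that never arrives): the call returns -ETIMEDOUT after `timeout_ms` and userspace can decide what to do next, rather than the transition hanging.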