> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, August 18, 2020 7:50 PM
>
> On Tue, Aug 18, 2020 at 01:09:01AM +0000, Tian, Kevin wrote:
> > The difference in my reply is not just about the implementation gap
> > of growing a userspace DMA framework into a passthrough framework.
> > My real point is about the different goals that each wants to
> > achieve. Userspace DMA is purely about allowing userspace to
> > directly access the portal and do DMA, but the wq configuration is
> > always under the kernel driver's control. Passthrough, on the other
> > hand, means delegating full control of the wq to the guest and then
> > building the associated support (live migration, vSVA, posted
> > interrupts, etc.) for that to work. I really didn't see the value
> > of mixing them together when there is already a good candidate to
> > handle passthrough...
>
> In Linux a 'VM' and virtualization have always been a normal system
> process that uses a few extra kernel features. This has been more or
> less the cornerstone of that design since the start.
>
> In that view it doesn't make any sense to say that uAPI from idxd
> that is useful for virtualization somehow doesn't belong as part of
> the standard uAPI.

The point is that we already have a more standard uAPI (VFIO), which
is unified and vendor-agnostic to userspace. Creating an idxd-specific
uAPI to absorb requirements that VFIO already covers is not
compelling, and instead creates more trouble for Qemu and other VMMs,
since they would need to deal with every such driver uAPI even though
Qemu itself has no interest in the device details (the real user is
inside the guest).

> Especially when it is such a small detail as what APIs are used to
> configure the wq.
>
> For instance, what about suspend/resume of containers using idxd?
> Wouldn't you want to have the same basic approach of controlling the
> wq from userspace that virtualization uses?

I'm not familiar with how container suspend/resume is done today, but
my gut feeling is that it's different from virtualization. For
virtualization, the whole wq is assigned to the guest, so the uAPI
must provide a way to save the wq state, including its configuration,
at suspend, and then restore that state to what the guest expects at
resume. In the container case, however, which does userspace DMA, the
wq is managed by the host kernel and could be shared between multiple
containers, so the wq state is irrelevant to the container. The only
relevant state is the in-flight work, which needs a draining
interface. In this view the two have a major difference.

Thanks
Kevin
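
---

To make the vendor-agnostic point concrete, below is a minimal sketch
(not from the thread) of the common VFIO probe sequence a VMM can run
against any assigned device, idxd or otherwise. The group number and
PCI address are hypothetical placeholders, and error handling is
omitted:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    int probe_any_vfio_device(void)
    {
            /* Same container/group/device sequence for every vendor. */
            int container = open("/dev/vfio/vfio", O_RDWR);
            int group = open("/dev/vfio/42", O_RDWR);  /* hypothetical group */

            ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
            ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

            /* The device name is opaque to the VMM; nothing idxd-specific. */
            int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:00:0a.0");

            struct vfio_device_info info = { .argsz = sizeof(info) };
            ioctl(device, VFIO_DEVICE_GET_INFO, &info);

            /* Regions (MMIO portals included) enumerate generically too. */
            for (unsigned int i = 0; i < info.num_regions; i++) {
                    struct vfio_region_info reg = {
                            .argsz = sizeof(reg),
                            .index = i,
                    };
                    ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);
            }
            return device;
    }

A driver-private uAPI would force every VMM to duplicate an equivalent
of this path for each device type it wants to assign.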
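
The suspend/resume asymmetry can be sketched the same way. The types
and the drain call below are invented purely for illustration; idxd
exposes no such interface:

    #include <stdint.h>

    /* Passthrough: the guest owns the wq, so suspend must serialize the
     * whole configuration and restore it bit-for-bit at resume. */
    struct wq_guest_state {
            uint32_t wq_size;
            uint32_t priority;
            uint32_t pasid_enabled;
            uint32_t mode;          /* dedicated vs. shared */
    };

    /* Container userspace DMA: the host kernel owns the wq (possibly
     * shared across containers), so the only per-container operation a
     * suspend needs is draining that container's in-flight descriptors. */
    int drain_container_work(int portal_fd);    /* hypothetical */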