Re: [PATCH RFC 0/8] basic vfio-ccw infrastructure

Dong Jia <bjsdjshi@xxxxxxxxxxxxxxxxxx> · Wed, 4 May 2016 17:26:29 +0800

On Fri, 29 Apr 2016 11:17:35 -0600
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

Dear Alex:

Thanks for the comments.

[...]

> > 
> > Users of vfio-ccw are not limited to Qemu, but Qemu is definitely a
> > good example for understanding how these patches work. Here is a
> > little more detail on how an I/O request triggered by the Qemu guest
> > will be handled (without error handling).
> > 
> > Explanation:
> > Q1-Q4: Qemu side process.
> > K1-K6: Kernel side process.
> > 
> > Q1. Intercept a ssch instruction.
> > Q2. Translate the guest ccw program to a user space ccw program
> >     (u_ccwchain).
> 
> Is this replacing guest physical address in the program with QEMU
> virtual addresses?
Yes.

> 
> > Q3. Call VFIO_DEVICE_CCW_CMD_REQUEST (u_ccwchain, orb, irb).
> >     K1. Copy from u_ccwchain to kernel (k_ccwchain).
> >     K2. Translate the user space ccw program to a kernel space ccw
> >         program, which becomes runnable for a real device.
> 
> And here we translate and likely pin QEMU virtual address to physical
> addresses to further modify the program sent into the channel?
Yes. Exactly.

> 
> >     K3. With the necessary information contained in the orb passed in
> >         by Qemu, issue the k_ccwchain to the device, and wait event q
> >         for the I/O result.
> >     K4. Interrupt handler gets the I/O result, and wakes up the wait q.
> >     K5. CMD_REQUEST ioctl gets the I/O result, and uses the result to
> >         update the user space irb.
> >     K6. Copy irb and scsw back to user space.
> > Q4. Update the irb for the guest.
> 
> If the answers to my questions above are both yes,
Yes, they are.

> then this is really a mediated interface, not a direct assignment.
Right. This is true.

> We don't need an iommu
> because we're policing and translating the program for the device
> before it gets sent to hardware.  I think there are better ways than
> noiommu to handle such devices perhaps even with better performance
> than this two-stage translation.  In fact, I think the solution we plan
> to implement for vGPU support would work here.
> 
> Like your device, a vGPU is mediated, we don't have IOMMU level
> translation or isolation since a vGPU is largely a software construct,
> but we do have software policing and translating how the GPU is
> programmed.  To do this we're creating a type1 compatible vfio iommu
> backend that uses the existing map and unmap ioctls, but rather than
> programming them into an IOMMU for a device, it simply stores the
> translations for use by later requests.  This means that a device
> programmed in a VM with guest physical addresses can have the
> vfio kernel convert that address to process virtual address, pin the
> page and program the hardware with the host physical address in one
> step.
I've read through the mail threads that discuss how to add vGPU
support to VFIO. I'm afraid that proposal cannot simply be applied to
this case, especially if we want to keep the vfio api completely
compatible with the existing usage.
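
To check my understanding of the proposed backend, here is a rough
user-space sketch in C. It is not taken from the vGPU patches: the names
sketch_iommu, sketch_map() and sketch_iova_to_vaddr() are hypothetical,
and a fixed-size array stands in for whatever structure the real backend
would use. The map call only records a translation, and a later request
looks it up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MAPPINGS 8   /* hypothetical limit, for the sketch only */

/* One recorded translation: guest physical (iova) -> process virtual. */
struct sketch_mapping {
    uint64_t iova;
    uint64_t vaddr;
    uint64_t size;
};

struct sketch_iommu {
    struct sketch_mapping map[SKETCH_MAX_MAPPINGS];
    size_t nr;
};

/* "Map": nothing is programmed into hardware, the range is just stored. */
int sketch_map(struct sketch_iommu *s, uint64_t iova, uint64_t vaddr,
               uint64_t size)
{
    assert(s != NULL);
    if (s->nr == SKETCH_MAX_MAPPINGS || size == 0)
        return -1;
    s->map[s->nr].iova = iova;
    s->map[s->nr].vaddr = vaddr;
    s->map[s->nr].size = size;
    s->nr++;
    return 0;
}

/* Later request: convert a guest physical address using the stored ranges. */
int sketch_iova_to_vaddr(const struct sketch_iommu *s, uint64_t iova,
                         uint64_t *vaddr)
{
    size_t i;

    for (i = 0; i < s->nr; i++) {
        const struct sketch_mapping *m = &s->map[i];

        if (iova >= m->iova && iova - m->iova < m->size) {
            *vaddr = m->vaddr + (iova - m->iova);
            return 0;
        }
    }
    return -1;  /* address was never mapped */
}
```

In the real backend the lookup would be followed by pinning the page and
finding its host physical address, which the sketch leaves out.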

AFAIU, a PCI device (or a vGPU device) uses a dedicated, exclusive and
fixed range of addresses in the memory space for DMA operations. No
address inside this range is used for any other purpose. Thus we
can add a memory listener on this range, and pin the pages for later
use (DMA operations). And we can keep the pages pinned during the life
cycle of the VM (not quite accurate, or I should say 'the target
device').

A subchannel device, however, does not have such a range of addresses.
The device driver simply calls kmalloc() to get a piece of memory, and
assembles a ccw program with it, before issuing the ccw program to
perform an I/O operation. So the Qemu memory listener can't tell whether
an address is used for an I/O operation or for something else. This
makes the memory listener of no help in our case.

The only point in time at which we know we should pin pages for I/O is
when an I/O instruction (e.g. ssch) is intercepted. At this point, we
know that the address contained in the parameter of the ssch
instruction points to a piece of memory that contains a ccw program.
Then we do: pin the pages --> convert the ccw program --> perform the
I/O --> return the I/O result --> unpin the pages.
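
To make that sequence concrete, here is a minimal user-space sketch in
C. It is not the code from the patches: the struct follows the format-1
ccw layout, but sketch_pin(), sketch_unpin_all(), sketch_translate(),
sketch_handle_ssch() and the fixed SKETCH_HOST_OFFSET are hypothetical
stand-ins for real page pinning and address translation, and the I/O
step is only a placeholder that walks the command chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Format-1 ccw: command code, flags, byte count, data address. */
struct ccw1 {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;
};

#define CCW_FLAG_CC        0x40         /* command chaining: more ccws follow */
#define SKETCH_MAX_PINNED  16           /* hypothetical limit */
#define SKETCH_HOST_OFFSET 0x10000000u  /* hypothetical fixed translation */

struct pin_state {
    uint32_t pinned[SKETCH_MAX_PINNED];
    size_t nr;
};

/* Stand-in for pinning the page behind one data address. */
int sketch_pin(struct pin_state *ps, uint32_t addr)
{
    assert(ps != NULL);
    if (ps->nr == SKETCH_MAX_PINNED)
        return -1;
    ps->pinned[ps->nr++] = addr;
    return 0;
}

/* Stand-in for releasing everything pinned for this request. */
void sketch_unpin_all(struct pin_state *ps)
{
    ps->nr = 0;
}

/* Stand-in for the user-address to host-address conversion. */
uint32_t sketch_translate(uint32_t addr)
{
    return addr + SKETCH_HOST_OFFSET;
}

/* Returns the number of ccws "executed", or -1 if pinning failed. */
int sketch_handle_ssch(const struct ccw1 *u_chain, size_t len,
                       struct ccw1 *k_chain, struct pin_state *ps)
{
    size_t i;
    int executed = 0;

    /* 1. Pin the pages the program refers to. */
    for (i = 0; i < len; i++) {
        if (sketch_pin(ps, u_chain[i].cda) < 0) {
            sketch_unpin_all(ps);
            return -1;
        }
    }

    /* 2. Convert the ccw program: copy it and rewrite the data addresses. */
    for (i = 0; i < len; i++) {
        k_chain[i] = u_chain[i];
        k_chain[i].cda = sketch_translate(u_chain[i].cda);
    }

    /* 3. Perform the I/O (placeholder: follow the chain until it ends). */
    for (i = 0; i < len; i++) {
        executed++;
        if (!(k_chain[i].flags & CCW_FLAG_CC))
            break;
    }

    /* 4. The result goes back to the caller, 5. and the pages are unpinned. */
    sketch_unpin_all(ps);
    return executed;
}
```

The point is only the ordering: pages are pinned when the request
arrives and nothing stays pinned once it has completed.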

> 
> This architecture also makes the vfio api completely compatible with
> existing usage without tainting QEMU with support for noiommu devices.
> I would strongly suggest following a similar approach and dropping the
> noiommu interface.  We really do not need to confuse users with noiommu
> devices that are safe and assignable and devices where noiommu should
> warn them to stay away.  Thanks,
Understood. But as explained above, even if we introduce a new vfio
iommu backend, what it does would probably look much like what the
no-iommu backend does. Any ideas about this?

> 
> Alex
> 

--------
Dong Jia
