On Wed, 2016-01-27 at 09:47 +0800, Jike Song wrote:
> On 01/27/2016 06:56 AM, Alex Williamson wrote:
> > On Tue, 2016-01-26 at 22:39 +0000, Tian, Kevin wrote:
> > > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > > Sent: Wednesday, January 27, 2016 6:27 AM
> > > >
> > > > On Tue, 2016-01-26 at 22:15 +0000, Tian, Kevin wrote:
> > > > > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > > > > Sent: Wednesday, January 27, 2016 6:08 AM
> > > > > >
> > > > > > > > > Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
> > > > > > > > > KVM, so VM MMIO access will be forwarded to KVMGT directly for
> > > > > > > > > emulation in kernel. If we reuse above R/W flags, the whole emulation
> > > > > > > > > path would be unnecessarily long with obvious performance impact. We
> > > > > > > > > either need a new flag here to indicate in-kernel emulation (bias from
> > > > > > > > > passthrough support), or just hide the region alternatively (let KVMGT
> > > > > > > > > to handle I/O emulation itself like today).
> > > > > > > >
> > > > > > > > That sounds like a future optimization TBH. There's very strict
> > > > > > > > layering between vfio and kvm. Physical device assignment could make
> > > > > > > > use of it as well, avoiding a round trip through userspace when an
> > > > > > > > ioread/write would do. Userspace also needs to orchestrate those kinds
> > > > > > > > of accelerators, there might be cases where userspace wants to see those
> > > > > > > > transactions for debugging or manipulating the device. We can't simply
> > > > > > > > take shortcuts to provide such direct access. Thanks,
> > > > > > > >
> > > > > > >
> > > > > > > But we have to balance such debugging flexibility and acceptable performance.
> > > > > > > To me the latter one is more important otherwise there'd be no real usage
> > > > > > > around this technique, while for debugging there are other alternative (e.g.
> > > > > > > ftrace) Consider some extreme case with 100k traps/second and then see
> > > > > > > how much impact a 2-3x longer emulation path can bring...
> > > > > >
> > > > > > Are you jumping to the conclusion that it cannot be done with proper
> > > > > > layering in place? Performance is important, but it's not an excuse to
> > > > > > abandon designing interfaces between independent components. Thanks,
> > > > > >
> > > > >
> > > > > Two are not controversial. My point is to remove unnecessary long trip
> > > > > as possible. After another thought, yes we can reuse existing read/write
> > > > > flags:
> > > > > - KVMGT will expose a private control variable whether in-kernel
> > > > > delivery is required;
> > > >
> > > > But in-kernel delivery is never *required*. Wouldn't userspace want to
> > > > deliver in-kernel any time it possibly could?
> > > >
> > > > > - when the variable is true, KVMGT will register in-kernel MMIO
> > > > > emulation callbacks then VM MMIO request will be delivered to KVMGT
> > > > > directly;
> > > > > - when the variable is false, KVMGT will not register anything.
> > > > > VM MMIO request will then be delivered to Qemu and then ioread/write
> > > > > will be used to finally reach KVMGT emulation logic;
> > > >
> > > > No, that means the interface is entirely dependent on a backdoor through
> > > > KVM. Why can't userspace (QEMU) do something like register an MMIO
> > > > region with KVM handled via a provided file descriptor and offset,
> > > > couldn't KVM then call the file ops without a kernel exit? Thanks,
> > > >
> > >
> > > Could you elaborate this thought? If it can achieve the purpose w/o
> > > a kernel exit definitely we can adapt to it.
> > > :-)
> >
> > I only thought of it when replying to the last email and have been doing
> > some research, but we already do quite a bit of synchronization through
> > file descriptors. The kvm-vfio pseudo device uses a group file
> > descriptor to ensure a user has access to a group, allowing some degree
> > of interaction between modules. Eventfds and irqfds already make use of
> > f_ops on file descriptors to poke data. So, if KVM had information that
> > an MMIO region was backed by a file descriptor for which it already has
> > a reference via fdget() (and verified access rights and whatnot), then
> > it ought to be a simple matter to get to f_ops->read/write knowing the
> > base offset of that MMIO region. Perhaps it could even simply use
> > __vfs_read/write(). Then we've got a proper reference to the file
> > descriptor for ownership purposes and we've transparently jumped across
> > modules without any implicit knowledge of the other end. Could it work?
>
> This is OK for KVMGT, from fops to vgpu device-model would always be simple.
> The only question is, how is KVM hypervisor supposed to get the fd on VM-exitings?

Hi Jike,

Sorry, I don't understand "on VM-exiting". KVM would hold a reference to
the fd via fdget(), so the vfio device wouldn't be closed until the VM
exits and KVM releases that reference.

> copy-and-paste the current implementation of vcpu_mmio_write(), seems
> nothing but GPA and len are provided:

I presume that an MMIO region is already registered with a GPA and
length; the additional information necessary would be a file descriptor
and an offset into the file descriptor for the base of the MMIO space.
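To make the fd-plus-offset idea concrete, here is a minimal userspace model of how such a registration could be serviced. It is a sketch under stated assumptions, not KVM or VFS code: the names file_rw_ops, backing_file, fd_mmio_region, fd_mmio_write() and fd_mmio_read() are hypothetical, and a function-pointer table stands in for the file's f_ops (in the kernel the access would instead go through f_op->read/write, or __vfs_read/write(), on a file pinned with fdget()). The only real work is translating the guest-physical address into a file position, pos = file_offset + (addr - gpa); for a VFIO device, file_offset would presumably be the region offset that VFIO_DEVICE_GET_REGION_INFO reports.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the read/write half of a file's f_ops.
 * "pos" is an absolute offset into the file, as with pread/pwrite. */
struct file_rw_ops {
	ssize_t (*read)(void *priv, void *buf, size_t len, uint64_t pos);
	ssize_t (*write)(void *priv, const void *buf, size_t len, uint64_t pos);
};

struct backing_file {
	const struct file_rw_ops *f_op;
	void *priv;
};

/* Hypothetical registration record: the GPA range KVM already tracks,
 * plus the backing file and the file offset of the region's base. */
struct fd_mmio_region {
	uint64_t gpa;
	uint64_t len;
	struct backing_file *file;
	uint64_t file_offset;
};

static int fd_mmio_in_range(const struct fd_mmio_region *r, uint64_t addr, int len)
{
	return len > 0 && addr >= r->gpa &&
	       addr - r->gpa + (uint64_t)len <= r->len;
}

/* Returns 0 if handled via the file, -1 if the access is not ours (the
 * caller would then try other handlers or exit to userspace). */
static int fd_mmio_write(const struct fd_mmio_region *r, uint64_t addr,
			 int len, const void *val)
{
	assert(r->file && r->file->f_op);
	if (!fd_mmio_in_range(r, addr, len))
		return -1;
	uint64_t pos = r->file_offset + (addr - r->gpa);
	return r->file->f_op->write(r->file->priv, val, (size_t)len, pos) == len ? 0 : -1;
}

static int fd_mmio_read(const struct fd_mmio_region *r, uint64_t addr,
			int len, void *val)
{
	assert(r->file && r->file->f_op);
	if (!fd_mmio_in_range(r, addr, len))
		return -1;
	uint64_t pos = r->file_offset + (addr - r->gpa);
	return r->file->f_op->read(r->file->priv, val, (size_t)len, pos) == len ? 0 : -1;
}

/* Toy device model: a flat 0x200-byte register file indexed by file offset. */
#define TOY_SIZE 0x200
static uint8_t toy_regs[TOY_SIZE];

static ssize_t toy_read(void *priv, void *buf, size_t len, uint64_t pos)
{
	uint8_t *regs = priv;
	if (pos > TOY_SIZE || len > TOY_SIZE - pos)
		return -1;
	memcpy(buf, regs + pos, len);
	return (ssize_t)len;
}

static ssize_t toy_write(void *priv, const void *buf, size_t len, uint64_t pos)
{
	uint8_t *regs = priv;
	if (pos > TOY_SIZE || len > TOY_SIZE - pos)
		return -1;
	memcpy(regs + pos, buf, len);
	return (ssize_t)len;
}

static const struct file_rw_ops toy_ops = { toy_read, toy_write };
static struct backing_file toy_file = { &toy_ops, toy_regs };
```

With a region registered as { gpa 0xfe000000, len 0x100, &toy_file, file_offset 0x100 }, a 4-byte guest write to 0xfe000008 lands at file position 0x108 of the toy device. An access outside the registered range returns -1 rather than being forced through the file, much as a nonzero return from kvm_io_bus_write() ends the vcpu_mmio_write() loop Jike quoted; neither side needs to know what sits on the other end of the fd.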
> static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
> 			   const void *v)
> {
> 	int handled = 0;
> 	int n;
>
> 	do {
> 		n = min(len, 8);
> 		if (!(vcpu->arch.apic &&
> 		      !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, addr, n, v))
> 		    && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v))
> 			break;
> 		handled += n;
> 		addr += n;
> 		len -= n;
> 		v += n;
> 	} while (len);
>
> 	return handled;
> }
>
> If we back a GPA range with a fd, this will also be a 'backdoor'?

KVM would simply be able to service the MMIO access using the provided
fd and offset. It's not a back door because we will have created an API
for KVM to have a file descriptor and offset registered (by userspace) to
handle the access. Also, KVM does not know the file descriptor is handled
by a VFIO device, and VFIO doesn't know the read/write accesses are
initiated by KVM. Seems like the question is whether we can fit something
like that into the existing KVM MMIO bus/device handlers in-kernel.
Thanks,

Alex