Avi Kivity wrote: > On 10/02/2009 10:19 PM, Gregory Haskins wrote: >> What: xinterface is a mechanism that allows kernel modules external to >> the kvm.ko proper to interface with a running guest. It accomplishes >> this by creating an abstracted interface which does not expose any >> private details of the guest or its related KVM structures, and provides >> a mechanism to find and bind to this interface at run-time. >> > > If this is needed, it should be done as a virt_address_space to which > kvm and other modules bind, instead of as something that kvm exports and > other modules import. The virt_address_space can be identified by an fd > and passed around to kvm and other modules. IIUC, what you are proposing is something similar to generalizing the vbus::memctx object. I had considered doing something like that in the early design phase of vbus, but then decided it would be a hard-sell to the mm crowd, and difficult to generalize. What do you propose as the interface to program the object? > >> Why: There are various subsystems that would like to interact with a KVM >> guest which are ideally suited to exist outside the domain of the kvm.ko >> core logic. For instance, external pci-passthrough, virtual-bus, and >> virtio-net modules are currently under development. In order for these >> modules to successfully interact with the guest, they need, at the very >> least, various interfaces for signaling IO events, pointer translation, >> and possibly memory mapping. >> >> The signaling case is covered by the recent introduction of the >> irqfd/ioeventfd mechanisms. This patch provides a mechanism to cover the >> other cases. Note that today we only expose pointer-translation related >> functions, but more could be added at a future date as needs arise. >> >> Example usage: QEMU instantiates a guest, and an external module "foo" >> that desires the ability to interface with the guest (say via >> open("/dev/foo")). QEMU may then pass the kvmfd to foo via an >> ioctl, such as: ioctl(foofd, FOO_SET_VMID,&kvmfd). Upon receipt, the >> foo module can issue kvm_xinterface_bind(kvmfd) to acquire >> the proper context. Internally, the struct kvm* and associated >> struct module* will remain pinned at least until the foo module calls >> kvm_xinterface_put(). >> >> > > So, under my suggestion above, you'd call > sys_create_virt_address_space(), populate it, and pass the result to kvm > and to foo. This allows the use of virt_address_space without kvm and > doesn't require foo to interact with kvm. The problem I see here is that the only way I can think to implement this generally is something that looks very kvm-esque (slots-to-pages kind of translation). Is there a way you can think of that does not involve a kvm.ko originated vtable that is also not kvm centric? > >> +struct kvm_xinterface_ops { >> + unsigned long (*copy_to)(struct kvm_xinterface *intf, >> + unsigned long gpa, const void *src, >> + unsigned long len); >> + unsigned long (*copy_from)(struct kvm_xinterface *intf, void *dst, >> + unsigned long gpa, unsigned long len); >> + struct kvm_xvmap* (*vmap)(struct kvm_xinterface *intf, >> + unsigned long gpa, >> + unsigned long len); >> > > How would vmap() work with live migration? vmap represents shmem regions, and is a per-guest-instance resource. So my plan there is that the new and old guest instance would each have the vmap region instated at the same GPA location (assumption: gpas are stable across migration), and any state relevant data local to the shmem (like ring head/tail position) is conveyed in the serialized stream for the device model. > >> + >> +static inline void >> +_kvm_xinterface_release(struct kref *kref) >> +{ >> + struct kvm_xinterface *intf; >> + struct module *owner; >> + >> + intf = container_of(kref, struct kvm_xinterface, kref); >> + >> + owner = intf->owner; >> + rmb(); >> > > Why rmb? the intf->ops->release() line may invalidate the intf pointer, so we want to ensure that the read completes before the release() is called. TBH: I'm not 100% its needed, but I was being conservative. > >> + >> + intf->ops->release(intf); >> + module_put(owner); >> +} >> + >> >> + >> +/* >> + * gpa_to_hva() - translate a guest-physical to host-virtual using >> + * a per-cpu cache of the memslot. >> + * >> + * The gfn_to_memslot() call is relatively expensive, and the gpa access >> + * patterns exhibit a high degree of locality. Therefore, lets cache >> + * the last slot used on a per-cpu basis to optimize the lookup >> + * >> + * assumes slots_lock held for read >> + */ >> +static unsigned long >> +gpa_to_hva(struct _xinterface *_intf, unsigned long gpa) >> +{ >> + int cpu = get_cpu(); >> + unsigned long gfn = gpa>> PAGE_SHIFT; >> + struct kvm_memory_slot *memslot = _intf->slotcache[cpu]; >> + unsigned long addr = 0; >> + >> + if (!memslot >> + || gfn< memslot->base_gfn >> + || gfn>= memslot->base_gfn + memslot->npages) { >> + >> + memslot = gfn_to_memslot(_intf->kvm, gfn); >> + if (!memslot) >> + goto out; >> + >> + _intf->slotcache[cpu] = memslot; >> + } >> + >> + addr = _gfn_to_hva(gfn, memslot) + offset_in_page(gpa); >> + >> +out: >> + put_cpu(); >> + >> + return addr; >> +} >> > > > A simple per-vcpu cache (in struct kvm_vcpu) is likely to give better > results. per-vcpu will not work well here, unfortunately, since this is an external interface mechanism. The callers will generally be from a kthread or some other non-vcpu related context. Even if we could figure out a vcpu to use as a basis, we would require some kind of heavier-weight synchronization which would not be as desirable. Therefore, I opted to go per-cpu and use the presumably lighterweight get_cpu/put_cpu() instead. > >> +static unsigned long >> +xinterface_copy_to(struct kvm_xinterface *intf, unsigned long gpa, >> + const void *src, unsigned long n) >> +{ >> + struct _xinterface *_intf = to_intf(intf); >> + unsigned long dst; >> + bool kthread = !current->mm; >> + >> + down_read(&_intf->kvm->slots_lock); >> + >> + dst = gpa_to_hva(_intf, gpa); >> + if (!dst) >> + goto out; >> + >> + if (kthread) >> + use_mm(_intf->mm); >> + >> + if (kthread || _intf->mm == current->mm) >> + n = copy_to_user((void *)dst, src, n); >> + else >> + n = _slow_copy_to_user(_intf, dst, src, n); >> > > Can't you switch the mm temporarily instead of this? Thats actually what I do for the fast-path (use_mm() does a switch_to() internally). The slow-path is only there for completeness for when switching is not possible (such as if called with an mm already active i.e. process-context). In practice, however, this doesnt happen. Virtually 100% of the calls in vbus hit the fast-path here, and I suspect most xinterface clients would find the same conditions as well. Thanks Avi, -Greg
Attachment:
signature.asc
Description: OpenPGP digital signature