On 06/05/2014 10:30 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-06-05 at 13:56 +0200, Alexander Graf wrote:
>> What if we ask user space to give us a pointer to user space allocated
>> memory along with the TCE registration? We would still ask user space to
>> only use the returned fd for TCE modifications, but would have some
>> nicely swappable memory we can store the TCE entries in.
>
> That isn't going to work terribly well for VFIO :-) But yes, for
> emulated devices, we could improve things a bit, including for
> the 32-bit TCE tables.
>
> For emulated, the real mode path could walk the page tables and fallback
> to virtual mode & get_user if the page isn't present, thus operating
> directly on qemu memory TCE tables instead of the current pinned stuff.
>
> However that has a cost in performance, but since that's really only
> used for emulated devices and PAPR VIOs, it might not be a huge issue.
>
> But for VFIO we don't have much choice, we need to create something the
> HW can access.

You are confusing things here. There are 2 tables:
1. the guest-visible TCE table; this is what is allocated for VIO or
   emulated PCI;
2. the real HW DMA window; one already exists for DMA32, and I will
   allocate another one for a huge window.

I only have #2 for VFIO now, but we will need both in order to implement
H_GET_TCE correctly, and this is the table I will allocate with this new
ioctl.

>> In fact, the code as is today can allocate an arbitrary amount of pinned
>> kernel memory from within user space without any checks.
>
> Right. We should at least account it in the locked limit.

Yup. And (probably) this thing will keep a counter of how many windows
were created per KVM instance to avoid having multiple copies of the same
table.

-- 
Alexey