On Thu, Dec 10, 2009 at 07:16:04AM +0200, Muli Ben-Yehuda wrote:
> On Wed, Dec 09, 2009 at 06:38:54PM +0100, Alexander Graf wrote:
>
> > While trying to get device passthrough working with an emulex hba,
> > kvm refused to pass it through because it has a BAR of 256 bytes:
> >
> > Region 0: Memory at d2100000 (64-bit, non-prefetchable) [size=4K]
> > Region 2: Memory at d2101000 (64-bit, non-prefetchable) [size=256]
> > Region 4: I/O ports at b100 [size=256]
> >
> > Since the page boundary is an arbitrary optimization to allow 1:1
> > mapping of physical to virtual addresses, we can still take the old
> > MMIO callback route.
> >
> > So let's add a second code path that allows for size & 0xFFF != 0
> > sized regions by looping it through userspace.
>
> That makes sense in general *but* the 4K-aligned check isn't just an
> optimization, it also has a security implication. Consider the
> theoretical case where a multi-function device has BARs for two
> functions on the same page (within a 4K boundary), and each function
> is assigned to a different guest. With your current patch both guests
> will be able to write to each other's BARs. Another case is where a
> device has a bug and you must not write beyond the BAR or Bad Things
> Happen. With this patch an *unprivileged* guest could exploit that bug
> and make bad things happen.
>
> This can be fixed if the slow userspace mmio path checks that all MMIO
> accesses by a guest fall within the portion of the page that is
> assigned to it.

This patch seems to implement range checks correctly; let me know if I
am missing something.

Note also that we currently link qemu with libpci, which I think
requires admin capabilities to work. However, in the future we might
extend this to also support getting device fds over a unix socket from
a more privileged process.
If or when this is done, we will have to be extra careful when passing
a device file descriptor to an unprivileged qemu process if the BARs
are less than a full page in size: since mmap works at page
granularity, mapping such a BAR gives qemu access to the rest of the
page, outside this BAR. A possible solution to this problem, if/when it
arises, would be to add yet another sysfs file for each resource, which
would allow read/write but not mmap access, and perform range checks in
the kernel.

> Cheers,
> Muli
> --
> Muli Ben-Yehuda | muli@xxxxxxxxxx | +972-4-8281080
> Manager, Virtualization and Systems Architecture
> Master Inventor, IBM Research -- Haifa
> Second Workshop on I/O Virtualization (WIOV '10):
> http://sysrun.haifa.il.ibm.com/hrl/wiov2010/
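
For illustration, here is a minimal sketch of the kind of range check
discussed in this thread, whether it ends up in the slow userspace MMIO
path or in the kernel. It is not taken from the patch, QEMU or KVM:
struct bar_window, bar_access_ok() and bar_access_selftest() are
hypothetical names, and the values come from Region 2 in the lspci
output quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one sub-page BAR; not a QEMU or KVM type. */
struct bar_window {
    uint64_t base;  /* physical base address of the BAR */
    uint64_t size;  /* BAR size in bytes, may be smaller than a page */
};

/*
 * Return true only if the access [addr, addr + len) lies entirely inside
 * the BAR. The comparison works on offsets so that addr + len is never
 * computed and cannot overflow.
 */
bool bar_access_ok(const struct bar_window *bar, uint64_t addr, uint64_t len)
{
    if (len == 0 || addr < bar->base)
        return false;

    uint64_t off = addr - bar->base;

    return off < bar->size && len <= bar->size - off;
}

/* Exercise the check with a 256-byte BAR at d2101000 (assumed example). */
void bar_access_selftest(void)
{
    struct bar_window bar = { .base = 0xd2101000ULL, .size = 256 };

    assert(bar_access_ok(&bar, 0xd2101000ULL, 4));   /* first dword */
    assert(bar_access_ok(&bar, 0xd21010fcULL, 4));   /* last dword */
    assert(!bar_access_ok(&bar, 0xd21010feULL, 4));  /* straddles the end */
    assert(!bar_access_ok(&bar, 0xd2101100ULL, 4));  /* same page, outside BAR */
    assert(!bar_access_ok(&bar, 0xd2100ffcULL, 4));  /* below the BAR */
}
```

The fourth case is the interesting one: the address is on the same 4K
page as the BAR, so a plain mmap of that page would allow the access,
while the explicit check rejects it.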