On Thu, Dec 10, 2009 at 11:31:54AM +0100, Alexander Graf wrote:
>
> On 10.12.2009, at 11:27, Michael S. Tsirkin wrote:
>
> > On Thu, Dec 10, 2009 at 11:08:58AM +0100, Alexander Graf wrote:
> >>
> >> On 10.12.2009, at 10:52, Alexander Graf wrote:
> >>
> >>>
> >>> On 10.12.2009, at 10:43, Michael S. Tsirkin wrote:
> >>>
> >>>> On Thu, Dec 10, 2009 at 07:16:04AM +0200, Muli Ben-Yehuda wrote:
> >>>>> On Wed, Dec 09, 2009 at 06:38:54PM +0100, Alexander Graf wrote:
> >>>>>
> >>>>>> While trying to get device passthrough working with an emulex hba,
> >>>>>> kvm refused to pass it through because it has a BAR of 256 bytes:
> >>>>>>
> >>>>>>     Region 0: Memory at d2100000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>     Region 2: Memory at d2101000 (64-bit, non-prefetchable) [size=256]
> >>>>>>     Region 4: I/O ports at b100 [size=256]
> >>>>>>
> >>>>>> Since the page boundary is an arbitrary optimization to allow 1:1
> >>>>>> mapping of physical to virtual addresses, we can still take the old
> >>>>>> MMIO callback route.
> >>>>>>
> >>>>>> So let's add a second code path that allows for size & 0xFFF != 0
> >>>>>> sized regions by looping it through userspace.
> >>>>>
> >>>>> That makes sense in general *but* the 4K-aligned check isn't just an
> >>>>> optimization, it also has a security implication. Consider the
> >>>>> theoretical case where a multi-function device has BARs for two
> >>>>> functions on the same page (within a 4K boundary), and each function
> >>>>> is assigned to a different guest. With your current patch both guests
> >>>>> will be able to write to each other's BARs. Another case is where a
> >>>>> device has a bug and you must not write beyond the BAR or Bad Things
> >>>>> Happen. With this patch an *unprivileged* guest could exploit that bug
> >>>>> and make bad things happen.
> >>>>>
> >>>>> This can be fixed if the slow userspace mmio path checks that all MMIO
> >>>>> accesses by a guest fall within the portion of the page that is
> >>>>> assigned to it.
> >>>>
> >>>> This patch seems to implement range checks correctly,
> >>>> let me know if I am missing something.
> >>>>
> >>>> One also notes that we currently link qemu with libpci
> >>>> which I think requires admin cap to work.
> >>>> However, in the future we might extend this to
> >>>> also support getting device fds over a unix socket
> >>>> from a more privileged process.
> >>>>
> >>>> If or when this is done, we will have to be
> >>>> extra careful when passing
> >>>> a device file descriptor to an unprivileged qemu process if
> >>>> the BARs are less than a full page in size: mapping
> >>>> such a BAR will allow qemu access outside this BAR.
> >>>>
> >>>> A possible solution to this problem
> >>>> if/when it arises would be adding yet another sysfs file
> >>>> for each resource, which would allow read/write but not
> >>>> mmap access, and perform range checks in the kernel.
> >>>
> >>> Sounds like the best solution to this problem, yeah. Though we'd only need those for non-page-boundary BARs. So I guess the best would be to always export them from the kernel, but only use them when size & (PAGE_SIZE-1).
> >>
> >> Hm, or add read/write fd functions that always do boundary checks to the existing interface and only allow mmap when !(size & (PAGE_SIZE-1)). Or only allow non-aligned mmap when the admin cap is present.
> >>
> >> Alex
> >
> > This might break existing applications.
> > We don't want that.
>
> Well currently you can't mmap the resource at all without at least r/w rights on the file, right?

You could have dropped the cap or got the fd from another process.

> But yeah, we'd probably change behavior that could break someone - sigh.
>
> Alex
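
To make the two checks discussed above concrete, here is a rough standalone sketch in C. It is not the code from the patch: the names bar_region, bar_can_mmap, bar_access_ok and bar_example are made up for illustration, and PAGE_SIZE is hard-coded to 4K as an assumption. The first helper decides whether a BAR may be mapped directly (size is a whole number of pages); the second is the range check the slow read/write path would need for a sub-page BAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4K pages, as in the thread */

/* Hypothetical description of one assigned BAR. */
struct bar_region {
    uint64_t base; /* bus address of the BAR */
    uint64_t size; /* BAR size in bytes */
};

/* Direct mmap is only safe when the BAR covers whole pages;
 * otherwise the mapping would expose bytes outside the BAR. */
static bool bar_can_mmap(const struct bar_region *bar)
{
    return bar->size != 0 && (bar->size & (PAGE_SIZE - 1)) == 0;
}

/* Slow path: an access of len bytes at offset must lie entirely
 * inside the BAR. Written so that offset + len cannot overflow. */
static bool bar_access_ok(const struct bar_region *bar,
                          uint64_t offset, uint64_t len)
{
    return len != 0 && offset < bar->size && len <= bar->size - offset;
}

/* Walk through the regions quoted from lspci in the original mail. */
void bar_example(void)
{
    struct bar_region region0 = { 0xd2100000ULL, 4096 };
    struct bar_region region2 = { 0xd2101000ULL, 256 };

    assert(bar_can_mmap(&region0));            /* 4K: map it directly */
    assert(!bar_can_mmap(&region2));           /* 256 bytes: slow path */
    assert(bar_access_ok(&region2, 252, 4));   /* last dword of the BAR */
    assert(!bar_access_ok(&region2, 254, 4));  /* straddles the BAR end */
    assert(!bar_access_ok(&region2, 256, 1));  /* first byte past the BAR */
}
```

With something along these lines, an fd that only offers read/write could be handed to an unprivileged process for sub-page BARs, while mmap stays limited to BARs that pass bar_can_mmap().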