Hi, On 4/3/20 8:08 PM, André Przywara wrote: > On 03/04/2020 19:14, Alexandru Elisei wrote: >> Hi, >> >> On 4/3/20 12:57 PM, André Przywara wrote: >>>> + } >>>> + >>>> + assert(has_bar_regions); >>> Is assert() here really a good idea? I see that it makes sense for our >>> emulated devices, but is that a valid check for VFIO? >>> From briefly looking I can't find a requirement for having at least one >>> valid BAR in general, and even if - I think we should rather return an >>> error than aborting the guest here - or ignore it altogether. >> The assert here is to discover coding errors with devices, not with the PCI >> emulation. Calling pci__register_bar_regions and providing callbacks for when BAR >> access is toggled, but *without* any valid BARs looks like a coding error in the >> device emulation code to me. > As I said, I totally see the point for our emulated devices, but it > looks like we use this code also for VFIO? Where we are not in control > of what the device exposes. > >> As for VFIO, I'm struggling to find a valid reason for someone to build a device >> that uses PCI, but doesn't have any BARs. Isn't that the entire point of PCI? I'm >> perfectly happy to remove the assert if you can provide an rationale for building >> such a device. > IIRC you have an AMD box, check the "Northbridge" PCI device there, > devices 0:18.x. They provide chipset registers via (extended) config > space only, they don't have any valid BARs. Also I found some SMBus > device without BARs. > Not the most prominent use case (especially for pass through), but > apparently valid. > I think the rationale for using this was to use a well established, > supported and discoverable interface. I see you point. I'll remove the assert. > >>>> + >>>> + return 0; >>>> +} >>>> + >>>> int pci__init(struct kvm *kvm) >>>> { >>>> int r; >>>> diff --git a/vfio/pci.c b/vfio/pci.c >>>> index 8b2a0c8dbac3..18e22a8c5320 100644 >>>> --- a/vfio/pci.c >>>> +++ b/vfio/pci.c >>>> @@ -8,6 +8,8 @@ >>>> #include <sys/resource.h> >>>> #include <sys/time.h> >>>> >>>> +#include <assert.h> >>>> + >>>> /* Wrapper around UAPI vfio_irq_set */ >>>> union vfio_irq_eventfd { >>>> struct vfio_irq_set irq; >>>> @@ -446,6 +448,81 @@ out_unlock: >>>> mutex_unlock(&pdev->msi.mutex); >>>> } >>>> >>>> +static int vfio_pci_bar_activate(struct kvm *kvm, >>>> + struct pci_device_header *pci_hdr, >>>> + int bar_num, void *data) >>>> +{ >>>> + struct vfio_device *vdev = data; >>>> + struct vfio_pci_device *pdev = &vdev->pci; >>>> + struct vfio_pci_msix_pba *pba = &pdev->msix_pba; >>>> + struct vfio_pci_msix_table *table = &pdev->msix_table; >>>> + struct vfio_region *region; >>>> + bool has_msix; >>>> + int ret; >>>> + >>>> + assert((u32)bar_num < vdev->info.num_regions); >>>> + >>>> + region = &vdev->regions[bar_num]; >>>> + has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX; >>>> + >>>> + if (has_msix && (u32)bar_num == table->bar) { >>>> + ret = kvm__register_mmio(kvm, table->guest_phys_addr, >>>> + table->size, false, >>>> + vfio_pci_msix_table_access, pdev); >>>> + if (ret < 0 || table->bar != pba->bar) >>> I think this second expression deserves some comment. >>> If I understand correctly, this would register the PBA trap handler >>> separetely below if both the MSIX table and the PBA share a BAR? >> The MSIX table and the PBA structure can share the same BAR for the base address >> (that's why the MSIX capability has an offset field for both of them), but we >> register different regions for mmio emulation because we don't want to have a >> generic handler and always check if the mmio access was to the MSIX table of the >> PBA structure. I can add a comment stating that, sure. > Yes, thanks for the explanation! > >>>> + goto out; >>> Is there any particular reason you are using goto here? I find it more >>> confusing if the "out:" label has just a return statement, without any >>> cleanup or lock dropping. Just a "return ret;" here would be much >>> cleaner I think. Same for other occassions in this function and >>> elsewhere in this patch. >>> >>> Or do you plan on adding some code here later? I don't see it in this >>> series though. >> The reason I'm doing this is because I prefer one exit point from the function, >> instead of return statements at arbitrary points in the function body. As a point >> of reference, the pattern is recommended in the MISRA C standard for safety, in >> section 17.4 "No more than one return statement", and is also used in the Linux >> kernel. I think it comes down to personal preference, so unless Will of Julien >> have a strong preference against it, I would rather keep it. > Fair enough, your decision. Just to point out that I can't find this > practice in the kernel, also: > > Documentation/process/coding-style.rst, section "7) Centralized exiting > of functions": > "The goto statement comes in handy when a function exits from multiple > locations and some common work such as cleanup has to be done. If there > is no cleanup needed then just return directly." > > > Thanks! > Andre. > >>>> + } >>>> + >>>> + if (has_msix && (u32)bar_num == pba->bar) { >>>> + ret = kvm__register_mmio(kvm, pba->guest_phys_addr, >>>> + pba->size, false, >>>> + vfio_pci_msix_pba_access, pdev); >>>> + goto out; >>>> + } >>>> + >>>> + ret = vfio_map_region(kvm, vdev, region); >>>> +out: >>>> + return ret; >>>> +} >>>> + >>>> +static int vfio_pci_bar_deactivate(struct kvm *kvm, >>>> + struct pci_device_header *pci_hdr, >>>> + int bar_num, void *data) >>>> +{ >>>> + struct vfio_device *vdev = data; >>>> + struct vfio_pci_device *pdev = &vdev->pci; >>>> + struct vfio_pci_msix_pba *pba = &pdev->msix_pba; >>>> + struct vfio_pci_msix_table *table = &pdev->msix_table; >>>> + struct vfio_region *region; >>>> + bool has_msix, success; >>>> + int ret; >>>> + >>>> + assert((u32)bar_num < vdev->info.num_regions); >>>> + >>>> + region = &vdev->regions[bar_num]; >>>> + has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX; >>>> + >>>> + if (has_msix && (u32)bar_num == table->bar) { >>>> + success = kvm__deregister_mmio(kvm, table->guest_phys_addr); >>>> + /* kvm__deregister_mmio fails when the region is not found. */ >>>> + ret = (success ? 0 : -ENOENT); >>>> + if (ret < 0 || table->bar!= pba->bar) >>>> + goto out; >>>> + } >>>> + >>>> + if (has_msix && (u32)bar_num == pba->bar) { >>>> + success = kvm__deregister_mmio(kvm, pba->guest_phys_addr); >>>> + ret = (success ? 0 : -ENOENT); >>>> + goto out; >>>> + } >>>> + >>>> + vfio_unmap_region(kvm, region); >>>> + ret = 0; >>>> + >>>> +out: >>>> + return ret; >>>> +} >>>> + >>>> static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr, >>>> u8 offset, void *data, int sz) >>>> { >>>> @@ -805,12 +882,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev) >>>> ret = -ENOMEM; >>>> goto out_free; >>>> } >>>> - pba->guest_phys_addr = table->guest_phys_addr + table->size; >>>> - >>>> - ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size, >>>> - false, vfio_pci_msix_table_access, pdev); >>>> - if (ret < 0) >>>> - goto out_free; >>>> >>>> /* >>>> * We could map the physical PBA directly into the guest, but it's >>>> @@ -820,10 +891,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev) >>>> * between MSI-X table and PBA. For the sake of isolation, create a >>>> * virtual PBA. >>>> */ >>>> - ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false, >>>> - vfio_pci_msix_pba_access, pdev); >>>> - if (ret < 0) >>>> - goto out_free; >>>> + pba->guest_phys_addr = table->guest_phys_addr + table->size; >>>> >>>> pdev->msix.entries = entries; >>>> pdev->msix.nr_entries = nr_entries; >>>> @@ -894,11 +962,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev, >>>> region->guest_phys_addr = pci_get_mmio_block(map_size); >>>> } >>>> >>>> - /* Map the BARs into the guest or setup a trap region. */ >>>> - ret = vfio_map_region(kvm, vdev, region); >>>> - if (ret) >>>> - return ret; >>>> - >>>> return 0; >>>> } >>>> >>>> @@ -945,7 +1008,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm, >>>> } >>>> >>>> /* We've configured the BARs, fake up a Configuration Space */ >>>> - return vfio_pci_fixup_cfg_space(vdev); >>>> + ret = vfio_pci_fixup_cfg_space(vdev); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate, >>>> + vfio_pci_bar_deactivate, vdev); >>>> } >>>> >>>> /* >>>> diff --git a/virtio/pci.c b/virtio/pci.c >>>> index d111dc499f5e..598da699c241 100644 >>>> --- a/virtio/pci.c >>>> +++ b/virtio/pci.c >>>> @@ -11,6 +11,7 @@ >>>> #include <sys/ioctl.h> >>>> #include <linux/virtio_pci.h> >>>> #include <linux/byteorder.h> >>>> +#include <assert.h> >>>> #include <string.h> >>>> >>>> static u16 virtio_pci__port_addr(struct virtio_pci *vpci) >>>> @@ -462,6 +463,64 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu, >>>> virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len); >>>> } >>>> >>>> +static int virtio_pci__bar_activate(struct kvm *kvm, >>>> + struct pci_device_header *pci_hdr, >>>> + int bar_num, void *data) >>>> +{ >>>> + struct virtio_device *vdev = data; >>>> + u32 bar_addr, bar_size; >>>> + int r = -EINVAL; >>>> + >>>> + assert(bar_num <= 2); >>>> + >>>> + bar_addr = pci__bar_address(pci_hdr, bar_num); >>>> + bar_size = pci__bar_size(pci_hdr, bar_num); >>>> + >>>> + switch (bar_num) { >>>> + case 0: >>>> + r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops, >>>> + bar_size, vdev); >>>> + if (r > 0) >>>> + r = 0; >>>> + break; >>>> + case 1: >>>> + r = kvm__register_mmio(kvm, bar_addr, bar_size, false, >>>> + virtio_pci__io_mmio_callback, vdev); >>>> + break; >>>> + case 2: >>>> + r = kvm__register_mmio(kvm, bar_addr, bar_size, false, >>>> + virtio_pci__msix_mmio_callback, vdev); >>> I think adding a break; here looks nicer. >> Sure, it will make the function look more consistent. >> >> Thanks, >> Alex >>> Cheers, >>> Andre >>> >>> >>>> + } >>>> + >>>> + return r; >>>> +} >>>> + >>>> +static int virtio_pci__bar_deactivate(struct kvm *kvm, >>>> + struct pci_device_header *pci_hdr, >>>> + int bar_num, void *data) >>>> +{ >>>> + u32 bar_addr; >>>> + bool success; >>>> + int r = -EINVAL; >>>> + >>>> + assert(bar_num <= 2); >>>> + >>>> + bar_addr = pci__bar_address(pci_hdr, bar_num); >>>> + >>>> + switch (bar_num) { >>>> + case 0: >>>> + r = ioport__unregister(kvm, bar_addr); >>>> + break; >>>> + case 1: >>>> + case 2: >>>> + success = kvm__deregister_mmio(kvm, bar_addr); >>>> + /* kvm__deregister_mmio fails when the region is not found. */ >>>> + r = (success ? 0 : -ENOENT); >>>> + } >>>> + >>>> + return r; >>>> +} >>>> + >>>> int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, >>>> int device_id, int subsys_id, int class) >>>> { >>>> @@ -476,23 +535,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, >>>> BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE)); >>>> >>>> port_addr = pci_get_io_port_block(PCI_IO_SIZE); >>>> - r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE, >>>> - vdev); >>>> - if (r < 0) >>>> - return r; >>>> - port_addr = (u16)r; >>>> - >>>> mmio_addr = pci_get_mmio_block(PCI_IO_SIZE); >>>> - r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false, >>>> - virtio_pci__io_mmio_callback, vdev); >>>> - if (r < 0) >>>> - goto free_ioport; >>>> - >>>> msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2); >>>> - r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false, >>>> - virtio_pci__msix_mmio_callback, vdev); >>>> - if (r < 0) >>>> - goto free_mmio; >>>> >>>> vpci->pci_hdr = (struct pci_device_header) { >>>> .vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET), >>>> @@ -518,6 +562,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, >>>> .bar_size[2] = cpu_to_le32(PCI_IO_SIZE*2), >>>> }; >>>> >>>> + r = pci__register_bar_regions(kvm, &vpci->pci_hdr, >>>> + virtio_pci__bar_activate, >>>> + virtio_pci__bar_deactivate, vdev); >>>> + if (r < 0) >>>> + return r; >>>> + >>>> vpci->dev_hdr = (struct device_header) { >>>> .bus_type = DEVICE_BUS_PCI, >>>> .data = &vpci->pci_hdr, >>>> @@ -548,20 +598,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, >>>> >>>> r = device__register(&vpci->dev_hdr); >>>> if (r < 0) >>>> - goto free_msix_mmio; >>>> + return r; >>>> >>>> /* save the IRQ that device__register() has allocated */ >>>> vpci->legacy_irq_line = vpci->pci_hdr.irq_line; >>>> >>>> return 0; >>>> - >>>> -free_msix_mmio: >>>> - kvm__deregister_mmio(kvm, msix_io_block); >>>> -free_mmio: >>>> - kvm__deregister_mmio(kvm, mmio_addr); >>>> -free_ioport: >>>> - ioport__unregister(kvm, port_addr); >>>> - return r; >>>> } >>>> >>>> int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev) >>>>