Re: [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 22/05/12 15:52, Alexander Graf wrote:
> 
> 
> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
> 
>> On 22/05/12 13:21, Alexander Graf wrote:
>>>
>>>
>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>> Alexander,
>>>>>
>>>>> Is that any better? :)
>>>>
>>>> Alex (Graf that is), ping ?
>>>>
>>>> The original patch from Alexey was fine btw.
>>>>
>>>> VFIO will always call things with the existing capability offset so
>>>> there's no real risk of doing the wrong thing or break the list or
>>>> anything.
>>>>
>>>> IE. A small simple patch that addresses the problem :-)
>>>>
>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>> real issue and the simple patch does it neatly from what I can
>>>> understand.
>>>>
>>>> Cheers,
>>>> Ben.
>>>>
>>>>>
>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>> * in pci config space */
>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>                       uint8_t offset, uint8_t size)
>>>>> {
>>>>> -    uint8_t *config;
>>>>> +    uint8_t *config, existing;
>>>
>>> Existing is a pointer to the target dev's config space, right?
>>
>> Yes.
>>
>>>>>    int i, overlapping_cap;
>>>>>
>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>> +    if (existing) {
>>>>> +        if (offset && (existing != offset)) {
>>>>> +            return -EEXIST;
>>>>> +        }
>>>>> +        for (i = existing; i < size; ++i) {
>>>
>>> So how does this possibly make sense?
>>
>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double
>> check that this space has not been tried to use by someone else.
> 
> i is an int. existing is a uint8_t*.


It was there before me. This function already does a loop and this is how it was coded at the first place.


>>>>> +            if (pdev->used[i]) {
>>>>> +                return -EFAULT;
>>>>> +            }
>>>>> +        }
>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>> Why?
>>
>> Because I am marking the space this capability takes as used.
> 
> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?


No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability().

Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities)  OR  hack msi_init/msix_init not to touch capabilities if they exist.



>>>>> +        /* Make capability read-only by default */
>>>>> +        memset(pdev->wmask + offset, 0, size);
>>> Why?
>>
>> Because the pci_add_capability() does it for a new capability by default.
> 
> Hrm. So you're copying code? Can't you merge the overwrite and write cases?

I am trying to make it as a single chunk which is as small as possible.


If it helps, below is the same patch with extended context to see what is going on in that function.






hw/pci.c |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 63a8219..7008a42 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
     ptr = memory_region_get_ram_ptr(&pdev->rom);
     load_image(path, ptr);
     g_free(path);
 
     if (is_default_rom) {
         /* Only the default rom images will be patched (if needed). */
         pci_patch_ids(pdev, ptr, size);
     }
 
     qemu_put_ram_ptr(ptr);
 
     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
 
     return 0;
 }
 
 static void pci_del_option_rom(PCIDevice *pdev)
 {
     if (!pdev->has_rom)
         return;
 
     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
     memory_region_destroy(&pdev->rom);
     pdev->has_rom = false;
 }
 
 /*
  * if !offset
  * Reserve space and add capability to the linked list in pci config space
  *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
  * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size)
 {
-    uint8_t *config;
+    uint8_t *config, existing;
     int i, overlapping_cap;
 
+    existing = pci_find_capability(pdev, cap_id);
+    if (existing) {
+        if (offset && (existing != offset)) {
+            return -EEXIST;
+        }
+        for (i = existing; i < size; ++i) {
+            if (pdev->used[i]) {
+                return -EFAULT;
+            }
+        }
+        memset(pdev->used + offset, 0xFF, size);
+        /* Make capability read-only by default */
+        memset(pdev->wmask + offset, 0, size);
+        /* Check capability by default */
+        memset(pdev->cmask + offset, 0xFF, size);
+        return existing;
+    }
+
     if (!offset) {
         offset = pci_find_space(pdev, size);
         if (!offset) {
             return -ENOSPC;
         }
     } else {
         /* Verify that capabilities don't overlap.  Note: device assignment
          * depends on this check to verify that the device is not broken.
          * Should never trigger for emulated devices, but it's helpful
          * for debugging these. */
         for (i = offset; i < offset + size; i++) {
             overlapping_cap = pci_find_capability_at_offset(pdev, i);
             if (overlapping_cap) {
                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
                         "Attempt to add PCI capability %x at offset "
                         "%x overlaps existing capability %x at offset %x\n",
                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
                         cap_id, offset, overlapping_cap, i);
                 return -EINVAL;
             }
         }
     }
 
     config = pdev->config + offset;
     config[PCI_CAP_LIST_ID] = cap_id;
     config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
     pdev->config[PCI_CAPABILITY_LIST] = offset;
     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);
     /* Check capability by default */
     memset(pdev->cmask + offset, 0xFF, size);
     return offset;
 }




>>>>> +        /* Check capability by default */
>>>>> +        memset(pdev->cmask + offset, 0xFF, size);
>>>
>>> I don't understand this part either.
>>
>> The pci_add_capability() does it for a new capability by default.
>>
>>
>>
>>>
>>> Alex
>>>
>>>>> +        return existing;
>>>>> +    }
>>>>> +
>>>>>    if (!offset) {
>>>>>        offset = pci_find_space(pdev, size);
>>>>>        if (!offset) {
>>>>>            return -ENOSPC;
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote:
>>>>>> On 12/05/12 00:13, Alexander Graf wrote:
>>>>>>>
>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>>>>>>>
>>>>>>>> 11.05.2012 20:52, Alexander Graf =0?8A0;:
>>>>>>>>>
>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>>>>>>>
>>>>>>>>>> Normally the pci_add_capability is called on devices to add new
>>>>>>>>>> capability. This is ok for emulated devices which capabilities list
>>>>>>>>>> is being built by QEMU.
>>>>>>>>>>
>>>>>>>>>> In the case of VFIO the capability may already exist and adding new
>>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>>
>>>>>>>>>> For example, the old code destroys the following config
>>>>>>>>>> of PCIe Intel E1000E:
>>>>>>>>>>
>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05):
>>>>>>>>>> 0x34: 0xC8
>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>> 0xD0: 0x05 0xE0
>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>
>>>>>>>>>> after:
>>>>>>>>>> 0x34: 0xD0
>>>>>>>>>> 0xC8: 0x01 0xD0
>>>>>>>>>> 0xD0: 0x05 0xC8
>>>>>>>>>> 0xE0: 0x10 0x00
>>>>>>>>>>
>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>>>>
>>>>>>>>>> The proposed patch does not change capability pointers when
>>>>>>>>>> the same type capability is about to add.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>> hw/pci.c |   10 ++++++----
>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>>>>>>>> index aa0c0b8..1f7c924 100644
>>>>>>>>>> --- a/hw/pci.c
>>>>>>>>>> +++ b/hw/pci.c
>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>>>>  }
>>>>>>>>>>
>>>>>>>>>>  config = pdev->config + offset;
>>>>>>>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>>>>>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>>>>>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>>>>>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>>>>>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>>>>>>>
>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either
>>>>>>>>> * replace the existing one or
>>>>>>>>> * drop out and not write the new one in.
>>>>>>>
>>>>>>> * hw_error :)
>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm not sure which way would be more natural.
>>>>>>>>
>>>>>>>> There is a third option - add another function, lets call it
>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does
>>>>>>>> but won't touch list pointers.
>>>>>>>
>>>>>>> What good is a function that breaks internal consistency?
>>>>>>
>>>>>>
>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through
>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a
>>>>>> capability at some fixed offset would have initialized their config space before calling this
>>>>>> capabilities API (as VFIO does).
>>>>>>
>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and
>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency
>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest
>>>>>> driver may care about its offset.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> When vfio, pci_add_capability() is called from the code which knows
>>>>>>>> exactly that the capability exists and where it is and it calls
>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops
>>>>>>>> just for imaginery scalability is a bit weird, no?
>>>>>>>
>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability.
>>>>>>>
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>> -- 
>> Alexey


-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux