On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote: > On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >> Alexander, >> >> Is that any better? :) > > Alex (Graf that is), ping ? > > The original patch from Alexey was fine btw. > > VFIO will always call things with the existing capability offset so > there's no real risk of doing the wrong thing or break the list or > anything. > > IE. A small simple patch that addresses the problem :-) > > The new patch is a bit more "robust" I believe, I don't think we need to > go too far to fix a problem we don't have. But we need a fix for the > real issue and the simple patch does it neatly from what I can > understand. > > Cheers, > Ben. > >> >> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >> * in pci config space */ >> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >> uint8_t offset, uint8_t size) >> { >> - uint8_t *config; >> + uint8_t *config, existing; Existing is a pointer to the target dev's config space, right? >> int i, overlapping_cap; >> >> + existing = pci_find_capability(pdev, cap_id); >> + if (existing) { >> + if (offset && (existing != offset)) { >> + return -EEXIST; >> + } >> + for (i = existing; i < size; ++i) { So how does this possibly make sense? >> + if (pdev->used[i]) { >> + return -EFAULT; >> + } >> + } >> + memset(pdev->used + offset, 0xFF, size); Why? >> + /* Make capability read-only by default */ >> + memset(pdev->wmask + offset, 0, size); Why? >> + /* Check capability by default */ >> + memset(pdev->cmask + offset, 0xFF, size); I don't understand this part either. Alex >> + return existing; >> + } >> + >> if (!offset) { >> offset = pci_find_space(pdev, size); >> if (!offset) { >> return -ENOSPC; >> >> >> >> >> >> >> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>> On 12/05/12 00:13, Alexander Graf wrote: >>>> >>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>> >>>>> 11.05.2012 20:52, Alexander Graf написал: >>>>>> >>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>> >>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>> is being built by QEMU. >>>>>>> >>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>> >>>>>>> For example, the old code destroys the following config >>>>>>> of PCIe Intel E1000E: >>>>>>> >>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>> 0x34: 0xC8 >>>>>>> 0xC8: 0x01 0xD0 >>>>>>> 0xD0: 0x05 0xE0 >>>>>>> 0xE0: 0x10 0x00 >>>>>>> >>>>>>> after: >>>>>>> 0x34: 0xD0 >>>>>>> 0xC8: 0x01 0xD0 >>>>>>> 0xD0: 0x05 0xC8 >>>>>>> 0xE0: 0x10 0x00 >>>>>>> >>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>> >>>>>>> The proposed patch does not change capability pointers when >>>>>>> the same type capability is about to add. >>>>>>> >>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> >>>>>>> --- >>>>>>> hw/pci.c | 10 ++++++---- >>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>> --- a/hw/pci.c >>>>>>> +++ b/hw/pci.c >>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>> } >>>>>>> >>>>>>> config = pdev->config + offset; >>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>> >>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>> * replace the existing one or >>>>>> * drop out and not write the new one in. >>>> >>>> * hw_error :) >>>> >>>>>> >>>>>> I'm not sure which way would be more natural. >>>>> >>>>> There is a third option - add another function, lets call it >>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>> but won't touch list pointers. >>>> >>>> What good is a function that breaks internal consistency? >>> >>> >>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>> capability at some fixed offset would have initialized their config space before calling this >>> capabilities API (as VFIO does). >>> >>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>> driver may care about its offset. >>> >>> >>> >>>>> When vfio, pci_add_capability() is called from the code which knows >>>>> exactly that the capability exists and where it is and it calls >>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>> just for imaginery scalability is a bit weird, no? >>>> >>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>> >>>> >>>> Alex >>>> >>> >>> >> >> > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html