On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@xxxxxxxxx> wrote: > On 22/05/12 13:21, Alexander Graf wrote: >> >> >> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote: >> >>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>> Alexander, >>>> >>>> Is that any better? :) >>> >>> Alex (Graf that is), ping ? >>> >>> The original patch from Alexey was fine btw. >>> >>> VFIO will always call things with the existing capability offset so >>> there's no real risk of doing the wrong thing or break the list or >>> anything. >>> >>> IE. A small simple patch that addresses the problem :-) >>> >>> The new patch is a bit more "robust" I believe, I don't think we need to >>> go too far to fix a problem we don't have. But we need a fix for the >>> real issue and the simple patch does it neatly from what I can >>> understand. >>> >>> Cheers, >>> Ben. >>> >>>> >>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>> * in pci config space */ >>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>> uint8_t offset, uint8_t size) >>>> { >>>> - uint8_t *config; >>>> + uint8_t *config, existing; >> >> Existing is a pointer to the target dev's config space, right? > > Yes. > >>>> int i, overlapping_cap; >>>> >>>> + existing = pci_find_capability(pdev, cap_id); >>>> + if (existing) { >>>> + if (offset && (existing != offset)) { >>>> + return -EEXIST; >>>> + } >>>> + for (i = existing; i < size; ++i) { >> >> So how does this possibly make sense? > > Although I do not expect VFIO to add capabilities (does not make sense), I still want to double > check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. > >>>> + if (pdev->used[i]) { >>>> + return -EFAULT; >>>> + } >>>> + } >>>> + memset(pdev->used + offset, 0xFF, size); >> Why? > > Because I am marking the space this capability takes as used. But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no? > >>>> + /* Make capability read-only by default */ >>>> + memset(pdev->wmask + offset, 0, size); >> Why? > > Because the pci_add_capability() does it for a new capability by default. Hrm. So you're copying code? Can't you merge the overwrite and write cases? Alex > > >>>> + /* Check capability by default */ >>>> + memset(pdev->cmask + offset, 0xFF, size); >> >> I don't understand this part either. > > The pci_add_capability() does it for a new capability by default. > > > >> >> Alex >> >>>> + return existing; >>>> + } >>>> + >>>> if (!offset) { >>>> offset = pci_find_space(pdev, size); >>>> if (!offset) { >>>> return -ENOSPC; >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>>> >>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>>> >>>>>>> 11.05.2012 20:52, Alexander Graf написал: >>>>>>>> >>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>>> >>>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>>> is being built by QEMU. >>>>>>>>> >>>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>>> >>>>>>>>> For example, the old code destroys the following config >>>>>>>>> of PCIe Intel E1000E: >>>>>>>>> >>>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>>> 0x34: 0xC8 >>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>> >>>>>>>>> after: >>>>>>>>> 0x34: 0xD0 >>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>> >>>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>>> >>>>>>>>> The proposed patch does not change capability pointers when >>>>>>>>> the same type capability is about to add. >>>>>>>>> >>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> >>>>>>>>> --- >>>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>>> --- a/hw/pci.c >>>>>>>>> +++ b/hw/pci.c >>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>>> } >>>>>>>>> >>>>>>>>> config = pdev->config + offset; >>>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>>> >>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>>> * replace the existing one or >>>>>>>> * drop out and not write the new one in. >>>>>> >>>>>> * hw_error :) >>>>>> >>>>>>>> >>>>>>>> I'm not sure which way would be more natural. >>>>>>> >>>>>>> There is a third option - add another function, lets call it >>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>>> but won't touch list pointers. >>>>>> >>>>>> What good is a function that breaks internal consistency? >>>>> >>>>> >>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>>> capability at some fixed offset would have initialized their config space before calling this >>>>> capabilities API (as VFIO does). >>>>> >>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>>> driver may care about its offset. >>>>> >>>>> >>>>> >>>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>>> exactly that the capability exists and where it is and it calls >>>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>>> just for imaginery scalability is a bit weird, no? >>>>>> >>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>>>> >>>>>> >>>>>> Alex >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> > > > -- > Alexey -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html