> sriov_init() gets called first. It gathers the resource size and resource > flags information. pci_bios_allocate_resource() uses that information to allocate resource > and in case of power, verify the allocations done by the firmware. Since the code > in pcibios_allocate_resources() does not validate the IOV BARs, the res->flags > of those IOV BARS continue to stay enabled, which later leads to conflicts. > The key here is to reset res->flags for any BAR when conflict is detected. No, you should never do that. For example, take a device with a memory BAR which has a conflict. If you reset the flags, you essentially "hide" that BAR from Linux, potentially allows Linux to enable memory (or IO) decoding on the device ... but the BAR is still there with it's conflicting value, and will potentially cause hell on your bus. You cannot enable memory or IO decoding on a device if any corresponding BAR has conflicts. You can't just "hide" the conflicting BAR. SR-IOV is to some extent similar. If the IOV BAR isn't properly allocated, it must not be hidden, but instead marked as such (res->parent == NULL is how you detect a non-allocated BAR in generic code). > > We still also need to make sure pci_enable_device() doesn't rely on that > > resource being properly allocated. > > hmm.. pci_enable_device()'s job is to enable all the resources used by the > device. SRIOV BARs; after all, is a BAR belonging to the PF, not the VF. agree? Not exactly. It's not necessary for the PFs to operate. The main question is whether it's -decoded- by the device, and that's why we should probably -not- leave the device with a numVF set to non-0 until after we know we have a properly allocated VF BAR (and if not, leave numVF to 0). > > Finally, keep in mind that the way we read that resource to begin with > > is buggy (yet another piece of code written with only x86 in mind and > > happily stuck into a generic place): > > > > We do a __pci_read_base() from the IOV code. This is broken on quite a > > few non-x86 archs such as powerpc: > > > > For us, MMIO addresses from the CPU space aren't mapped 1:1 to > > corresponding PCI addresses. They need to be translated. This is done at > > probe time from a header quirk for the "normal" BARs. The generic code > > doesn't do it for IOV. It should at the very least use > > pcibios_bus_to_resource() to convert it. > > ok. will explore this. Note also, that with the other kind of constraints I mentioned, such as the need that we will have to adjust the page size using an EEH specific algorithm on powernv (under OPAL) etc... I think the right way to do things is: - sriov_init() (probe time) reads in all the information for the device (sizes the IOV BAR, etc...), collects everything it needs in the iov structure (including the "preferred" system page size setting), but leaves the device back in a state where system page size is the original setting and numVF is 0. - The VF BAR then may or may not be "allocated" (or re-assigned etc...) - Later, either when the PF tries to enable the first VF, or at some other stage at the end of the resource survey (though the later would need a new arch hook), the device IOV functionality gets configured if (and only if) the VF BAR has been successfully assigned & allocated (ie res->parent non NULL). At that point, we can then set numVF, and apply the page size setting. HOWEVER, we shouldn't -calculate- them there, we should just apply the values from the iov structure. That way, arch code, during the platform/arch-specific resource survey, can decide to use a different page size calculation algorithm, or change the number of supported VFs etc... based on platform considerations, such as the need to apply a "special" stride or limit the number of VFs due to resource constraints (max number of PEs for EEH etc...). IE, This is the same basic scheme used for everything else on PCI, we first read things, leaving devices mostly alone, then there's a resource allocation/assignment phase, then the resources are actually used & devices enabled. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html