Hi Ben, On 07.10.2011 [08:48:20 +0200], Benjamin Herrenschmidt wrote: > > > Are you saying that Power is intrinsically unable to support SR-IOV? > > The 'pseries' platform on power is, at least with the current > hypervisor, as the HV doesn't allocate resources for the IOV BAR and the > kernel is not allowed to mess around with PCI resources allocation. > > In the long run, they -may- fix that, but currently, those machines must > not enable SR-IOV or bad things happen (sriov_init enables VFs which > causes the PF driver to fail as well). > > > Or is it merely that we can't use SR-IOV on this particular > > machine/configuration because the BAR assignments are fixed and there > > isn't enough space for the VF BARs? > > > > I would expect the latter, and therefore it seems wrong to disable it > > across the board. Perhaps on another machine with different PCI > > resource assignments, there *would* be space for the VF BARs and > > SR-IOV could be used. > > No. We can never modify the resource allocation and we know that on all > current pseries platforms, the HV will not allocate the VF BAR. > > The patch is not ideal though. Ideally, what you'd need to do is: To be clear here, "the patch" you refer to is the one in this thread? Or the other one that skips VF BARs in the pci_enable path? > - Detect that the VF BAR hasn't been assigned > > - Combine that with the knowledge that you cannot re-assign resources > on the platform (this is a flag in asm-generic/pci-flags.h but that's > not available on all archs unfortunately). > > And based on that, don't try to mess around with VFs etc... in > sriov_init(). > > The problem is that damn flag. Those PCI flags are currently used by > several archs but exclusively from within the arch code and aren't > defined on x86 for example. > > Maybe we could wrap that into a: > > pcibios_can_assign_resources() > > Which I could make return 0 when running on such a platform. That would, I think, end up looking similar to my patch in this thread with a weak symbol at the topmost level? Would the weak symbol decided to look at the PCI flags based upon a compile-time #ifdef, e.g., #ifdef CONFIG_PCI bool __attribute__((weak)) pcibios_can_assign_resources() { #ifdef PCI_PROBE_ONLY if (pci_has_flag(PCI_PROBE_ONLY)) return false; return true; #else return true; #endif } #else bool __attribute__((weak)) pcibios_can_assign_resources() { return false; } #endif ? > > > I defined a weak version of this function that returns true if > > > CONFIG_PCI_IOV is set (which I think should reflect the current setup > > > that CONFIG_PCI_IOV represents an unconditional support of SR-IOV) and > > > false otherwise. The only architecture that implements the hook is > > > powerpc, which uses a machine callback to decide if a platform supports > > > SR-IOV or not. I only defined one such callback, for the pseries > > > platform, and return false unconditionally there. > > > > I think this approach is a band-aid that happens to cover up the > > problem. I'd rather fix the PF enable path so it doesn't depend on VF > > BAR allocation. That seems like a much cleaner approach to me. I > > think your pci_select_bars() patch is acceptable for now, even though > > I would still like to see us move all the VF BAR stuff out of the > > pci_dev and into the pci_sriov. > > It's more than just the BAR allocation it seems. The code in > sriov_init() also unconditionally sets the VF count and change the > system page size setting, which appears to break some adapters when the > VF BAR hasn't been assigned. Either that or the attempt at changing the > page size breaks the firmware or the driver (Nish, have you tried just > preventing that part ?) sriov_init() definitely has issues. I haven't had the time to dive down into sriov_init() to determine which call it is that actually causes the break. I will try to find some time to do that either today or on Monday. > In any case, it makes no sense to turn VFs on etc... that way, I think a > lot of that code should be moved to be done lazily when the driver > actually tries to enable a VF. That way the check for the validity of > the IOV resource can be done at that point too. Minimally, I think we can remove: pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total); from sriov_init() as this is going to be overridden by sriov_enable()'s pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, nr_virtfn); ? Thanks, Nish -- Nishanth Aravamudan <nacc@xxxxxxxxxx> IBM Linux Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html