On Tue, 2012-08-07 at 23:00 -0700, Bjorn Helgaas wrote: > On Tue, Aug 7, 2012 at 2:50 PM, Don Dutile <ddutile@xxxxxxxxxx> wrote: > > On 08/06/2012 04:47 PM, Bjorn Helgaas wrote: > >> > >> On Sun, Aug 5, 2012 at 11:55 PM, Alex Williamson > >> <alex.williamson@xxxxxxxxxx> wrote: > >>> > >>> On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: > >>>> > >>>> On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson > >>>> <alex.williamson@xxxxxxxxxx> wrote: > >>>>> > >>>>> It's possible to have buses without an associated bridge > >>>>> (bus->self == NULL). SR-IOV can generate such buses. When > >>>>> we find these, skip to the parent bus to look for the next > >>>>> ACS test. > >>>> > >>>> > >>>> To make sure I understand the problem here, I think you're referring > >>>> to the situation where an SR-IOV device can span several bus numbers, > >>>> e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in > >>>> the SR-IOV 1.1 spec, sec. 2.1.2. > >>>> > >>>> It says "All PFs must be located on the Device's captured Bus Number" > >>>> -- I think that means every PF will be directly on a bridge's > >>>> secondary bus and hence will have a valid dev->bus->self pointer. > >>>> > >>>> However, VFs need not be on the same bus number. If a VF is on > >>>> (captured Bus Number plus 1), I think we allocate a new struct pci_bus > >>>> for it, but there's no P2P bridge that leads to that bus, so the > >>>> bus->self pointer is probably NULL. > >>> > >>> > >>> Yes, exactly. virtfn_add_bus() is where we're creating this new bus. > >>> > >>>> This makes me quite nervous, because I bet there are many places that > >>>> assume every non-root bus has a valid bus->self pointer -- I know I > >>>> certainly had that assumption. > >>>> > >>>> I looked at callers of pci_is_root_bus(), and at first glance, it seems > >>>> like > >>>> iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), > >>> > >>> > >>> > >>> These 3 are handled by this patch, plus the intel and amd iommu patches > >>> I sent. > >>> > >>>> pci_get_interrupt_pin(), pci_common_swizzle(), > >>> > >>> > >>> If sr-iov is the only source of these virtual buses, these are probably > >>> ok since VFs don't support INTx. > >>> > >>>> pci_find_upstream_pcie_bridge(), and > >>> > >>> > >>> Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if > >>> sr-iov only (and assuming VFs properly report PCIe capability), we > >>> shouldn't stumble on it. > >>> > >>>> pci_bus_release_bridge_resources() all might have similar problems. > >>> > >>> > >>> This one might deserve further investigation. Thanks, > >> > >> > >> We can fix all these places piecemeal, but that doesn't feel like a > >> very satisfying solution. It makes it much harder to know that each > >> place is correct, and this oddity of a bus with no upstream bridge is > >> still lying around, waiting to bite us again later. > >> > >> What other possible ways of fixing this do we have? Could we set > >> bus->self (multiple buses would then point to the same bridge, and I > >> don't know if that would break something)? Add something like a > >> pci_upstream_p2p_bridge() interface that would encapsulate traversing > > > > ^^^ and this name will reduce the confusion? :) > > I don't claim that :) I just wanted to explore other possible > solutions. Changing every loop that searches the parent chain so it > knows about this SR-IOV oddity doesn't seem like the ideal solution, > though maybe it's the best we can do given the constraints. Yep, these are the two alternatives I thought of too, set bus->self or create a helper function. The former leaves present assumptions in place, but I don't know whether we'll break something else having two buses claiming to be sourced by the same device. Maintaining the NULL self pointer almost feels like a better representation of the hardware, but requires evaluating all the current users and coming up with a helper. That's more work, but we probably want to move away from everyone manually walking pci data structures anyway. > >> Since these fake VF buses don't have a bridge that points to them, I > > > > Well, they aren't fake busses, just ARI-identifiers, which translate the > > B:D.F/8:5.3 > > format to simply a 16-bit i.d. > > I think an SR-IOV device can consume multiple bus numbers even without > ARI (in fact, I think ARI reduces the number of bus numbers the > device requires ... e.g., a PF and 15 VFs would require two bus > numbers without ARI (04:00.0 - 04:00.7 and 05:00.0 - 05:00.7) but only > one bus number with ARI (04:00.0 - 04:01.7)). (I think "04:01.7" is > how Linux would represent the 8-bit function number ARI gives you. > You could also think of it as "04:00.0f") Yep, that's what I think is happening too. I've been testing on a system with ARI, so using the same device as David, PFs at 1:00.0/1 and VFs at 1:10.0 - 1:11.5. The sr-iov capability reports VF offset: 128, stride: 2. IIRC, w/o ARI this same device reports VF offset 384 (256 + 128), which seems to match David's lspci. > > > So, VF devices should be attached to same bus->devices list as it's PF. > > I don't think it works that way today, does it? In the SR-IOV spec > example in sec 2.1.2: > > PF 0 at 04:00.0 > ARI Capable is set > First VF Offset = 1, VF Stride = 1, NumVFs = 600 > > I think we have three separate bus->devices lists: > > pci_bus 04: devices list contains PF 0 and VF 0,1 through VF 0,255 > pci_bus 05: devices list contains VF 0,256 - VF 0,511 > pci_bus 06: devices list contains VF 0,512 - VF 0,600 > > > pci_dev->bus should be same bus ptr as PF's pci_dev as well, since the > > VF uses all that's busses resources, support functions (cfg, dma-ops, etc.) > > as well. > > Searching the driver/pci area, support of functions like AER want the > > bus struct that's receiving/handling the PCIe error, associated (hw) port, > > etc., > > so another reason the VF's pci-dev bus ptr should be the same as the PF's. > > Maybe every VF *should* have the same dev->bus pointer as the PF, but > I don't think it does today. I think we only store the bus number in > the struct pci_bus, so if we *did* give all the VFs the same dev->bus > pointer and put all the VFs in the same bus->devices list, we'd have > to store the bus number elsewhere, e.g., in the struct pci_dev. > > That might make sense, but the magnitude of a change like that makes > my head hurt -- it would affect drivers, arch code, config accessors, > etc. Yep, I agree, that's a total rework of what a struct pci_bus represents. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html