On Tue, 2014-04-15 at 18:09 -0600, Bjorn Helgaas wrote:
>
> Thanks for the example. Please open a bug report at
> http://bugzilla.kernel.org and attach the complete dmesg logs before
> and after Yinghai's patch.
>
> Having the complete logs helps me answer questions myself without
> having to bother you, and it also helps me figure out whether we can
> improve our logging to make it easier to diagnose problems like this.

Unfortunately, for a *little while* longer (hint !) we can't publish a
complete log from a Power8 machine, but we should be able to include
everything remotely related to PCI.

> > | pci 0003:05:00.0: reg 0x10: [mem 0x3d05801000000-0x3d058010fffff 64bit]
> > | pci 0003:05:00.0: reg 0x18: [mem 0x3d05010000000-0x3d05017ffffff 64bit pref]
> > | pci 0003:05:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
> > | pci 0003:05:00.0: reg 0x134: [mem 0x3d05018000000-0x3d0501fffffff 64bit pref]
> >
> > This is printed during the enumeration phase. This device has a SRIOV BAR
> > with a size of 0x8000000 (128M). That's the size of a single VF BAR. The
> > device supports 63 VFs, so we need nearly 8G of space in total. Apparently
> > we need to exploit 64-bit space.
>
> Yes. Do we print a hint anywhere about how many VFs there are? In
> other words, can you deduce the number "63" from the dmesg, or do you
> have to figure that out some other way? It'd be nice if that
> information were somewhere in dmesg.
>
> > | PCI host bridge to bus 0003:00
> > | pci_bus 0003:00: root bus resource [mem 0x3d05800000000-0x3d0587ffeffff] (bus address [0x80000000-0xfffeffff])
> > | pci_bus 0003:00: root bus resource [mem 0x3d05008000000-0x3d057ffffffff 64bit pref]
> >
> > And we do have a huge (32G) 64-bit prefetchable window available.
> > We expect everything to work fine, but:
> >
> > | pci 0003:00:00.0: BAR 15: can't assign mem pref (size 0x206000000)
> > | pci 0003:00:00.0: BAR 14: assigned [mem 0x3d05800000000-0x3d05802ffffff]
> > | pci 0003:00:00.0: BAR 13: can't assign io (size 0x4000)
> >
> > It went wrong at the beginning. Note that the error message never says
> > whether the resource is 64-bit or not, but BAR 15 here has its MEM_64
> > flag cleared.
>
> BAR 15 is a bridge window. I think its resource flags should reflect
> the capability of the *window*, even if we disable the window or we
> happen to assign addresses that are under 4GB. So I think it's wrong
> that we clear the MEM_64 flag in pbus_size_mem() and the IO flag in
> pbus_size_io().
>
> > It first tried to find a 32-bit prefetchable window, but we only supply
> > a 64-bit one. So it fell back to the (32-bit) non-prefetchable window,
> > but there is not enough room there. At last it went into complicated
> > steps (not shown here) of allocating the requested resources first,
> > then trying its best for the optional ones, etc.
> >
> > Why is BAR 15 (prefetchable) 32-bit instead of 64-bit? Because the PCI
> > core favours 32-bit prefetchable BARs and we have some. This is one of
> > them:
> >
> > | pci 0003:05:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
> >
> > The PCI core decides to let them enjoy the benefit of prefetch. They
> > can't bear the risk of getting an address above 4G, so its parent, its
> > parent's parent, its parent's parent's parent, and finally the root
> > bridge (00:00.0) must have the MEM_64 flag of their prefetchable
> > resource (BAR 15) cleared.
>
> It sounds like we're tracking the resource requirements
> (prefetchability and BAR width) by using the flags on bridge windows.
> If that's the case, I think it's wrong. We should preserve the bridge
> window flags, because those express the bridge hardware capabilities,
> and we should explicitly keep track of what's required by devices
> below the bridge in some other way.
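
To make the flag handling discussed above concrete, here is a toy
Python model. It is not kernel code. The Bar class and all three
function names are hypothetical, and the two policies only follow the
descriptions given in this thread: the current behaviour clears MEM_64
on the prefetchable window as soon as one prefetchable BAR below it is
32-bit, while with Yinghai's patch the window keeps MEM_64 whenever a
64-bit prefetchable BAR exists. That 32-bit prefetchable BARs then end
up in the non-prefetchable window is an assumption inferred from the
discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """Hypothetical stand-in for one device resource from the dmesg above."""
    reg: int
    pref: bool   # prefetchable
    mem64: bool  # 64-bit capable


# The four resources of 0003:05:00.0, with flags as printed in dmesg.
BARS = [
    Bar(0x10, pref=False, mem64=True),
    Bar(0x18, pref=True, mem64=True),
    Bar(0x30, pref=True, mem64=False),   # 32-bit prefetchable
    Bar(0x134, pref=True, mem64=True),   # SR-IOV VF BAR
]


def pref_window_64bit_before(bars):
    """Current behaviour as described: one 32-bit pref BAR clears MEM_64."""
    pref = [b for b in bars if b.pref]
    return bool(pref) and all(b.mem64 for b in pref)


def pref_window_64bit_after(bars):
    """Patched behaviour as described: any 64-bit pref BAR keeps MEM_64."""
    return any(b.pref and b.mem64 for b in bars)


def placed_in_pref_window_after(bars):
    """Assumption: 32-bit pref BARs fall back to the non-pref window."""
    if not pref_window_64bit_after(bars):
        return [b.reg for b in bars if b.pref]
    return [b.reg for b in bars if b.pref and b.mem64]


print(pref_window_64bit_before(BARS))
print(pref_window_64bit_after(BARS))
print([hex(r) for r in placed_in_pref_window_after(BARS)])
```

In this model a single 32-bit prefetchable BAR (reg 0x30) is enough to
keep the two 64-bit prefetchable BARs out of the 64-bit window under
the first policy, which is the situation described above.
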
>
> > In the end nobody is eligible to use the 64-bit (prefetchable) space
> > even though we have a huge supply!
> >
> > Note that even if the resource is small and successfully falls back
> > into the 32-bit non-prefetchable window, that's still not OK for us
> > because we need the SRIOV resource to be in 64-bit prefetchable space
> > to do platform configuration.
> >
> > With Yinghai's patch, when 64-bit prefetchable BARs are found, they're
> > favoured over the 32-bit prefetchable ones (if any), so all upstream
> > bridges' prefetchable windows have their MEM_64 flag preserved and the
> > huge 64-bit prefetchable space will be exploited:
> >
> > | pci 0003:00:00.0: BAR 15: assigned [mem 0x3d05008000000-0x3d0521fffffff 64bit pref]
> > | pci 0003:00:00.0: BAR 14: assigned [mem 0x3d05800000000-0x3d05802ffffff]
> > | pci 0003:00:00.0: BAR 13: can't assign io (size 0x4000)
> >
> > (The IO resource error here is because we do not provide an IO window.)
>
> Yes. The lack of I/O space is just a constraint of the platform.
> It'd be nice if we printed a more meaningful error message in this
> case. One really has to be a PCI expert to distinguish this from a
> real problem that we need to fix.
>
> Bjorn
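
The sizes quoted in this thread can be cross-checked with a few lines
of Python. This is only a sketch: the three dmesg lines are copied from
the excerpts above, the helper name range_size is made up for
illustration, and the VF count of 63 is taken from the message text
rather than from dmesg.

```python
import re

# Resource lines copied from the dmesg excerpts quoted in this thread.
VF_BAR = "pci 0003:05:00.0: reg 0x134: [mem 0x3d05018000000-0x3d0501fffffff 64bit pref]"
NONPREF_WINDOW = ("pci_bus 0003:00: root bus resource "
                  "[mem 0x3d05800000000-0x3d0587ffeffff] "
                  "(bus address [0x80000000-0xfffeffff])")
PREF_WINDOW = ("pci_bus 0003:00: root bus resource "
               "[mem 0x3d05008000000-0x3d057ffffffff 64bit pref]")

RANGE_RE = re.compile(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def range_size(line):
    """Size in bytes of the first [mem start-end] range in a dmesg line."""
    start, end = (int(x, 16) for x in RANGE_RE.search(line).groups())
    return end - start + 1  # dmesg prints inclusive ranges


NUM_VFS = 63                 # from the message text, not from dmesg
BAR15_REQUEST = 0x206000000  # "can't assign mem pref (size 0x206000000)"

vf_bar_size = range_size(VF_BAR)
total_vf_space = NUM_VFS * vf_bar_size
nonpref_size = range_size(NONPREF_WINDOW)
pref_size = range_size(PREF_WINDOW)

print(f"one VF BAR:        {vf_bar_size:#x}")
print(f"63 VF BARs:        {total_vf_space:#x}")
print(f"non-pref window:   {nonpref_size:#x}")
print(f"64-bit pref window:{pref_size:#x}")
print("fits in non-pref window:", BAR15_REQUEST <= nonpref_size)
print("fits in pref window:    ", BAR15_REQUEST <= pref_size)
```

The 63 VF BARs alone need 0x1f8000000 bytes, which is more than the
32-bit non-prefetchable window (0x7fff0000 bytes) can hold but fits
easily in the 64-bit prefetchable window (0x7f8000000 bytes).
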