On Mon, Apr 23, 2012 at 03:46:03PM -0400, Don Dutile wrote:
>On 04/22/2012 11:52 AM, Richard Yang wrote:
>>All,
>>
>>I am reading pci_scan_bridge() and am not sure what will happen in
>>the following situation.
>>
>>Suppose the kernel is not passed pci=assign-busses.
>>
>>Below is a picture of the PCI system.
>>
>>              +-------+
>>              |       | root bridge(0,255)
>>              +---+---+
>>                  | Bus 0
>> -----+-----------+------------------------------+--
>>      |                                          |
>>      |                                          |
>>      |                                          |
>> +----+----+                               +-----+-----+
>> |         | B1(1,15)                      |           |B2(16,28)
>> +----+----+                               +-----+-----+
>>      | Bus 1                                    | Bus 16
>> -----+-----------------------          ----------+----------------
>>      |
>> +----+----+
>> |         | B3
>> +---------+
>>
>>Suppose B1 and B2 work fine with the BIOS, and get the right bus
>>number and range.
>>
>>B3 does not work fine with the BIOS, and doesn't get a bus number.
>>
>>So in pci_scan_bridge(), B3 will be met in the second pass and get bus
>>number 16?
>
>unfortunately, today, the answer is yes.
>I have run into a similar problem recently when trying to use pci=assign-busses
>with an SRIOV device behind a non-ARI-capable PCIe switch.
>In this scenario, the assign-busses code assigned the next bus number,
>which conflicted with an existing one on the system, and hangs the
>system -- two bridges responding to the same PCI bus num evidently
>confuses the hw! ;-)

Hmm... it seems we are not talking about the same case.

In my case the kernel is not passed pci=assign-busses.

I think that if pci=assign-busses is used, the kernel will just ignore the
bus numbers assigned by the BIOS and do the assignment itself.

>
>The PCI code is supposed to do two bus scans -- pass=0: to see what the BIOS
>has set up, and then pass=1 to assign non-BIOS setup devices.
>But, what I'm finding is that when pci=assign-busses is set, the
>pass=0 scan is not doing a full PCI tree scan and registering all
>the BIOS-setup busses first, and it tries to do extended bus assignment in pass=0,
>not pass=1; in the above configuration, it expands B1's bus num range from (1,15)
>to (1,16), then tries to scan behind it. That creates an overlap between
>B1 & B2's sec/sub bus-num ranges, and they both respond to a Type1 config cycle
>with a bus-number of 16 (typically when trying to read the VID register of 16:0.0
>in this case).... boom! ... or more like silence due to system hang...
>
>*If* the system spaces bus ranges apart, e.g., in your config above,
>if the BIOS set up B1(1,15) and B2(24,32), then pci=assign-busses will
>work because bus num 16 is free, and two bridges won't think they both
>respond to a type1 pci config cycle (with bus-number=16 lying in their sec/sub-bus num range),
>and all will (luckily) work.
>
>Unfortunately, I'm in & out of work due to at-home time requirements,
>so I haven't had a chance to work out a proper patch.
>What should happen in the above case is that the kernel prints a warning saying
>it couldn't do the needed assign-busses operations due to configuration constraints...
>and continues to do pci (pass=1) bridge scanning.... and does not wedge the system
>as it does now.
>The base problem is that
>(a) pass=0 is doing bus-assigning, and it shouldn't be done
>    until pass=1, after all known BIOS-setup busses are known
>(b) the code doesn't have a nice warning and continuation when this
>    conflict occurs.
>
>>Would this be a conflict?
>>
>summary: yes.

-- 
Richard Yang
Help you, Help me
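
To make the overlap described above concrete, here is a minimal sketch in
plain C. It is not kernel code: struct bus_window, window_claims(),
count_claimers() and try_extend() are hypothetical names made up for this
illustration, and the model only covers sibling bridges on one bus. It shows
the check that would refuse to grow B1 from (1,15) to (1,16) while B2 already
owns bus 16, instead of letting both bridges claim the same bus number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one bridge's bus window (not a kernel struct). */
struct bus_window {
    unsigned int secondary;   /* first bus number behind the bridge */
    unsigned int subordinate; /* last bus number behind the bridge */
};

/* True if a Type 1 config cycle for `bus` falls inside this window. */
static bool window_claims(const struct bus_window *w, unsigned int bus)
{
    assert(w->secondary <= w->subordinate);
    return bus >= w->secondary && bus <= w->subordinate;
}

/*
 * Count how many sibling bridges would claim `bus`.
 * More than one means two bridges answer the same config cycle.
 */
static size_t count_claimers(const struct bus_window *sib, size_t n,
                             unsigned int bus)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (window_claims(&sib[i], bus))
            hits++;
    return hits;
}

/*
 * Try to grow sib[idx] by one bus number. Refuse (return false and leave
 * the window unchanged) if the new number is above 255 or is already
 * claimed by a sibling, so the caller can warn and carry on.
 */
static bool try_extend(struct bus_window *sib, size_t n, size_t idx)
{
    unsigned int next = sib[idx].subordinate + 1;

    if (next > 255 || count_claimers(sib, n, next) != 0)
        return false;
    sib[idx].subordinate = next;
    return true;
}
```

With B1(1,15) and B2(16,28), try_extend() on B1 returns false and B1 stays at
(1,15). With B1(1,15) and B2(24,32), it succeeds and B1 becomes (1,16), which
matches the "spaced apart" case in the quoted text.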