On Thu, Aug 16, 2018 at 09:38:41AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-08-15 at 15:40 -0700, Guenter Roeck wrote:
> > On Thu, Aug 16, 2018 at 07:50:13AM +1000, Benjamin Herrenschmidt wrote:
> > > (Resent with lkml on copy)
> > >
> > > [Note: This isn't meant to be merged, it needs splitting at the
> > > very least, see below]
> > >
> > > This is something I cooked up quickly today to test whether it
> > > would fix my problems with large numbers of switch and NVMe
> > > devices on POWER.
> >
> > Is that a problem that can be reproduced with a qemu setup?
>
> With difficulty... mt-tcg might help, but you need a rather large
> system to reproduce it.
>
> My repro case is a 2-socket POWER9 system (about 40 cores off the top
> of my head, so 160 threads) with 72 NVMe devices underneath a tree of
> switches (I don't have the system at hand today to check how many).
>
> It's possible to observe it, I suppose, on a smaller system (in
> theory a single bridge with 2 devices is enough), but in practice the
> timing is extremely hard to hit.
>
> You need a combination of:
>
>  - The bridges come up disabled (which is the case when Linux does
>    the resource assignment, such as on POWER, but not on x86 unless
>    it's hotplug)
>
>  - The NVMe devices try to enable them simultaneously
>
> Also, the resulting error is a UR (Unsupported Request); I don't know
> how well qemu models that.

Not well enough, apparently. I tried for a while, registering as many
NVMe drives as the system would take, but I was not able to reproduce
the problem with qemu. It was worth a try, though.

Guenter
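For illustration, the race Ben describes can be sketched in user space.
This is a hypothetical model, not the kernel code: the names
(bridge_enable_cnt, bridge_mem_decode, enable_device) are invented, and
it only assumes the enable path keeps a refcount on the shared bridge,
as the thread suggests. Two "devices" under one disabled "bridge" enable
concurrently; the second caller sees the count already nonzero, assumes
the bridge is programmed, and touches its device while the bridge's
memory decode is still off, which is the UR case.

/*
 * User-space model of the bridge-enable race (not kernel code).
 * Build: cc -pthread race.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int  bridge_enable_cnt;  /* models the bridge's enable refcount */
static atomic_bool bridge_mem_decode;  /* models the MEM bit in its command register */

static void *enable_device(void *arg)
{
	long id = (long)arg;

	/* models an atomic increment of the bridge's enable count */
	int cnt = atomic_fetch_add(&bridge_enable_cnt, 1) + 1;

	if (cnt == 1) {
		/* first caller actually programs the bridge; the
		 * config-space writes take a while */
		usleep(1000);
		atomic_store(&bridge_mem_decode, true);
	}

	/*
	 * The second caller reaches this point while the first is
	 * still inside usleep(): the refcount said "enabled", but the
	 * hardware isn't forwarding MMIO yet.
	 */
	if (!atomic_load(&bridge_mem_decode))
		printf("device %ld: MMIO while bridge decode is off -> UR\n", id);
	else
		printf("device %ld: ok\n", id);
	return NULL;
}

int main(void)
{
	pthread_t t[2];

	for (long i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, enable_device, (void *)i);
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	return 0;
}

Run it a few times: the thread that loses the refcount race reports the
UR case, which is why the window only opens when the bridges start out
disabled and two devices underneath enable at the same time.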