On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
> > Hi Johannes,
> >
> > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > > The Read Completion Boundary (RCB) bit must only be set on a device or
> > > endpoint if it is set on the root complex.
> >
> > I propose the following slightly modified patch.  The interesting
> > difference is that your patch only touches the _HPX "OR" mask, so it
> > refrains from *setting* RCB in some cases, but it never actually
> > *clears* it.  The only time we clear RCB is when the _HPX "AND" mask
> > has RCB == 0.
> >
> > My intent below is that we completely ignore the _HPX RCB bits, and we
> > set an Endpoint's RCB if and only if the Root Port's RCB is set.
> >
> > I made an ugly ASCII table to think about the cases:
> >
> >       Root   EP      _HPX   _HPX    Final Endpoint RCB state
> >       Port   (init)  AND    OR      (curr)  (yours)  (mine)
> >   0)  0      0       0      0       0       0        0
> >   1)  0      0       0      1       1       0        0
> >   2)  0      0       1      0       0       0        0
> >   3)  0      0       1      1       1       0        0
> >   4)  0      1       0      0       0       0        0
> >   5)  0      1       0      1       1       0        0
> >   6)  0      1       1      0       1       1        0
> >   7)  0      1       1      1       1       1        0
> >   8)  1      0       0      0       0       0        1
> >   9)  1      0       0      1       1       1        1
> >   A)  1      0       1      0       0       0        1
> >   B)  1      0       1      1       1       1        1
> >   C)  1      1       0      0       0       0        1
> >   D)  1      1       0      1       1       1        1
> >   E)  1      1       1      0       1       1        1
> >   F)  1      1       1      1       1       1        1
> >
> > Cases 0-7 should all result in the Endpoint RCB being zero because the
> > Root Port RCB is zero.  Case 1 is the bug you're fixing.  Cases 3 & 5
> > are similar hypothetical bugs your patch also fixes.
> >
> > Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't
> > tell us to clear it, are hypothetical firmware bugs that your patch
> > wouldn't fix.
> >
> > In cases 8, A, and C, we currently leave the Endpoint RCB cleared,
> > either because firmware left it clear and _HPX didn't tell us to set
> > it (8 and A), or because firmware set it but _HPX told us to clear it
> > (C).
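For anyone who wants to check the table above mechanically: the three
result columns can be modelled as boolean functions of the four inputs.
The following is only a minimal Python sketch of the table, not kernel
code and not either patch. The expressions are inferred from the table
rows (an assumption, not taken from the patches), and all function names
are hypothetical.

```python
from itertools import product


def rcb_current(root, ep_init, hpx_and, hpx_or):
    """(curr) column: apply the _HPX AND/OR masks to the Endpoint's
    initial RCB; the Root Port's RCB is not consulted at all."""
    return (ep_init & hpx_and) | hpx_or


def rcb_or_mask_only(root, ep_init, hpx_and, hpx_or):
    """(yours) column, as inferred from the table: the _HPX OR bit is
    honored only when the Root Port's RCB is set, but an RCB already
    set on the Endpoint is never cleared because of the Root Port."""
    return (ep_init & hpx_and) | (hpx_or & root)


def rcb_follow_root(root, ep_init, hpx_and, hpx_or):
    """(mine) column: ignore the _HPX RCB bits and the firmware state;
    the Endpoint's RCB simply mirrors the Root Port's RCB."""
    return root


def table():
    """Return one tuple per case:
    (case, root, ep_init, hpx_and, hpx_or, curr, yours, mine).
    product() counts in binary with Root Port as the most significant
    bit, which matches the 0..F case numbering of the table."""
    rows = []
    for case, bits in enumerate(product((0, 1), repeat=4)):
        rows.append((case, *bits,
                     rcb_current(*bits),
                     rcb_or_mask_only(*bits),
                     rcb_follow_root(*bits)))
    return rows


if __name__ == "__main__":
    print("    Root EP  AND OR   curr yours mine")
    for case, root, ep, a, o, curr, yours, mine in table():
        print(f"{case:X})  {root}    {ep}   {a}   {o}    "
              f"{curr}    {yours}     {mine}")
```

Comparing the last two columns of the output shows where the two
approaches differ: cases 6 and 7 (Root Port RCB clear, Endpoint RCB left
set) and cases 8, A, and C (Root Port RCB set, Endpoint RCB ends up
clear).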
> >
> > One could argue that 8, A, and C should stay as they currently are, as
> > a way for _HPX to work around hardware bugs, e.g., a Root Port that
> > advertises a 128-byte RCB but doesn't actually support it.  I didn't
> > bother with that and set the Endpoint's RCB to 128 in all cases when
> > the Root Port claims to support it.
> >
> > It'd be great if you could test this and comment.
> >
> > If you get a chance, collect the /proc/iomem contents, too.  That's
> > not for this bug; it's because I'm curious about the
> >
> >   ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST
> >
> > problem in your dmesg log.
>
> Oops, I goofed and forgot to clear RCB by default.
> Here's the fixed one.

Yep, my contact already noticed. I have heard rumors that the first two
patches worked on RHEL and the third one didn't (but those are just
rumors), so I'm trying to persuade our field engineer to spend another
day testing the patches.

Please be aware that this is a bit cumbersome, as I don't have access to
the machine and our field engineer only has remote access as well.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850