[Public]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Sent: Thursday, July 29, 2021 5:34 PM
> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>;
> bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; Marcin Bachry
> <hegel666@xxxxxxxxx>; Liang, Prike <Prike.Liang@xxxxxxx>; S-k, Shyam-sundar
> <Shyam-sundar.S-k@xxxxxxx>
> Subject: Re: [PATCH] PCI: quirks: Quirk PCI d3hot delay for AMD xhci
>
> On Thu, Jul 29, 2021 at 04:30:28PM -0500, Bjorn Helgaas wrote:
> > On Thu, Jul 29, 2021 at 04:09:50PM -0500, Limonciello, Mario wrote:
> > > On 7/29/2021 16:06, Bjorn Helgaas wrote:
> > > > On Thu, Jul 29, 2021 at 03:42:58PM -0500, Limonciello, Mario wrote:
> > > > > On 7/29/2021 15:39, Bjorn Helgaas wrote:
> > > > > > On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
> > > > > > > From: Marcin Bachry <hegel666@xxxxxxxxx>
> > > > > > >
> > > > > > > Renoir needs a similar delay.
> > > > > > >
> > > > > > > [Alex: I talked to the AMD USB hardware team and the
> > > > > > > AMD Windows team and they are not aware of any HW
> > > > > > > errata or specific issues. The HW works fine in
> > > > > > > Windows. I was told Windows uses a rather generous
> > > > > > > default delay of 100ms for PCI state transitions.]
> > > > > > >
> > > > > > > Signed-off-by: Marcin Bachry <hegel666@xxxxxxxxx>
> > > > > > > Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
> > > > > >
> > > > > > Added stable tag and applied to pci/pm for v5.15, thanks!
> > > > >
> > > > > Thanks Bjorn!
> > > > >
> > > > > Given how small/harmless this is and 5.14 isn't cut yet, any
> > > > > chance this could still make one of the -rcX rather than wait for 5.14.1 instead?
> > > >
> > > > Done.
> > >
> > > Thanks!
> > >
> > > > What's the rest of the story here? Are we working around a
> > > > defect in these XHCI controllers? A defect in Linux?
> > > > Obviously nobody wants to have to add a quirk for every new Device ID.
> > > > It's not like this should be hard to figure out for your hardware guys
> > > > in the lab, and if it turns out to be a Linux problem, we should
> > > > fix it so everybody benefits.
> > >
> > > Maybe you missed the embedded message from Alex above. We had a
> > > discussion with our internal team that works with Windows on this,
> > > and they told us the default delay is significantly more generous on Windows.
> >
> > I did see Alex's message, but it didn't answer the question of whether
> > this is a hardware defect or a Linux defect. "It works fine in
> > Windows" doesn't mean the hardware conforms to the spec.
> >
> > PCIe r5.0, sec 5.3.1.4 says "... System Software must allow a minimum
> > recovery time following a D3Hot → D0 transition of at least 10 ms (see
> > Section 7.9.17), prior to accessing the Function."
> >
> > If the hardware isn't ready in 10ms, I'd claim that's a hardware
> > defect.
> >
> > If Linux isn't waiting the 10ms, I'd claim that's a Linux defect.
> >
> > If things work by waiting 100ms, that's nice, but what's the point of
> > specs if we have to increase the time and penalize everybody just to
> > accommodate some oddball device?
>
> 10ms after hitting "send" it occurred to me that since all of these quirks are
> for AMD devices, we could just make the quirk generic so we wait 100ms for
> *all* AMD devices. Then AMD boxes would resume a little slower than
> everybody else, but some of the maintenance burden would go away.
>

We probably only need a slight increase. As I said in the comment on the patch, it seems to affect only a small percentage of boards; for the most part 10ms seems to be fine. It's more of a corner case, maybe specific to certain platforms. It doesn't show up in silicon validation on our reference boards, and presumably it doesn't show up in Windows because of the longer timeout.
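The two options on the table (one quirk entry per affected Device ID versus one vendor-wide 100ms rule) can be compared with a small user-space C model. This is a sketch under stated assumptions, not kernel code: `struct dev_model`, `apply_device_quirks()`, `apply_vendor_quirk()` and the other names are hypothetical, as are the placeholder Device IDs 0xAAA1/0xAAA2 and the 20ms per-device value. Only the 10ms floor (PCIe r5.0 sec 5.3.1.4) and the 100ms figure come from the thread; 0x1022 is AMD's PCI Vendor ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PCIe r5.0 sec 5.3.1.4: wait at least 10 ms after D3hot -> D0 before access. */
#define SPEC_D3HOT_DELAY_MS 10u
#define VENDOR_AMD 0x1022u /* AMD's PCI Vendor ID */

/* Hypothetical device model; NOT the kernel's struct pci_dev. */
struct dev_model {
	uint16_t vendor;
	uint16_t device;
	unsigned int d3hot_delay_ms;
};

/* Option A: one table entry per affected Device ID. */
struct delay_quirk {
	uint16_t vendor;
	uint16_t device;
	unsigned int min_delay_ms;
};

/* Placeholder IDs and delay values, for illustration only. */
static const struct delay_quirk device_quirks[] = {
	{ VENDOR_AMD, 0xAAA1, 20 },
	{ VENDOR_AMD, 0xAAA2, 20 },
};

/* A quirk may only lengthen the wait, never drop it below the spec floor. */
static void raise_delay(struct dev_model *d, unsigned int ms)
{
	assert(ms >= SPEC_D3HOT_DELAY_MS);
	if (d->d3hot_delay_ms < ms)
		d->d3hot_delay_ms = ms;
}

/* Option A: look the device up in the per-Device-ID table. */
static void apply_device_quirks(struct dev_model *d)
{
	size_t i;

	for (i = 0; i < sizeof(device_quirks) / sizeof(device_quirks[0]); i++)
		if (device_quirks[i].vendor == d->vendor &&
		    device_quirks[i].device == d->device)
			raise_delay(d, device_quirks[i].min_delay_ms);
}

/* Option B: the vendor-wide variant, 100 ms for every AMD function. */
static void apply_vendor_quirk(struct dev_model *d)
{
	if (d->vendor == VENDOR_AMD)
		raise_delay(d, 100);
}

/* Every device starts at the spec-mandated minimum. */
static struct dev_model new_dev(uint16_t vendor, uint16_t device)
{
	struct dev_model d = { vendor, device, SPEC_D3HOT_DELAY_MS };
	return d;
}
```

In this model a listed device ends up at 20ms, an unlisted AMD device stays at the 10ms spec floor until the vendor-wide rule raises it to 100ms, and other vendors are untouched. That is the trade-off discussed above: the table needs a new entry for every affected Device ID, while the vendor-wide rule is a single check but adds 90ms over the spec minimum to every AMD function's D3hot → D0 transition.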
I'll keep this in mind on the next platform, and I'll consider a patch to generically increase the timeout for AMD if it turns up in the wild again. So far our upcoming platforms (at least our internal engineering platforms) don't exhibit this. That said, I don't recall us seeing this issue on any of our reference platforms in the past.

Thanks,

Alex

> I'm only half joking, and I would take that patch if you sent it.
>
> Bjorn