On Monday 31 July 2023 18:12:23 Bjorn Helgaas wrote:
> [+cc Pali, Marek because I used f76b36d40bee ("PCI: aardvark: Fix link
> training") as an example]
>
> On Mon, Jul 31, 2023 at 01:52:35PM +0800, Kevin Xie wrote:
> > On 2023/7/28 5:40, Bjorn Helgaas wrote:
> > > On Tue, Jul 25, 2023 at 03:46:35PM -0500, Bjorn Helgaas wrote:
> > >> On Mon, Jul 24, 2023 at 06:48:47PM +0800, Kevin Xie wrote:
> > >> > On 2023/7/21 0:15, Bjorn Helgaas wrote:
> > >> > > On Thu, Jul 20, 2023 at 06:11:59PM +0800, Kevin Xie wrote:
> > >> > >> On 2023/7/20 0:48, Bjorn Helgaas wrote:
> > >> > >> > On Wed, Jul 19, 2023 at 06:20:56PM +0800, Minda Chen wrote:
> > >> > >> >> Add StarFive JH7110 SoC PCIe controller platform
> > >> > >> >> driver codes.
> > >> >
> > >> > >> However, in the compatibility testing with several NVMe SSD, we
> > >> > >> found that Lenovo Thinklife ST8000 NVMe can not get ready in 100ms,
> > >> > >> and it actually needs almost 200ms. Thus, we increased the T_PVPERL
> > >> > >> value to 300ms for the better device compatibility.
> > >> > > ...
> > >> > >
> > >> > > Thanks for this valuable information! This NVMe issue potentially
> > >> > > affects many similar drivers, and we may need a more generic fix so
> > >> > > this device works well with all of them.
> > >> > >
> > >> > > T_PVPERL is defined to start when power is stable. Do you have a way
> > >> > > to accurately determine that point? I'm guessing this:
> > >> > >
> > >> > >   gpiod_set_value_cansleep(pcie->power_gpio, 1)
> > >> > >
> > >> > > turns the power on? But of course that doesn't mean it is instantly
> > >> > > stable. Maybe your testing is telling you that your driver should
> > >> > > have a hardware-specific 200ms delay to wait for power to become
> > >> > > stable, followed by the standard 100ms for T_PVPERL?
> > >> >
> > >> > You are right, we did not take the power stable cost into account.
> > >> > T_PVPERL is enough for Lenovo Thinklife ST8000 NVMe SSD to get ready,
> > >> > and the extra cost is from the power circuit of a PCIe to M.2 connector,
> > >> > which is used to verify M.2 SSD with our EVB at early stage.
> > >>
> > >> Hmm. That sounds potentially interesting. I assume you're talking
> > >> about something like this: https://www.amazon.com/dp/B07JKH5VTL
> > >>
> > >> I'm not familiar with the timing requirements for something like this.
> > >> There is a PCIe M.2 spec with some timing requirements, but I don't
> > >> know whether or how software is supposed to manage this. There is a
> > >> T_PVPGL (power valid to PERST# inactive) parameter, but it's
> > >> implementation specific, so I don't know what the point of that is.
> > >> And I don't see a way for software to even detect the presence of such
> > >> an adapter.
> > >
> > > I intended to ask about this on the PCI-SIG forum, but after reading
> > > this thread [1], I don't think we would learn anything. The question
> > > was:
> > >
> > >   The M.2 device has 5 voltage rails generated from the 3.3V input
> > >   supply voltage
> > >   -------------------------------------------
> > >   This is re. Table 17 in PCI Express M.2 Specification Revision 1.1
> > >   Power Valid* to PERST# input inactive : Implementation specific;
> > >   recommended 50 ms
> > >
> > >   What exactly does this mean ?
> > >
> > >   The Note says
> > >
> > >   *Power Valid when all the voltage supply rails have reached their
> > >   respective Vmin.
> > >
> > >   Does this mean that the 50ms to PERSTn is counted from the instant
> > >   when all *5 voltage rails* on the M.2 device have become "good" ?
> > >
> > > and the answer was:
> > >
> > >   You wrote;
> > >   Does this mean that the 50ms to PERSTn is counted from the instant
> > >   when all 5 voltage rails on the M.2 device have become "good" ?
> > >
> > >   Reply:
> > >   This means that counting the recommended 50 ms begins from the time
> > >   when the power rails coming to the device/module, from the host, are
> > >   stable *at the device connector*.
> > >
> > >   As for the time it takes voltages derived inside the device from any
> > >   of the host power rails (e.g., 3.3V rail) to become stable, that is
> > >   part of the 50ms the host should wait before de-asserting PERST#, in
> > >   order ensure that most devices will be ready by then.
> > >
> > >   Strictly speaking, nothing disastrous happens if a host violates the
> > >   50ms. If it de-asserts too soon, the device may not be ready, but
> > >   most hosts will try again. If the host de-asserts too late, the
> > >   device has even more time to stabilize. This is why the WG felt that
> > >   an exact minimum number for >>Tpvpgl, was not valid in practice, and
> > >   we made it a recommendation.
> > >
> > > Since T_PVPGL is implementation-specific, we can't really base
> > > anything in software on the 50ms recommendation. It sounds to me like
> > > they are counting on software to retry config reads when enumerating.
> > >
> > > I guess the delays we *can* observe are:
> > >
> > >   100ms T_PVPERL "Power stable to PERST# inactive" (CEM 2.9.2)
> > >   100ms software delay between reset and config request (Base 6.6.1)
> >
> > Refer to Figure2-10 in CEM Spec V2.0, I guess this two delays are T2 & T4?
> > In the PATCH v2[4/4], T2 is the msleep(100) for T_PVPERL,
> > and T4 is done by starfive_pcie_host_wait_for_link().
>
> Yes, I think "T2" is T_PVPERL. The CEM r2.0 Figure 2-10 note is
> "2. Minimum time from power rails within specified tolerance to
> PERST# inactive (T_PVPERL)."
>
> As far as T4 ("Minimum PERST# inactive to PCI Express link out of
> electrical idle"), I don't see a name or a value for that parameter,
> and I don't think it is the delay required by PCIe r6.0, sec 6.6.1.
>
> The delay required by sec 6.6.1 is a minimum of 100ms following exit
> from reset or, for fast links, 100ms after link training completes.
>
> The comment at the call of advk_pcie_wait_for_link() [2] says it is
> the delay required by sec 6.6.1, but that doesn't seem right to me.
>
> For one thing, I don't think 6.6.1 says anything about "link up" being
> the end of a delay. So if we want to do the delay required by 6.6.1,
> "wait_for_link()" doesn't seem like quite the right name.
>
> For another, all the *_wait_for_link() functions can return success
> after 0ms, 90ms, 180ms, etc. They're unlikely to return after 0ms,
> but 90ms is quite possible. If we avoided the 0ms return and
> LINK_WAIT_USLEEP_MIN were 100ms instead of 90ms, that should be enough
> for slow links, where we need 100ms following "exit from reset."
>
> But it's still not enough for fast links where we need 100ms "after
> link training completes" because we don't know when training
> completed. If training completed 89ms into *_wait_for_link(), we only
> delay 1ms after that.

Please look at the discussion "How long should be PCIe card in Warm Reset
state?", including its external references, which contain more
interesting details:
https://lore.kernel.org/linux-pci/20210310110535.zh4pnn4vpmvzwl5q@pali/

As for waiting for the link, this should be done asynchronously...

> > > The PCI core doesn't know how to assert PERST#, so the T_PVPERL delay
> > > definitely has to be in the host controller driver.
> > >
> > > The PCI core observes the second 100ms delay after a reset in
> > > pci_bridge_wait_for_secondary_bus(). But this 100ms delay does not
> > > happen during initial enumeration. I think the assumption of the PCI
> > > core is that when the host controller driver calls pci_host_probe(),
> > > we can issue config requests immediately.
> > >
> > > So I think that to be safe, we probably need to do both of those 100ms
> > > delays in the host controller driver.
> > > Maybe there's some hope of supporting the latter one in the PCI core
> > > someday, but that's not today.
> > >
> > > Bjorn
> > >
> > > [1] https://forum.pcisig.com/viewtopic.php?f=74&t=1037
>
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pci-aardvark.c?id=v6.4#n433
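
To make the polling-granularity point quoted above concrete (a loop that
sleeps 90ms per iteration can return only 1ms after training completes),
here is a minimal userspace sketch in C. It is not taken from any driver:
struct link_model, link_up(), wait_for_link(), delay_after_training() and
time_after_training_with_fixed_sleep() are hypothetical names, the 90ms
step mirrors the figure mentioned in the thread, and the retry count is an
assumed value. Time is simulated with a counter instead of real sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 90 ms is the step quoted in the thread; the retry
 * count is only illustrative. */
#define LINK_WAIT_SLEEP_MS     90
#define LINK_WAIT_MAX_RETRIES  10
#define POST_TRAINING_DELAY_MS 100

/* Hypothetical model: training completes train_done_ms after polling starts. */
struct link_model {
	int now_ms;        /* simulated clock */
	int train_done_ms; /* instant at which link training completes */
};

static bool link_up(const struct link_model *m)
{
	return m->now_ms >= m->train_done_ms;
}

/* Poll in fixed steps; return the time at which link-up was observed,
 * or -1 if the retries run out. */
static int wait_for_link(struct link_model *m)
{
	for (int i = 0; i < LINK_WAIT_MAX_RETRIES; i++) {
		if (link_up(m))
			return m->now_ms;
		m->now_ms += LINK_WAIT_SLEEP_MS; /* stands in for usleep_range() */
	}
	return -1;
}

/* Time that has passed since training completed when the poll returns. */
static int delay_after_training(int train_done_ms)
{
	struct link_model m = { .now_ms = 0, .train_done_ms = train_done_ms };
	int seen_ms;

	assert(train_done_ms >= 0);
	seen_ms = wait_for_link(&m);
	return seen_ms < 0 ? -1 : seen_ms - train_done_ms;
}

/* Same, with an unconditional 100 ms sleep added after the poll succeeds,
 * since the poll cannot tell when training actually finished. */
static int time_after_training_with_fixed_sleep(int train_done_ms)
{
	int waited = delay_after_training(train_done_ms);

	return waited < 0 ? -1 : waited + POST_TRAINING_DELAY_MS;
}
```

In this model, training that completes 89ms in is observed at the 90ms
poll, so only 1ms has passed since training finished. Adding a fixed
100ms sleep after the poll is the conservative option in this sketch; it
may overshoot by up to one poll interval but never undershoots.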