? 2016/6/23 9:35, Brian Norris ??: > Hi, > > On Thu, Jun 23, 2016 at 09:09:46AM +0800, Shawn Lin wrote: >> ? 2016/6/23 8:29, Brian Norris ??: >>> On Thu, Jun 16, 2016 at 09:50:35AM +0800, Shawn Lin wrote: > > [...] > >>>> + /* 500ms timeout value should be enough for gen1/2 taining */ >>>> + timeout = jiffies + msecs_to_jiffies(500); >>>> + >>>> + err = -ETIMEDOUT; >>>> + while (time_before(jiffies, timeout)) { >>>> + status = pcie_read(port, PCIE_CLIENT_BASIC_STATUS1); >>>> + if (((status >> PCIE_CLIENT_LINK_STATUS_SHIFT) & >>>> + PCIE_CLIENT_LINK_STATUS_MASK) == >>>> + PCIE_CLIENT_LINK_STATUS_UP) { >>>> + dev_dbg(port->dev, "pcie link training gen1 pass!\n"); >>>> + err = 0; >>>> + break; >>>> + } >>>> + msleep(20); >>>> + } >>> >>> Technically, the above timeout loop is not quite correct. Possible error >>> case: we can fail with a timeout after 500 ms if the training completes >>> between the 480-500 ms time window. This can happen because you're >>> doing: >>> >>> (1) read register: if complete, then terminate successfully >>> (2) delay >>> (3) check for timeout: if time is up, return error >>> >>> You actually need: >>> >>> (1) read register: if complete, then terminate successfully >>> (2) delay >>> (3) check for timeout: if time is up, repeat (1), and then report error >>> >>> You can examine the logic for readx_poll_timeout() in >>> include/linux/iopoll.h to see an example of a proper timeout loop. You >>> could even try to use one of the helpers there, except that your >>> pcie_read() takes 2 args. >> >> I see, thanks. >> >>> >>>> + if (err) { >>>> + dev_err(port->dev, "pcie link training gen1 timeout!\n"); >>>> + return err; >>>> + } >>>> + >>>> + /* >>>> + * Enable retrain for gen2. This should be configured only after >>>> + * gen1 finished. >>>> + */ >>>> + status = pcie_read(port, >>>> + PCIR_RC_CONFIG_LCS + PCIE_RC_CONFIG_BASE); >>>> + status |= PCIE_CORE_LCSR_RETAIN_LINK; >>>> + pcie_write(port, status, >>>> + PCIR_RC_CONFIG_LCS + PCIE_RC_CONFIG_BASE); >>> >>> I'm not really an expert on this, but how are you actually "retraining >>> for gen2"? Is that just the behavior of the core, that it retries at the >>> higher rate on the 2nd training attempt? I'm just confused, since you >>> set PCIE_CLIENT_GEN_SEL_2 above, so you've already allowed either gen1 >>> or gen2 negotiation AFAICT, and so seemingly you might already have >>> negotiated at gen2. >> >> >> Not really. I allow the core to enable gen2, but it needs a extra >> setting of retrain after finishing gen1. It's not so strange as it >> depends on the vendor's design. So I have to say it fits the >> designer's expectation. > > OK. > >>>> + err = -ETIMEDOUT; >>>> + >>>> + while (time_before(jiffies, timeout)) { >>> >>> You never reset 'timeout' when starting this loop. So you only have a >>> cumulative 500 ms to do both the gen1 and gen2 loops. Is that >>> intentional? (I feel like this comment was made on an earlier revision >>> of this driver, though I haven't read everything thoroughly.) >> >> yes, I don't have any docs to let me know how long should I wait for >> gen1/2 to be finished. Maybe someday it will be augmented to a larger >> value if finding a device actually need a longer time. But the only >> thing I can say is that it's from my test for many pcie devices >> currently. >> >> >> Do you agree? > > I'm not suggesting increasing the timeout, exactly; I'm suggesting that > you should set some minimum timeout for each training loop, instead of > reusing the same deadline. i.e., something like this before the second > loop: > > /* Reset the deadline for gen2 */ > timeout = jiffies + msecs_to_jiffies(500); ok, I will update it. > > As it stands, if the first loop takes 490 ms, that leaves you only 10 ms > for the second loop. And I think it'd be confusing if we ever see the > first loop take a "long" time, and then we time out on the second -- > we'd be blaming the gen2 training for gen1's slowness. > >>>> + status = pcie_read(port, PCIE_CORE_CTRL_MGMT_BASE); >>>> + if (((status >> PCIE_CORE_PL_CONF_SPEED_SHIFT) & >>>> + PCIE_CORE_PL_CONF_SPEED_MASK) == >>>> + PCIE_CORE_PL_CONF_SPEED_50G) { >>>> + dev_dbg(port->dev, "pcie link training gen2 pass!\n"); >>>> + err = 0; >>>> + break; >>>> + } >>>> + msleep(20); >>>> + } >>> >>> Similar problem with your timeout loop, as mentioned above. Although I >>> confused about what you do in the error case here... >>> >>>> + if (err) >>>> + dev_dbg(port->dev, "pcie link training gen2 timeout, force to gen1!\n"); >>> >>> ... how are you forcing gen1? I don't see any code that indicates this. >>> Should you at least be checking that there aren't some kind of training >>> errors, and that we settled back on gen1/2.5G? >> >> yes, when failing gen2, my pcie controller will fallback to gen1 >> automatically. > > OK. Well maybe the text should say something like "falling back" instead > of "force"? > Sounds good, will fix. Thanks. >> if we pass the gen1 then fail to train gen2, a dbg msg here is enough >> here to let we know that we should check the HW signal. Of course we >> should make sure that this device supports gen2. > > OK. > > [...] > > Brian > > > -- Best Regards Shawn Lin