Hi, On Thu, Jun 23, 2016 at 09:09:46AM +0800, Shawn Lin wrote: > ? 2016/6/23 8:29, Brian Norris ??: > >On Thu, Jun 16, 2016 at 09:50:35AM +0800, Shawn Lin wrote: [...] > >>+ /* 500ms timeout value should be enough for gen1/2 taining */ > >>+ timeout = jiffies + msecs_to_jiffies(500); > >>+ > >>+ err = -ETIMEDOUT; > >>+ while (time_before(jiffies, timeout)) { > >>+ status = pcie_read(port, PCIE_CLIENT_BASIC_STATUS1); > >>+ if (((status >> PCIE_CLIENT_LINK_STATUS_SHIFT) & > >>+ PCIE_CLIENT_LINK_STATUS_MASK) == > >>+ PCIE_CLIENT_LINK_STATUS_UP) { > >>+ dev_dbg(port->dev, "pcie link training gen1 pass!\n"); > >>+ err = 0; > >>+ break; > >>+ } > >>+ msleep(20); > >>+ } > > > >Technically, the above timeout loop is not quite correct. Possible error > >case: we can fail with a timeout after 500 ms if the training completes > >between the 480-500 ms time window. This can happen because you're > >doing: > > > >(1) read register: if complete, then terminate successfully > >(2) delay > >(3) check for timeout: if time is up, return error > > > >You actually need: > > > >(1) read register: if complete, then terminate successfully > >(2) delay > >(3) check for timeout: if time is up, repeat (1), and then report error > > > >You can examine the logic for readx_poll_timeout() in > >include/linux/iopoll.h to see an example of a proper timeout loop. You > >could even try to use one of the helpers there, except that your > >pcie_read() takes 2 args. > > I see, thanks. > > > > >>+ if (err) { > >>+ dev_err(port->dev, "pcie link training gen1 timeout!\n"); > >>+ return err; > >>+ } > >>+ > >>+ /* > >>+ * Enable retrain for gen2. This should be configured only after > >>+ * gen1 finished. > >>+ */ > >>+ status = pcie_read(port, > >>+ PCIR_RC_CONFIG_LCS + PCIE_RC_CONFIG_BASE); > >>+ status |= PCIE_CORE_LCSR_RETAIN_LINK; > >>+ pcie_write(port, status, > >>+ PCIR_RC_CONFIG_LCS + PCIE_RC_CONFIG_BASE); > > > >I'm not really an expert on this, but how are you actually "retraining > >for gen2"? Is that just the behavior of the core, that it retries at the > >higher rate on the 2nd training attempt? I'm just confused, since you > >set PCIE_CLIENT_GEN_SEL_2 above, so you've already allowed either gen1 > >or gen2 negotiation AFAICT, and so seemingly you might already have > >negotiated at gen2. > > > Not really. I allow the core to enable gen2, but it needs a extra > setting of retrain after finishing gen1. It's not so strange as it > depends on the vendor's design. So I have to say it fits the > designer's expectation. OK. > >>+ err = -ETIMEDOUT; > >>+ > >>+ while (time_before(jiffies, timeout)) { > > > >You never reset 'timeout' when starting this loop. So you only have a > >cumulative 500 ms to do both the gen1 and gen2 loops. Is that > >intentional? (I feel like this comment was made on an earlier revision > >of this driver, though I haven't read everything thoroughly.) > > yes, I don't have any docs to let me know how long should I wait for > gen1/2 to be finished. Maybe someday it will be augmented to a larger > value if finding a device actually need a longer time. But the only > thing I can say is that it's from my test for many pcie devices > currently. > > > Do you agree? I'm not suggesting increasing the timeout, exactly; I'm suggesting that you should set some minimum timeout for each training loop, instead of reusing the same deadline. i.e., something like this before the second loop: /* Reset the deadline for gen2 */ timeout = jiffies + msecs_to_jiffies(500); As it stands, if the first loop takes 490 ms, that leaves you only 10 ms for the second loop. And I think it'd be confusing if we ever see the first loop take a "long" time, and then we time out on the second -- we'd be blaming the gen2 training for gen1's slowness. > >>+ status = pcie_read(port, PCIE_CORE_CTRL_MGMT_BASE); > >>+ if (((status >> PCIE_CORE_PL_CONF_SPEED_SHIFT) & > >>+ PCIE_CORE_PL_CONF_SPEED_MASK) == > >>+ PCIE_CORE_PL_CONF_SPEED_50G) { > >>+ dev_dbg(port->dev, "pcie link training gen2 pass!\n"); > >>+ err = 0; > >>+ break; > >>+ } > >>+ msleep(20); > >>+ } > > > >Similar problem with your timeout loop, as mentioned above. Although I > >confused about what you do in the error case here... > > > >>+ if (err) > >>+ dev_dbg(port->dev, "pcie link training gen2 timeout, force to gen1!\n"); > > > >... how are you forcing gen1? I don't see any code that indicates this. > >Should you at least be checking that there aren't some kind of training > >errors, and that we settled back on gen1/2.5G? > > yes, when failing gen2, my pcie controller will fallback to gen1 > automatically. OK. Well maybe the text should say something like "falling back" instead of "force"? > if we pass the gen1 then fail to train gen2, a dbg msg here is enough > here to let we know that we should check the HW signal. Of course we > should make sure that this device supports gen2. OK. [...] Brian