On Tue, Jun 20, 2023 at 12:02:08PM +0000, Yoshihiro Shimoda wrote: > Hello Serge, > > > From: Serge Semin, Sent: Thursday, June 15, 2023 4:32 AM > > > > On Wed, Jun 14, 2023 at 02:39:29PM +0300, Serge Semin wrote: > > > On Wed, Jun 14, 2023 at 02:30:13AM +0000, Yoshihiro Shimoda wrote: > > > > Hello Serge, > > > > > > > > From: Serge Semin, Sent: Tuesday, June 13, 2023 4:52 AM > > > > > > > > > > On Mon, Jun 12, 2023 at 01:19:02PM +0000, Yoshihiro Shimoda wrote: > > > > > > Hello Serge, > > > > > > > > > > > > > From: Serge Semin, Sent: Friday, June 9, 2023 7:54 PM > > > > > > <snip> > > > > > > > > > > static int rcar_gen4_pcie_start_link(struct dw_pcie *dw) > > > > > > > > > > { > > > > > > > > > > struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw); > > > > > > > > > > int i; > > > > > > > > > > > > > > > > > > > > rcar_gen4_pcie_ltssm_enable(rcar, true); > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > > > * Require direct speed change here. Otherwise RDLH_LINK_UP of > > > > > > > > > > * PCIEINTSTS0 which is this controller specific register may not > > > > > > > > > > * be set. > > > > > > > > > > */ > > > > > > > > > > > > > > > > > > > if (rcar->needs_speed_change) { > > > > > > > > > > > > > > > > > > Seeing this is specified for the root port only what about > > > > > > > > > replacing the statement with just test whether the rcar_gen4_pcie.mode == > > > > > > > > > DW_PCIE_RC_TYPE? Thus you'll be ablt to drop the needs_speed_change field. > > > > > > > > > > > > > > > > Thank you for the comment. I'll fix it. > > > > > > > > > > > > > > > > > BTW Just curious. Why is the loop below enabled for the Root Port > > > > > > > > > only? What about the end-point controller? It's the same hardware > > > > > > > > > after all.. > > > > > > > > > > > > > > > > This is reused from v16 and then it used "link retraining" which is only for > > > > > > > > the Root Port. As you mentioned, it seems endpoint controller is also needed > > > > > > > > if we use direct speed change. > > > > > > > > > > > > > > > > > > for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) { > > > > > > > > > > rcar_gen4_pcie_speed_change(dw); > > > > > > > > > > msleep(100); > > > > > > > > > > if (rcar_gen4_pcie_check_current_link(dw)) > > > > > > > > > > return 0; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > Did you trace how many iterations this loop normally takes? > > > > > > > > > > > > > > > > i = 0 or 1 (if the max-link-speed is suitable for a connected device.) > > > > > > > > > > > > > > > > > Is it > > > > > > > > > constant or varies for the same platform setup and a connected link > > > > > > > > > partner? Does the number of iterations depend on the target link speed > > > > > > > > > specified via the "max-link-speed" property? > > > > > > > > > > > > > > > > > > > > > > > This is not related to the "max-link-speed". It seems to related to > > > > > > > > a link partner. > > > > > > > > LinkCap max-link-speed loop > > > > > > > > Device A 4 4 1 > > > > > > > > Device A 4 3 1 > > > > > > > > Device B 3 3 0 > > > > > > > > > > > > > > Great! If so I would have just left a single unconditional > > > > > > > rcar_gen4_pcie_speed_change() call placed right after the > > > > > > > rcar_gen4_pcie_ltssm_enable() method with no delays afterwards. These > > > > > > > methods would have been invoked in the framework of > > > > > > > dw_pcie_start_link() after which the dw_pcie_wait_for_link() method is > > > > > > > called with several checks parted with the ~100ms delay. It will make > > > > > > > sure that at least some link is up with the link state printed to the > > > > > > > system log. If for some reason the performance degradation happens > > > > > > > then it will be up to the system administrator to investigate what was > > > > > > > wrong. Your driver did as much is it could to reach the best link gen. > > > > > > > > > > > > IIUC, is your suggestion like the following code? > > > > > > --- > > > > > > rcar_gen4_pcie_ltssm_enable(rcar, true); > > > > > > if (!dw_pcie_wait_for_link(dw)) { > > > > > > rcar_gen4_pcie_speed_change(dw); > > > > > > return 0; > > > > > > } > > > > > > --- > > > > > > > > > > > > Unfortunately, it doesn't work correctly... > > > > > > The following code can work correctly. The value of i is still 1 on the device A. > > > > > > What do you think that the following code is acceptable? > > > > > > --- > > > > > > rcar_gen4_pcie_ltssm_enable(rcar, true); > > > > > > for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) { > > > > > > msleep(100); > > > > > > rcar_gen4_pcie_speed_change(dw); > > > > > > if (dw_pcie_link_up(dw)) { > > > > > > printk("%s:%d\n", __func__, i); > > > > > > return 0; > > > > > > } > > > > > > } > > > > > > --- > > > > > > > > > > My idea was to implement something like this: > > > > > > > > > > +static int rcar_gen4_pcie_start_link(struct dw_pcie *dw) > > > > > +{ > > > > > + struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw); > > > > > + > > > > > + rcar_gen4_pcie_ltssm_enable(rcar, true); > > > > > + > > > > > + rcar_gen4_pcie_speed_change(dw); > > > > > + > > > > > + return 0; > > > > > +} > > > > > > > > > > and retain the rcar_gen4_pcie_link_up() method as is. > > > > > > > > Unfortunately, such a code doesn't work on my environment... > > > > > > > > > * Note: originally your loop used to have the msleep() call performed > > > > > after the first rcar_gen4_pcie_speed_change() invocation. Thus the > > > > > delay can be dropped if there is only one iteration implemented (see > > > > > further to understand why). > > > > > > > > Calling rcar_gen4_pcie_speed_change() multiple times is required on > > > > my environment. I thought msleep(100) was quite long so that I tried > > > > other wait interval like below: > > > > > > > > msleep(1) : about 5 loops is needed for link. (about 5 msec.) > > > > usleep_range(100, 110) : about 400 loops is needed for link. (about 40 msec.) > > > > usleep_range(500, 600) : about 80 loops is needed for link. (about 40 msec.) > > > > > > > > The delay timing doesn't seems important. Both cases below can work correctly. > > > > --- case 1 --- > > > > for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) { > > > > rcar_gen4_pcie_speed_change(dw); > > > > if (dw_pcie_link_up(dw)) { > > > > printk("%s:%d\n", __func__, i); // will be removed > > > > return 0; > > > > } > > > > > > > msleep(1); > > > > > > Why? Just set it to 5 ms. In anyway please see the next message. > > > > > > > } > > > > --- > > > > --- case 2 --- > > > > for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) { > > > > rcar_gen4_pcie_speed_change(dw); > > > > msleep(1); > > > > if (dw_pcie_link_up(dw)) { > > > > printk("%s:%d\n", __func__, i); // will be removed > > > > return 0; > > > > } > > > > } > > > > --- > > > > > > > > So, I'll use case 1 for it. > > > > > > Ah. I think I get it now. Your spreadsheet: > > > > > > LinkCap max-link-speed loop > > > Device A 4 4 1 > > > Device A 4 3 1 > > > Device B 3 3 0 > > > > > > actually meant (loop+1) iterations. So in case of Gen4 you'll need > > > three speed changes (one already enabled in the dw_pcie_setup_rc() > > > method and another two ones are performed in your loop). Similarly in > > > case of Gen3 you'll need only one iteration. I bet you won't need to > > > call rcar_gen4_pcie_speed_change() at all if gen2 needs to be trained. > > > Could you try it out? > > > > > > Anyway based on what you discovered and on my experience working with > > > that controller, there should be as many > > > GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE flag changes as the target speed > > > value, i.e. no flag switch if Gen1 is required, one flag switch if > > > Gen2 is required and so on. Although I failed to find any explicit > > > statement about that in the HW-manual. > > > > > > In addition to the above I've found out that > > > GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE field is actually self cleared when > > > the speed change occurs (see the register description in the HW > > > reference manual). We can use it to implement the > > > dw_pcie_link_up()-independent link training algorithm like this: > > > > > > +#define RCAR_RETRAIN_MAX_CHECK 10 > > > +#define RCAR_LINK_SPEED_MAX 4 > > > + > > > +static bool rcar_gen4_pcie_speed_change(struct dw_pcie *dw) > > > +{ > > > + u32 val; > > > + int i; > > > + > > > + val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL); > > > + val &= ~PORT_LOGIC_SPEED_CHANGE; > > > + dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val); > > > + > > > + val |= PORT_LOGIC_SPEED_CHANGE; > > > + dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val); > > > + > > > + for (i = 0; i < RCAR_SPEED_CHANGE_WAIT_RETRIES; i++) { > > > + val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL); > > > + if (!(val & PORT_LOGIC_SPEED_CHANGE)) > > > + return true; > > > + > > > + msleep(1); > > > + } > > > + > > > + return false; > > > +} > > > + > > > +static int rcar_gen4_pcie_start_link(struct dw_pcie *dw) > > > +{ > > > + struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw); > > > + int i, changes; > > > + > > > + rcar_gen4_pcie_ltssm_enable(rcar, true); > > > + > > > > > + changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX); > > > > This should have been: > > +changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX) - 1; > > because Gen1 doesn't need any speed change action. > > > > But this part can be further improved. Instead of the code above the > > next snipped can be implemented: > > > > +changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX) - 1; > > +if (changes && rcar->mode == DW_PCIE_RC_TYPE) > > + changes -= 1; > > > > It takes into account that the GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE > > flag is already set in the dw_pcie_setup_rc() method. So Gen2 will be > > trained with no need in addition actions. If it's supported of course. > > Thank you very much for your comments and suggestion! The suggestion code > works correctly on my environment. Awesome! Glad we've finally settled this. -Serge(y) > > Best regards, > Yoshihiro Shimoda > > P.S. > So, I'm investigating endpoint mode issues which you commented now. > > > -Serge(y) > > > > > + for (i = 0; i < changes; ++i) { > > > + if (!rcar_gen4_pcie_speed_change(dw)) > > > + break; > > > + } > > > + > > > + return 0; > > > +} > > > > > > Note 1. The actual link state will be checked in the framework of the > > > dw_pcie_wait_for_link() function, by means of dw_pcie_link_up(). > > > > > > Note 2. RCAR_LINK_SPEED_MAX is deliberately set to 4 because DW PCIe > > > EP core driver doesn't set the PORT_LOGIC_SPEED_CHANGE flag. In case > > > of the DW PCIe Root Port at most 3 iterations should be enough. > > > > > > Note 3. Please use the RCAR_ prefix for the vendor-specific macros. > > > It concerns the entire series. > > > > > > Could you try out the code suggested above? > > > > > > -Serge(y) > > > > > > > > > > > > You don't need to wait for the link to actually get up in the > > > > > start_link() callback because there is the link_up() callback, which > > > > > is called from the dw_pcie_wait_for_link() method during the generic > > > > > DWC PCIe setup procedure. See: > > > > > > > > Since the procedure will call rcar_gen4_pcie_speed_change() from > > > > ->start_link() once, my environment cannot work correctly... > > > > > > > > Best regards, > > > > Yoshihiro Shimoda > > > > > > > > > dw_pcie_host_init(): > > > > > +-> ops->host_init() > > > > > +-> ... > > > > > +-> dw_pcie_setup_rc() > > > > > | +-> ... > > > > > | +-> dw_pcie_setup() > > > > > | +-> ... > > > > > +-> if !dw_pcie_link_up() > > > > > | | +-> ops->link_up() > > > > > | +-> dw_pcie_start_link() > > > > > | +-> ops->start_link() > > > > > +-> dw_pcie_wait_for_link(); // See, wait-procedure is already performed > > > > > | +-> loop 10 times // for you in the core driver together > > > > > | +-> dw_pcie_link_up() // with the delays between the checks > > > > > | +-> ops->link_up() > > > > > | +-> msleep(~100) > > > > > +-> ... > > > > > > > > > > -Serge(y) > > > > > > > > > > > > > > > > > Best regards, > > > > > > Yoshihiro Shimoda > > > > > > > > > > > > > -Serge(y) > > > > > >