Re: [PATCH v16 19/22] PCI: rcar-gen4: Add R-Car Gen4 PCIe Host support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 20, 2023 at 12:02:08PM +0000, Yoshihiro Shimoda wrote:
> Hello Serge,
> 
> > From: Serge Semin, Sent: Thursday, June 15, 2023 4:32 AM
> > 
> > On Wed, Jun 14, 2023 at 02:39:29PM +0300, Serge Semin wrote:
> > > On Wed, Jun 14, 2023 at 02:30:13AM +0000, Yoshihiro Shimoda wrote:
> > > > Hello Serge,
> > > >
> > > >  From: Serge Semin, Sent: Tuesday, June 13, 2023 4:52 AM
> > > > >
> > > > > On Mon, Jun 12, 2023 at 01:19:02PM +0000, Yoshihiro Shimoda wrote:
> > > > > > Hello Serge,
> > > > > >
> > > > > > > From: Serge Semin, Sent: Friday, June 9, 2023 7:54 PM
> > > > > > <snip>
> > > > > > > > > > static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> > > > > > > > > > {
> > > > > > > > > >         struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> > > > > > > > > >         int i;
> > > > > > > > > >
> > > > > > > > > >         rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > > > > > > >
> > > > > > > > > >         /*
> > > > > > > > > >          * Require direct speed change here. Otherwise RDLH_LINK_UP of
> > > > > > > > > >          * PCIEINTSTS0 which is this controller specific register may not
> > > > > > > > > >          * be set.
> > > > > > > > > >          */
> > > > > > > > >
> > > > > > > > > >         if (rcar->needs_speed_change) {
> > > > > > > > >
> > > > > > > > > Seeing this is specified for the root port only what about
> > > > > > > > > replacing the statement with just test whether the rcar_gen4_pcie.mode ==
> > > > > > > > > DW_PCIE_RC_TYPE? Thus you'll be ablt to drop the needs_speed_change field.
> > > > > > > >
> > > > > > > > Thank you for the comment. I'll fix it.
> > > > > > > >
> > > > > > > > > BTW Just curious. Why is the loop below enabled for the Root Port
> > > > > > > > > only? What about the end-point controller? It's the same hardware
> > > > > > > > > after all..
> > > > > > > >
> > > > > > > > This is reused from v16 and then it used "link retraining" which is only for
> > > > > > > > the Root Port. As you mentioned, it seems endpoint controller is also needed
> > > > > > > > if we use direct speed change.
> > > > > > > >
> > > > > > > > > >                 for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > > > > > > > >                         rcar_gen4_pcie_speed_change(dw);
> > > > > > > > > >                         msleep(100);
> > > > > > > > > >                         if (rcar_gen4_pcie_check_current_link(dw))
> > > > > > > > > >                                 return 0;
> > > > > > > > > >                 }
> > > > > > > > >
> > > > > > > > > Did you trace how many iterations this loop normally takes?
> > > > > > > >
> > > > > > > > i = 0 or 1 (if the max-link-speed is suitable for a connected device.)
> > > > > > > >
> > > > > > > > > Is it
> > > > > > > > > constant or varies for the same platform setup and a connected link
> > > > > > > > > partner? Does the number of iterations depend on the target link speed
> > > > > > > > > specified via the "max-link-speed" property?
> > > > > > > >
> > > > > > >
> > > > > > > > This is not related to the "max-link-speed". It seems to related to
> > > > > > > > a link partner.
> > > > > > > > 		LinkCap	max-link-speed	loop
> > > > > > > > Device A		4	4		1
> > > > > > > > Device A		4	3		1
> > > > > > > > Device B		3	3		0
> > > > > > >
> > > > > > > Great! If so I would have just left a single unconditional
> > > > > > > rcar_gen4_pcie_speed_change() call placed right after the
> > > > > > > rcar_gen4_pcie_ltssm_enable() method with no delays afterwards. These
> > > > > > > methods would have been invoked in the framework of
> > > > > > > dw_pcie_start_link() after which the dw_pcie_wait_for_link() method is
> > > > > > > called with several checks parted with the ~100ms delay. It will make
> > > > > > > sure that at least some link is up with the link state printed to the
> > > > > > > system log. If for some reason the performance degradation happens
> > > > > > > then it will be up to the system administrator to investigate what was
> > > > > > > wrong. Your driver did as much is it could to reach the best link gen.
> > > > > >
> > > > > > IIUC, is your suggestion like the following code?
> > > > > > ---
> > > > > > 	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > > > 	if (!dw_pcie_wait_for_link(dw)) {
> > > > > > 		rcar_gen4_pcie_speed_change(dw);
> > > > > > 		return 0;
> > > > > > 	}
> > > > > > ---
> > > > > >
> > > > > > Unfortunately, it doesn't work correctly...
> > > > > > The following code can work correctly. The value of i is still 1 on the device A.
> > > > > > What do you think that the following code is acceptable?
> > > > > > ---
> > > > > > 	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > > > 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > > > > 		msleep(100);
> > > > > > 		rcar_gen4_pcie_speed_change(dw);
> > > > > > 		if (dw_pcie_link_up(dw)) {
> > > > > > 			printk("%s:%d\n", __func__, i);
> > > > > > 			return 0;
> > > > > > 		}
> > > > > > 	}
> > > > > > ---
> > > > >
> > > > > My idea was to implement something like this:
> > > > >
> > > > > +static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> > > > > +{
> > > > > +	struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> > > > > +
> > > > > +	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > > +
> > > > > +	rcar_gen4_pcie_speed_change(dw);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > >
> > > > > and retain the rcar_gen4_pcie_link_up() method as is.
> > > >
> > > > Unfortunately, such a code doesn't work on my environment...
> > > >
> > > > > * Note: originally your loop used to have the msleep() call performed
> > > > > after the first rcar_gen4_pcie_speed_change() invocation. Thus the
> > > > > delay can be dropped if there is only one iteration implemented (see
> > > > > further to understand why).
> > > >
> > > > Calling rcar_gen4_pcie_speed_change() multiple times is required on
> > > > my environment. I thought msleep(100) was quite long so that I tried
> > > > other wait interval like below:
> > > >
> > > >  msleep(1) : about 5 loops is needed for link. (about 5 msec.)
> > > >  usleep_range(100, 110) : about 400 loops is needed for link. (about 40 msec.)
> > > >  usleep_range(500, 600) : about 80 loops is needed for link. (about 40 msec.)
> > > >
> > > > The delay timing doesn't seems important. Both cases below can work correctly.
> > > > --- case 1 ---
> > > > 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > > 		rcar_gen4_pcie_speed_change(dw);
> > > > 		if (dw_pcie_link_up(dw)) {
> > > > 			printk("%s:%d\n", __func__, i); // will be removed
> > > > 			return 0;
> > > > 		}
> > >
> > > > 		msleep(1);
> > >
> > > Why? Just set it to 5 ms. In anyway please see the next message.
> > >
> > > > 	}
> > > > ---
> > > > --- case 2 ---
> > > > 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > > 		rcar_gen4_pcie_speed_change(dw);
> > > > 		msleep(1);
> > > > 		if (dw_pcie_link_up(dw)) {
> > > > 			printk("%s:%d\n", __func__, i); // will be removed
> > > > 			return 0;
> > > > 		}
> > > > 	}
> > > > ---
> > > >
> > > > So, I'll use case 1 for it.
> > >
> > > Ah. I think I get it now. Your spreadsheet:
> > >
> > >                 LinkCap max-link-speed  loop
> > > Device A           4          4           1
> > > Device A           4          3           1
> > > Device B           3          3           0
> > >
> > > actually meant (loop+1) iterations. So in case of Gen4 you'll need
> > > three speed changes (one already enabled in the dw_pcie_setup_rc()
> > > method and another two ones are performed in your loop). Similarly in
> > > case of Gen3 you'll need only one iteration. I bet you won't need to
> > > call rcar_gen4_pcie_speed_change() at all if gen2 needs to be trained.
> > > Could you try it out?
> > >
> > > Anyway based on what you discovered and on my experience working with
> > > that controller, there should be as many
> > > GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE flag changes as the target speed
> > > value, i.e. no flag switch if Gen1 is required, one flag switch if
> > > Gen2 is required and so on. Although I failed to find any explicit
> > > statement about that in the HW-manual.
> > >
> > > In addition to the above I've found out that
> > > GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE field is actually self cleared when
> > > the speed change occurs (see the register description in the HW
> > > reference manual). We can use it to implement the
> > > dw_pcie_link_up()-independent link training algorithm like this:
> > >
> > > +#define RCAR_RETRAIN_MAX_CHECK		10
> > > +#define RCAR_LINK_SPEED_MAX		4
> > > +
> > > +static bool rcar_gen4_pcie_speed_change(struct dw_pcie *dw)
> > > +{
> > > +	u32 val;
> > > +	int i;
> > > +
> > > +	val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL);
> > > +	val &= ~PORT_LOGIC_SPEED_CHANGE;
> > > +	dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
> > > +
> > > +	val |= PORT_LOGIC_SPEED_CHANGE;
> > > +	dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
> > > +
> > > +	for (i = 0; i < RCAR_SPEED_CHANGE_WAIT_RETRIES; i++) {
> > > +		val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL);
> > > +		if (!(val & PORT_LOGIC_SPEED_CHANGE))
> > > +			return true;
> > > +
> > > +		msleep(1);
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> > > +{
> > > +	struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> > > +	int i, changes;
> > > +
> > > +	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > +
> > 
> > > +	changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX);
> > 
> > This should have been:
> > +changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX) - 1;
> > because Gen1 doesn't need any speed change action.
> > 
> > But this part can be further improved. Instead of the code above the
> > next snipped can be implemented:
> > 
> > +changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX) - 1;
> > +if (changes && rcar->mode == DW_PCIE_RC_TYPE)
> > +		changes -= 1;
> > 
> > It takes into account that the GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE
> > flag is already set in the dw_pcie_setup_rc() method. So Gen2 will be
> > trained with no need in addition actions. If it's supported of course.
> 

> Thank you very much for your comments and suggestion! The suggestion code
> works correctly on my environment.

Awesome! Glad we've finally settled this.

-Serge(y)

> 
> Best regards,
> Yoshihiro Shimoda
> 
> P.S.
> So, I'm investigating endpoint mode issues which you commented now.
> 
> > -Serge(y)
> > 
> > > +	for (i = 0; i < changes; ++i) {
> > > +		if (!rcar_gen4_pcie_speed_change(dw))
> > > +			break;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > >
> > > Note 1. The actual link state will be checked in the framework of the
> > > dw_pcie_wait_for_link() function, by means of dw_pcie_link_up().
> > >
> > > Note 2. RCAR_LINK_SPEED_MAX is deliberately set to 4 because DW PCIe
> > > EP core driver doesn't set the PORT_LOGIC_SPEED_CHANGE flag. In case
> > > of the DW PCIe Root Port at most 3 iterations should be enough.
> > >
> > > Note 3. Please use the RCAR_ prefix for the vendor-specific macros.
> > > It concerns the entire series.
> > >
> > > Could you try out the code suggested above?
> > >
> > > -Serge(y)
> > >
> > > >
> > > > > You don't need to wait for the link to actually get up in the
> > > > > start_link() callback because there is the link_up() callback, which
> > > > > is called from the dw_pcie_wait_for_link() method during the generic
> > > > > DWC PCIe setup procedure. See:
> > > >
> > > > Since the procedure will call rcar_gen4_pcie_speed_change() from
> > > > ->start_link() once, my environment cannot work correctly...
> > > >
> > > > Best regards,
> > > > Yoshihiro Shimoda
> > > >
> > > > > dw_pcie_host_init():
> > > > > +-> ops->host_init()
> > > > > +-> ...
> > > > > +-> dw_pcie_setup_rc()
> > > > > |   +-> ...
> > > > > |   +-> dw_pcie_setup()
> > > > > |   +-> ...
> > > > > +-> if !dw_pcie_link_up()
> > > > > |   |   +-> ops->link_up()
> > > > > |   +-> dw_pcie_start_link()
> > > > > |       +-> ops->start_link()
> > > > > +-> dw_pcie_wait_for_link();   // See, wait-procedure is already performed
> > > > > |   +-> loop 10 times          // for you in the core driver together
> > > > > |       +-> dw_pcie_link_up()  // with the delays between the checks
> > > > > |           +-> ops->link_up()
> > > > > |       +-> msleep(~100)
> > > > > +-> ...
> > > > >
> > > > > -Serge(y)
> > > > >
> > > > > >
> > > > > > Best regards,
> > > > > > Yoshihiro Shimoda
> > > > > >
> > > > > > > -Serge(y)
> > > > > >



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux