Re: [PATCH v16 19/22] PCI: rcar-gen4: Add R-Car Gen4 PCIe Host support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 14, 2023 at 02:39:29PM +0300, Serge Semin wrote:
> On Wed, Jun 14, 2023 at 02:30:13AM +0000, Yoshihiro Shimoda wrote:
> > Hello Serge,
> > 
> >  From: Serge Semin, Sent: Tuesday, June 13, 2023 4:52 AM
> > > 
> > > On Mon, Jun 12, 2023 at 01:19:02PM +0000, Yoshihiro Shimoda wrote:
> > > > Hello Serge,
> > > >
> > > > > From: Serge Semin, Sent: Friday, June 9, 2023 7:54 PM
> > > > <snip>
> > > > > > > > static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> > > > > > > > {
> > > > > > > >         struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> > > > > > > >         int i;
> > > > > > > >
> > > > > > > >         rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > > > > >
> > > > > > > >         /*
> > > > > > > >          * Require direct speed change here. Otherwise RDLH_LINK_UP of
> > > > > > > >          * PCIEINTSTS0 which is this controller specific register may not
> > > > > > > >          * be set.
> > > > > > > >          */
> > > > > > >
> > > > > > > >         if (rcar->needs_speed_change) {
> > > > > > >
> > > > > > > Seeing this is specified for the root port only what about
> > > > > > > replacing the statement with just test whether the rcar_gen4_pcie.mode ==
> > > > > > > DW_PCIE_RC_TYPE? Thus you'll be ablt to drop the needs_speed_change field.
> > > > > >
> > > > > > Thank you for the comment. I'll fix it.
> > > > > >
> > > > > > > BTW Just curious. Why is the loop below enabled for the Root Port
> > > > > > > only? What about the end-point controller? It's the same hardware
> > > > > > > after all..
> > > > > >
> > > > > > This is reused from v16 and then it used "link retraining" which is only for
> > > > > > the Root Port. As you mentioned, it seems endpoint controller is also needed
> > > > > > if we use direct speed change.
> > > > > >
> > > > > > > >                 for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > > > > > >                         rcar_gen4_pcie_speed_change(dw);
> > > > > > > >                         msleep(100);
> > > > > > > >                         if (rcar_gen4_pcie_check_current_link(dw))
> > > > > > > >                                 return 0;
> > > > > > > >                 }
> > > > > > >
> > > > > > > Did you trace how many iterations this loop normally takes?
> > > > > >
> > > > > > i = 0 or 1 (if the max-link-speed is suitable for a connected device.)
> > > > > >
> > > > > > > Is it
> > > > > > > constant or varies for the same platform setup and a connected link
> > > > > > > partner? Does the number of iterations depend on the target link speed
> > > > > > > specified via the "max-link-speed" property?
> > > > > >
> > > > >
> > > > > > This is not related to the "max-link-speed". It seems to related to
> > > > > > a link partner.
> > > > > > 		LinkCap	max-link-speed	loop
> > > > > > Device A		4	4		1
> > > > > > Device A		4	3		1
> > > > > > Device B		3	3		0
> > > > >
> > > > > Great! If so I would have just left a single unconditional
> > > > > rcar_gen4_pcie_speed_change() call placed right after the
> > > > > rcar_gen4_pcie_ltssm_enable() method with no delays afterwards. These
> > > > > methods would have been invoked in the framework of
> > > > > dw_pcie_start_link() after which the dw_pcie_wait_for_link() method is
> > > > > called with several checks parted with the ~100ms delay. It will make
> > > > > sure that at least some link is up with the link state printed to the
> > > > > system log. If for some reason the performance degradation happens
> > > > > then it will be up to the system administrator to investigate what was
> > > > > wrong. Your driver did as much is it could to reach the best link gen.
> > > >
> > > > IIUC, is your suggestion like the following code?
> > > > ---
> > > > 	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > 	if (!dw_pcie_wait_for_link(dw)) {
> > > > 		rcar_gen4_pcie_speed_change(dw);
> > > > 		return 0;
> > > > 	}
> > > > ---
> > > >
> > > > Unfortunately, it doesn't work correctly...
> > > > The following code can work correctly. The value of i is still 1 on the device A.
> > > > What do you think that the following code is acceptable?
> > > > ---
> > > > 	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > > 		msleep(100);
> > > > 		rcar_gen4_pcie_speed_change(dw);
> > > > 		if (dw_pcie_link_up(dw)) {
> > > > 			printk("%s:%d\n", __func__, i);
> > > > 			return 0;
> > > > 		}
> > > > 	}
> > > > ---
> > > 
> > > My idea was to implement something like this:
> > > 
> > > +static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> > > +{
> > > +	struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> > > +
> > > +	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > +
> > > +	rcar_gen4_pcie_speed_change(dw);
> > > +
> > > +	return 0;
> > > +}
> > > 
> > > and retain the rcar_gen4_pcie_link_up() method as is.
> > 
> > Unfortunately, such a code doesn't work on my environment...
> > 
> > > * Note: originally your loop used to have the msleep() call performed
> > > after the first rcar_gen4_pcie_speed_change() invocation. Thus the
> > > delay can be dropped if there is only one iteration implemented (see
> > > further to understand why).
> > 
> > Calling rcar_gen4_pcie_speed_change() multiple times is required on
> > my environment. I thought msleep(100) was quite long so that I tried
> > other wait interval like below:
> > 
> >  msleep(1) : about 5 loops is needed for link. (about 5 msec.)
> >  usleep_range(100, 110) : about 400 loops is needed for link. (about 40 msec.)
> >  usleep_range(500, 600) : about 80 loops is needed for link. (about 40 msec.)
> > 
> > The delay timing doesn't seems important. Both cases below can work correctly.
> > --- case 1 ---
> > 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > 		rcar_gen4_pcie_speed_change(dw);
> > 		if (dw_pcie_link_up(dw)) {
> > 			printk("%s:%d\n", __func__, i); // will be removed
> > 			return 0;
> > 		}
> 
> > 		msleep(1);
> 
> Why? Just set it to 5 ms. In anyway please see the next message.
> 
> > 	}
> > ---
> > --- case 2 ---
> > 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > 		rcar_gen4_pcie_speed_change(dw);
> > 		msleep(1);
> > 		if (dw_pcie_link_up(dw)) {
> > 			printk("%s:%d\n", __func__, i); // will be removed
> > 			return 0;
> > 		}
> > 	}
> > ---
> > 
> > So, I'll use case 1 for it.
> 
> Ah. I think I get it now. Your spreadsheet:
> 
>                 LinkCap max-link-speed  loop
> Device A           4          4           1
> Device A           4          3           1
> Device B           3          3           0
> 
> actually meant (loop+1) iterations. So in case of Gen4 you'll need
> three speed changes (one already enabled in the dw_pcie_setup_rc()
> method and another two ones are performed in your loop). Similarly in
> case of Gen3 you'll need only one iteration. I bet you won't need to
> call rcar_gen4_pcie_speed_change() at all if gen2 needs to be trained.
> Could you try it out?
> 
> Anyway based on what you discovered and on my experience working with
> that controller, there should be as many
> GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE flag changes as the target speed
> value, i.e. no flag switch if Gen1 is required, one flag switch if
> Gen2 is required and so on. Although I failed to find any explicit
> statement about that in the HW-manual.
> 
> In addition to the above I've found out that
> GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE field is actually self cleared when
> the speed change occurs (see the register description in the HW
> reference manual). We can use it to implement the
> dw_pcie_link_up()-independent link training algorithm like this:
> 
> +#define RCAR_RETRAIN_MAX_CHECK		10
> +#define RCAR_LINK_SPEED_MAX		4
> +
> +static bool rcar_gen4_pcie_speed_change(struct dw_pcie *dw)
> +{
> +	u32 val;
> +	int i;
> +
> +	val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL);
> +	val &= ~PORT_LOGIC_SPEED_CHANGE;
> +	dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
> +
> +	val |= PORT_LOGIC_SPEED_CHANGE;
> +	dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
> +
> +	for (i = 0; i < RCAR_SPEED_CHANGE_WAIT_RETRIES; i++) {
> +		val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL);
> +		if (!(val & PORT_LOGIC_SPEED_CHANGE))
> +			return true;
> +
> +		msleep(1);
> +	}
> +
> +	return false;
> +}
> +
> +static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> +{
> +	struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> +	int i, changes;
> +
> +	rcar_gen4_pcie_ltssm_enable(rcar, true);
> +

> +	changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX);

This should have been:
+changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX) - 1;
because Gen1 doesn't need any speed change action.

But this part can be further improved. Instead of the code above the
next snipped can be implemented:

+changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX) - 1;
+if (changes && rcar->mode == DW_PCIE_RC_TYPE)
+		changes -= 1;

It takes into account that the GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE
flag is already set in the dw_pcie_setup_rc() method. So Gen2 will be
trained with no need in addition actions. If it's supported of course.

-Serge(y)

> +	for (i = 0; i < changes; ++i) {
> +		if (!rcar_gen4_pcie_speed_change(dw))
> +			break;
> +	}
> +
> +	return 0;
> +}
> 
> Note 1. The actual link state will be checked in the framework of the
> dw_pcie_wait_for_link() function, by means of dw_pcie_link_up().
> 
> Note 2. RCAR_LINK_SPEED_MAX is deliberately set to 4 because DW PCIe
> EP core driver doesn't set the PORT_LOGIC_SPEED_CHANGE flag. In case
> of the DW PCIe Root Port at most 3 iterations should be enough.
> 
> Note 3. Please use the RCAR_ prefix for the vendor-specific macros.
> It concerns the entire series.
> 
> Could you try out the code suggested above?
> 
> -Serge(y)
> 
> > 
> > > You don't need to wait for the link to actually get up in the
> > > start_link() callback because there is the link_up() callback, which
> > > is called from the dw_pcie_wait_for_link() method during the generic
> > > DWC PCIe setup procedure. See:
> > 
> > Since the procedure will call rcar_gen4_pcie_speed_change() from
> > ->start_link() once, my environment cannot work correctly...
> > 
> > Best regards,
> > Yoshihiro Shimoda
> > 
> > > dw_pcie_host_init():
> > > +-> ops->host_init()
> > > +-> ...
> > > +-> dw_pcie_setup_rc()
> > > |   +-> ...
> > > |   +-> dw_pcie_setup()
> > > |   +-> ...
> > > +-> if !dw_pcie_link_up()
> > > |   |   +-> ops->link_up()
> > > |   +-> dw_pcie_start_link()
> > > |       +-> ops->start_link()
> > > +-> dw_pcie_wait_for_link();   // See, wait-procedure is already performed
> > > |   +-> loop 10 times          // for you in the core driver together
> > > |       +-> dw_pcie_link_up()  // with the delays between the checks
> > > |           +-> ops->link_up()
> > > |       +-> msleep(~100)
> > > +-> ...
> > > 
> > > -Serge(y)
> > > 
> > > >
> > > > Best regards,
> > > > Yoshihiro Shimoda
> > > >
> > > > > -Serge(y)
> > > >



[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux