Re: [PATCH v16 19/22] PCI: rcar-gen4: Add R-Car Gen4 PCIe Host support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 14, 2023 at 02:30:13AM +0000, Yoshihiro Shimoda wrote:
> Hello Serge,
> 
>  From: Serge Semin, Sent: Tuesday, June 13, 2023 4:52 AM
> > 
> > On Mon, Jun 12, 2023 at 01:19:02PM +0000, Yoshihiro Shimoda wrote:
> > > Hello Serge,
> > >
> > > > From: Serge Semin, Sent: Friday, June 9, 2023 7:54 PM
> > > <snip>
> > > > > > > static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> > > > > > > {
> > > > > > >         struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> > > > > > >         int i;
> > > > > > >
> > > > > > >         rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > > > > >
> > > > > > >         /*
> > > > > > >          * Require direct speed change here. Otherwise RDLH_LINK_UP of
> > > > > > >          * PCIEINTSTS0 which is this controller specific register may not
> > > > > > >          * be set.
> > > > > > >          */
> > > > > >
> > > > > > >         if (rcar->needs_speed_change) {
> > > > > >
> > > > > > Seeing this is specified for the root port only what about
> > > > > > replacing the statement with just test whether the rcar_gen4_pcie.mode ==
> > > > > > DW_PCIE_RC_TYPE? Thus you'll be ablt to drop the needs_speed_change field.
> > > > >
> > > > > Thank you for the comment. I'll fix it.
> > > > >
> > > > > > BTW Just curious. Why is the loop below enabled for the Root Port
> > > > > > only? What about the end-point controller? It's the same hardware
> > > > > > after all..
> > > > >
> > > > > This is reused from v16 and then it used "link retraining" which is only for
> > > > > the Root Port. As you mentioned, it seems endpoint controller is also needed
> > > > > if we use direct speed change.
> > > > >
> > > > > > >                 for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > > > > >                         rcar_gen4_pcie_speed_change(dw);
> > > > > > >                         msleep(100);
> > > > > > >                         if (rcar_gen4_pcie_check_current_link(dw))
> > > > > > >                                 return 0;
> > > > > > >                 }
> > > > > >
> > > > > > Did you trace how many iterations this loop normally takes?
> > > > >
> > > > > i = 0 or 1 (if the max-link-speed is suitable for a connected device.)
> > > > >
> > > > > > Is it
> > > > > > constant or varies for the same platform setup and a connected link
> > > > > > partner? Does the number of iterations depend on the target link speed
> > > > > > specified via the "max-link-speed" property?
> > > > >
> > > >
> > > > > This is not related to the "max-link-speed". It seems to related to
> > > > > a link partner.
> > > > > 		LinkCap	max-link-speed	loop
> > > > > Device A		4	4		1
> > > > > Device A		4	3		1
> > > > > Device B		3	3		0
> > > >
> > > > Great! If so I would have just left a single unconditional
> > > > rcar_gen4_pcie_speed_change() call placed right after the
> > > > rcar_gen4_pcie_ltssm_enable() method with no delays afterwards. These
> > > > methods would have been invoked in the framework of
> > > > dw_pcie_start_link() after which the dw_pcie_wait_for_link() method is
> > > > called with several checks parted with the ~100ms delay. It will make
> > > > sure that at least some link is up with the link state printed to the
> > > > system log. If for some reason the performance degradation happens
> > > > then it will be up to the system administrator to investigate what was
> > > > wrong. Your driver did as much is it could to reach the best link gen.
> > >
> > > IIUC, is your suggestion like the following code?
> > > ---
> > > 	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > 	if (!dw_pcie_wait_for_link(dw)) {
> > > 		rcar_gen4_pcie_speed_change(dw);
> > > 		return 0;
> > > 	}
> > > ---
> > >
> > > Unfortunately, it doesn't work correctly...
> > > The following code can work correctly. The value of i is still 1 on the device A.
> > > What do you think that the following code is acceptable?
> > > ---
> > > 	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > > 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> > > 		msleep(100);
> > > 		rcar_gen4_pcie_speed_change(dw);
> > > 		if (dw_pcie_link_up(dw)) {
> > > 			printk("%s:%d\n", __func__, i);
> > > 			return 0;
> > > 		}
> > > 	}
> > > ---
> > 
> > My idea was to implement something like this:
> > 
> > +static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
> > +{
> > +	struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
> > +
> > +	rcar_gen4_pcie_ltssm_enable(rcar, true);
> > +
> > +	rcar_gen4_pcie_speed_change(dw);
> > +
> > +	return 0;
> > +}
> > 
> > and retain the rcar_gen4_pcie_link_up() method as is.
> 
> Unfortunately, such a code doesn't work on my environment...
> 
> > * Note: originally your loop used to have the msleep() call performed
> > after the first rcar_gen4_pcie_speed_change() invocation. Thus the
> > delay can be dropped if there is only one iteration implemented (see
> > further to understand why).
> 
> Calling rcar_gen4_pcie_speed_change() multiple times is required on
> my environment. I thought msleep(100) was quite long so that I tried
> other wait interval like below:
> 
>  msleep(1) : about 5 loops is needed for link. (about 5 msec.)
>  usleep_range(100, 110) : about 400 loops is needed for link. (about 40 msec.)
>  usleep_range(500, 600) : about 80 loops is needed for link. (about 40 msec.)
> 
> The delay timing doesn't seems important. Both cases below can work correctly.
> --- case 1 ---
> 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> 		rcar_gen4_pcie_speed_change(dw);
> 		if (dw_pcie_link_up(dw)) {
> 			printk("%s:%d\n", __func__, i); // will be removed
> 			return 0;
> 		}

> 		msleep(1);

Why? Just set it to 5 ms. In anyway please see the next message.

> 	}
> ---
> --- case 2 ---
> 	for (i = 0; i < SPEED_CHANGE_MAX_RETRIES; i++) {
> 		rcar_gen4_pcie_speed_change(dw);
> 		msleep(1);
> 		if (dw_pcie_link_up(dw)) {
> 			printk("%s:%d\n", __func__, i); // will be removed
> 			return 0;
> 		}
> 	}
> ---
> 
> So, I'll use case 1 for it.

Ah. I think I get it now. Your spreadsheet:

                LinkCap max-link-speed  loop
Device A           4          4           1
Device A           4          3           1
Device B           3          3           0

actually meant (loop+1) iterations. So in case of Gen4 you'll need
three speed changes (one already enabled in the dw_pcie_setup_rc()
method and another two ones are performed in your loop). Similarly in
case of Gen3 you'll need only one iteration. I bet you won't need to
call rcar_gen4_pcie_speed_change() at all if gen2 needs to be trained.
Could you try it out?

Anyway based on what you discovered and on my experience working with
that controller, there should be as many
GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE flag changes as the target speed
value, i.e. no flag switch if Gen1 is required, one flag switch if
Gen2 is required and so on. Although I failed to find any explicit
statement about that in the HW-manual.

In addition to the above I've found out that
GEN2_CTRL_OFF.DIRECT_SPEED_CHANGE field is actually self cleared when
the speed change occurs (see the register description in the HW
reference manual). We can use it to implement the
dw_pcie_link_up()-independent link training algorithm like this:

+#define RCAR_RETRAIN_MAX_CHECK		10
+#define RCAR_LINK_SPEED_MAX		4
+
+static bool rcar_gen4_pcie_speed_change(struct dw_pcie *dw)
+{
+	u32 val;
+	int i;
+
+	val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL);
+	val &= ~PORT_LOGIC_SPEED_CHANGE;
+	dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
+
+	val |= PORT_LOGIC_SPEED_CHANGE;
+	dw_pcie_writel_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL, val);
+
+	for (i = 0; i < RCAR_SPEED_CHANGE_WAIT_RETRIES; i++) {
+		val = dw_pcie_readl_dbi(dw, PCIE_LINK_WIDTH_SPEED_CONTROL);
+		if (!(val & PORT_LOGIC_SPEED_CHANGE))
+			return true;
+
+		msleep(1);
+	}
+
+	return false;
+}
+
+static int rcar_gen4_pcie_start_link(struct dw_pcie *dw)
+{
+	struct rcar_gen4_pcie *rcar = to_rcar_gen4_pcie(dw);
+	int i, changes;
+
+	rcar_gen4_pcie_ltssm_enable(rcar, true);
+
+	changes = min_not_zero(dw->link_gen, RCAR_LINK_SPEED_MAX);
+	for (i = 0; i < changes; ++i) {
+		if (!rcar_gen4_pcie_speed_change(dw))
+			break;
+	}
+
+	return 0;
+}

Note 1. The actual link state will be checked in the framework of the
dw_pcie_wait_for_link() function, by means of dw_pcie_link_up().

Note 2. RCAR_LINK_SPEED_MAX is deliberately set to 4 because DW PCIe
EP core driver doesn't set the PORT_LOGIC_SPEED_CHANGE flag. In case
of the DW PCIe Root Port at most 3 iterations should be enough.

Note 3. Please use the RCAR_ prefix for the vendor-specific macros.
It concerns the entire series.

Could you try out the code suggested above?

-Serge(y)

> 
> > You don't need to wait for the link to actually get up in the
> > start_link() callback because there is the link_up() callback, which
> > is called from the dw_pcie_wait_for_link() method during the generic
> > DWC PCIe setup procedure. See:
> 
> Since the procedure will call rcar_gen4_pcie_speed_change() from
> ->start_link() once, my environment cannot work correctly...
> 
> Best regards,
> Yoshihiro Shimoda
> 
> > dw_pcie_host_init():
> > +-> ops->host_init()
> > +-> ...
> > +-> dw_pcie_setup_rc()
> > |   +-> ...
> > |   +-> dw_pcie_setup()
> > |   +-> ...
> > +-> if !dw_pcie_link_up()
> > |   |   +-> ops->link_up()
> > |   +-> dw_pcie_start_link()
> > |       +-> ops->start_link()
> > +-> dw_pcie_wait_for_link();   // See, wait-procedure is already performed
> > |   +-> loop 10 times          // for you in the core driver together
> > |       +-> dw_pcie_link_up()  // with the delays between the checks
> > |           +-> ops->link_up()
> > |       +-> msleep(~100)
> > +-> ...
> > 
> > -Serge(y)
> > 
> > >
> > > Best regards,
> > > Yoshihiro Shimoda
> > >
> > > > -Serge(y)
> > >



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux