Hi Anssi Hannula, > -----Original Message----- > From: Anssi Hannula <anssi.hannula@xxxxxxxxxx> > Sent: Monday, March 18, 2019 4:52 PM > To: Appana Durga Kedareswara Rao <appanad@xxxxxxxxxx>; Michal Simek > <michals@xxxxxxxxxx>; linux-can@xxxxxxxxxxxxxxx; Marc Kleine-Budde > <mkl@xxxxxxxxxxxxxx> > Cc: Wolfgang Grandegger <wg@xxxxxxxxxxxxxx>; Shubhrajyoti Datta > <shubhraj@xxxxxxxxxx>; Naga Sureshkumar Relli <nagasure@xxxxxxxxxx> > Subject: Re: [RFC PATCH] can: xilinx_can: fix chip_start failure with invalid bus > > On 12.3.2019 13:27, Appana Durga Kedareswara Rao wrote: > > Hi Anssi Hannula, > > > > Thanks for the patch... > > Please find response inline. > > Hi, and sorry for the delay in response. > > > <Snip> > >> On 07. 03. 19 13:52, Anssi Hannula wrote: > >>> Currently the xilinx_can xcan_chip_start() function, called from > >>> .ndo_open() and via CAN_MODE_START (bus-off restart), waits for the > >>> SR register to show the wanted operating state, with a 1 sec timeout. > >>> > >>> However, that register bit will only be set once the HW has observed > >>> 11 consecutive recessive bits (BusIdle) on the bus. > >>> > >>> If the bus will not see the 11 bits (e.g. it is stuck dominant), the > >>> function will timeout and return an error. > >>> If this was done as part of a scheduled restart from bus-off, the > >>> interface will stay in bus-off state forever even if the bus > >>> recovers later. > >>> If this was done during interface .ndo_open(), the interface will > >>> stay down. > >>> > >>> According to M_CAN and FLEXCAN documentation they also wait for 11 > >>> consecutive recessive bits, but their drivers do not seem to wait > >>> for that. > >>> > >>> To make the behavior consistent, modify xilinx_can to also not wait > >>> for the synchronization to complete. > >>> > >>> The only way for users to know for sure that the bus has been joined > >>> successfully is to see successfully received or transmitted frames. > >>> That does not seem optimal, but it is consistent with other drivers > >>> and we should have a properly working restart-ms with xilinx_can. > >>> > >>> Tested on ZynqMP with Xilinx CAN-FD 1.0. > >>> > >>> Signed-off-by: Anssi Hannula <anssi.hannula@xxxxxxxxxx> > >>> --- > >>> > >>> Not really sure if this way is the preferred way forward. > >>> > >>> One alternative way would be to keep the wait but log a warning and > >>> allow the function to succeed? > >>> > >>> But I believe something needs to be done, as it seems a reasonable > >>> expectation that if an interface is brought up with restart-ms, the > >>> interface should recover to a working state even after periods of > >>> bus being in whatever invalid state. > >>> > >>> > >>> The HW waiting is described for Xilinx HW in [1] "Register > >>> Configuration Sequence", for M_CAN in [2] 3.1.1 Software > >>> Initialization, for flexcan in [3] 34.4.10.1 Freeze mode. > >>> > >>> [1] > >>> > >> > https://www.xilinx.com/support/documentation/ip_documentation/canfd/v > >> 2 > >>> _0/pg223-canfd.pdf [2] > >>> http://www.bosch- > >> semiconductors.com/media/ip_modules/pdf_2/m_can/mcan_ > >>> users_manual_v3212.pdf [3] > >>> https://www.nxp.com/docs/en/reference-manual/MC56F8458XRM.pdf > >>> > >>> > >>> drivers/net/can/xilinx_can.c | 13 +------------ > >>> 1 file changed, 1 insertion(+), 12 deletions(-) > >>> > >>> diff --git a/drivers/net/can/xilinx_can.c > >>> b/drivers/net/can/xilinx_can.c index 97d0933d9bd9..13abc9e1a7ed > >>> 100644 > >>> --- a/drivers/net/can/xilinx_can.c > >>> +++ b/drivers/net/can/xilinx_can.c > >>> @@ -400,9 +400,8 @@ static int xcan_set_bittiming(struct net_device > >>> *ndev) static int xcan_chip_start(struct net_device *ndev) { > >>> struct xcan_priv *priv = netdev_priv(ndev); > >>> - u32 reg_msr, reg_sr_mask; > >>> + u32 reg_msr; > >>> int err; > >>> - unsigned long timeout; > >>> u32 ier; > >>> > >>> /* Check if it is in reset mode */ @@ -428,10 +427,8 @@ static int > >>> xcan_chip_start(struct net_device *ndev) > >>> /* Check whether it is loopback mode or normal mode */ > >>> if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK) { > >>> reg_msr = XCAN_MSR_LBACK_MASK; > >>> - reg_sr_mask = XCAN_SR_LBACK_MASK; > >>> } else { > >>> reg_msr = 0x0; > >>> - reg_sr_mask = XCAN_SR_NORMAL_MASK; > >>> } > >>> > >>> /* enable the first extended filter, if any, as cores with > >>> extended @@ -443,14 +440,6 @@ static int xcan_chip_start(struct > net_device *ndev) > >>> priv->write_reg(priv, XCAN_MSR_OFFSET, reg_msr); > >>> priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_CEN_MASK); > >>> > >>> - timeout = jiffies + XCAN_TIMEOUT; > >>> - while (!(priv->read_reg(priv, XCAN_SR_OFFSET) & reg_sr_mask)) { > >>> - if (time_after(jiffies, timeout)) { > >>> - netdev_warn(ndev, > >>> - "timed out for correct mode\n"); > >>> - return -ETIMEDOUT; > >>> - } > >>> - } > > Instead of deleting the timeout loop we can check for the bus off state > here(SR Register -- Error Status fields). > > If the controller is in bus off state skip this timeout loop... > > Unfortunately the SR and ESR registers read zero in this situation until the 11 > recessive bits have been seen, so that would not work. Could you please provide the steps to reproduce this issue?? Regards, Kedar. > > Checked on CAN FD 1.0 on ZynqMP. > > > > Regards, > > Kedar. > > > >>> netdev_dbg(ndev, "status:#x%08x\n", > >>> priv->read_reg(priv, XCAN_SR_OFFSET)); > >>> > >>> > >> Xilinx folks: Please take a look at this RFC. > >> > >> Thanks, > >> Michal > > > -- > Anssi Hannula / Bitwise Oy > +358 503803997