On 18.3.2019 13:25, Appana Durga Kedareswara Rao wrote: > Hi Anssi Hannula, > >> -----Original Message----- >> From: Anssi Hannula <anssi.hannula@xxxxxxxxxx> >> Sent: Monday, March 18, 2019 4:52 PM >> >> >> On 12.3.2019 13:27, Appana Durga Kedareswara Rao wrote: >>>> On 07. 03. 19 13:52, Anssi Hannula wrote: >>>>> Currently the xilinx_can xcan_chip_start() function, called from >>>>> .ndo_open() and via CAN_MODE_START (bus-off restart), waits for the >>>>> SR register to show the wanted operating state, with a 1 sec timeout. >>>>> >>>>> However, that register bit will only be set once the HW has observed >>>>> 11 consecutive recessive bits (BusIdle) on the bus. >>>>> >>>>> If the bus will not see the 11 bits (e.g. it is stuck dominant), the >>>>> function will timeout and return an error. >>>>> If this was done as part of a scheduled restart from bus-off, the >>>>> interface will stay in bus-off state forever even if the bus >>>>> recovers later. >>>>> If this was done during interface .ndo_open(), the interface will >>>>> stay down. >>>>> >>>>> According to M_CAN and FLEXCAN documentation they also wait for 11 >>>>> consecutive recessive bits, but their drivers do not seem to wait >>>>> for that. >>>>> >>>>> To make the behavior consistent, modify xilinx_can to also not wait >>>>> for the synchronization to complete. >>>>> >>>>> The only way for users to know for sure that the bus has been joined >>>>> successfully is to see successfully received or transmitted frames. >>>>> That does not seem optimal, but it is consistent with other drivers >>>>> and we should have a properly working restart-ms with xilinx_can. >>>>> >>>>> Tested on ZynqMP with Xilinx CAN-FD 1.0. >>>>> >>>>> Signed-off-by: Anssi Hannula <anssi.hannula@xxxxxxxxxx> >>>>> --- >>>>> >>>>> Not really sure if this way is the preferred way forward. >>>>> >>>>> One alternative way would be to keep the wait but log a warning and >>>>> allow the function to succeed? >>>>> >>>>> But I believe something needs to be done, as it seems a reasonable >>>>> expectation that if an interface is brought up with restart-ms, the >>>>> interface should recover to a working state even after periods of >>>>> bus being in whatever invalid state. >>>>> >>>>> >>>>> The HW waiting is described for Xilinx HW in [1] "Register >>>>> Configuration Sequence", for M_CAN in [2] 3.1.1 Software >>>>> Initialization, for flexcan in [3] 34.4.10.1 Freeze mode. >>>>> >>>>> [1] >>>>> >> https://www.xilinx.com/support/documentation/ip_documentation/canfd/v >>>> 2 >>>>> _0/pg223-canfd.pdf [2] >>>>> http://www.bosch- >>>> semiconductors.com/media/ip_modules/pdf_2/m_can/mcan_ >>>>> users_manual_v3212.pdf [3] >>>>> https://www.nxp.com/docs/en/reference-manual/MC56F8458XRM.pdf >>>>> >>>>> >>>>> drivers/net/can/xilinx_can.c | 13 +------------ >>>>> 1 file changed, 1 insertion(+), 12 deletions(-) >>>>> >>>>> diff --git a/drivers/net/can/xilinx_can.c >>>>> b/drivers/net/can/xilinx_can.c index 97d0933d9bd9..13abc9e1a7ed >>>>> 100644 >>>>> --- a/drivers/net/can/xilinx_can.c >>>>> +++ b/drivers/net/can/xilinx_can.c >>>>> @@ -400,9 +400,8 @@ static int xcan_set_bittiming(struct net_device >>>>> *ndev) static int xcan_chip_start(struct net_device *ndev) { >>>>> struct xcan_priv *priv = netdev_priv(ndev); >>>>> - u32 reg_msr, reg_sr_mask; >>>>> + u32 reg_msr; >>>>> int err; >>>>> - unsigned long timeout; >>>>> u32 ier; >>>>> >>>>> /* Check if it is in reset mode */ @@ -428,10 +427,8 @@ static int >>>>> xcan_chip_start(struct net_device *ndev) >>>>> /* Check whether it is loopback mode or normal mode */ >>>>> if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK) { >>>>> reg_msr = XCAN_MSR_LBACK_MASK; >>>>> - reg_sr_mask = XCAN_SR_LBACK_MASK; >>>>> } else { >>>>> reg_msr = 0x0; >>>>> - reg_sr_mask = XCAN_SR_NORMAL_MASK; >>>>> } >>>>> >>>>> /* enable the first extended filter, if any, as cores with >>>>> extended @@ -443,14 +440,6 @@ static int xcan_chip_start(struct >> net_device *ndev) >>>>> priv->write_reg(priv, XCAN_MSR_OFFSET, reg_msr); >>>>> priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_CEN_MASK); >>>>> >>>>> - timeout = jiffies + XCAN_TIMEOUT; >>>>> - while (!(priv->read_reg(priv, XCAN_SR_OFFSET) & reg_sr_mask)) { >>>>> - if (time_after(jiffies, timeout)) { >>>>> - netdev_warn(ndev, >>>>> - "timed out for correct mode\n"); >>>>> - return -ETIMEDOUT; >>>>> - } >>>>> - } >>> Instead of deleting the timeout loop we can check for the bus off state >> here(SR Register -- Error Status fields). >>> If the controller is in bus off state skip this timeout loop... >> Unfortunately the SR and ESR registers read zero in this situation until the 11 >> recessive bits have been seen, so that would not work. > Could you please provide the steps to reproduce this issue?? > Sure. Below are instructions for reproducing this by just using CAN FD cores to disrupt the bus. I believe the instructions will likely work with other Xilinx CAN cores, too (and the secondary CAN interface(s) probably don't need to be Xilinx). Scenario A - Failure to bring interface up with invalid bus state: 1. Connect 2 CAN FD cores to a properly terminated bus with no other participants. 2. Start one interface at 1000k and start sending frames (well, at least try to send). # ip link set dev can2 up type can bitrate 1000000 restart-ms 2000 # cangen can2 -I 42A -p 100 3. Try to enable another interface at 125k - the other interface we started earlier will prevent this interface from seeing the 11 recessive bits: # ip link set dev can1 up type can bitrate 125000 restart-ms 2000 RTNETLINK answers: Connection timed out [ 49.276765] xilinx_can a0002000.canfd can1: timed out for correct mode [ 49.283295] xilinx_can a0002000.canfd can1: xcan_chip_start failed! [ 49.289629] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check. Scenario B - A temporary bus fault causes the interface to never come back up again even with restart-ms: We will trigger a bus-off and have the bus be in the scenario A state when the restart is being done. 1. Connect 3 CAN FD cores to a properly terminated bus with no other participants. 2. Start interfaces: # ip link set dev can1 up type can bitrate 125000 restart-ms 16000 # ip link set dev can2 up type can bitrate 125000 restart-ms 16000 # ip link set dev can3 up type can bitrate 125000 restart-ms 16000 3. Start observing can1: # candump -D -cc can1 4. Verify that bus is OK and a frame can be sent. It should be seen in the above candump output. # cansend can1 "123#11223344" 5. Make can1 enter bus-off. Here we do it by changing the bitrate of the 2 other interfaces and adding traffic # ip link set dev can2 down # ip link set dev can2 up type can bitrate 1000000 restart-ms 2000 # ip link set dev can3 down # ip link set dev can3 up type can bitrate 1000000 restart-ms 2000 # cangen can2 -I 42A -p 100 -g 1 (keep cangen running) and then trying to send a frame from can1: # cansend can1 "123#11223345" The frame will (expectedly) not show up in candump and # ip -d link show can1 should show the state be in BUS-OFF soon after the command. [ 64.777854] xilinx_can a0002000.canfd can1: bus-off 6. Within 16 sec (restart-ms) of the above cansend command, stop can3 to get the bus into a state that prevents can1 restart (i.e. it will not see 11 recessive bits): # ip link set dev can3 down When the restart-ms triggers, you will get: [ 81.864783] xilinx_can a0002000.canfd can1: timed out for correct mode [ 81.871309] xilinx_can a0002000.canfd can1: xcan_chip_start failed! [ 81.877574] xilinx_can a0002000.canfd can1: Error -110 during restart 7. Bring the bus back to a valid state: # ip link set dev can2 down # ip link set dev can2 up type can bitrate 125000 restart-ms 2000 # ip link set dev can3 down # ip link set dev can3 up type can bitrate 125000 restart-ms 2000 But can1 will remain in BUS-OFF forever. Frames cannot be sent from can1 despite the bus actually being OK now and restart-ms being active. In reality the can1 HW is fine and in error-active at this point, just the Linux interface is in BUS-OFF as xcan_chip_start() failed in step 6. > >> Checked on CAN FD 1.0 on ZynqMP. >> >> >>> Regards, >>> Kedar. >>> >>>>> netdev_dbg(ndev, "status:#x%08x\n", >>>>> priv->read_reg(priv, XCAN_SR_OFFSET)); >>>>> >>>>> >>>> Xilinx folks: Please take a look at this RFC. >>>> >>>> Thanks, >>>> Michal >> >> -- >> Anssi Hannula / Bitwise Oy >> +358 503803997 -- Anssi Hannula / Bitwise Oy +358 503803997