Hi again and again, > From: Yoshihiro Shimoda, Sent: Friday, May 22, 2020 8:17 PM > > Hi again, > > > From: Yoshihiro Shimoda, Sent: Friday, May 22, 2020 2:58 PM > > > > Hello Sergei-san, > > > > > From: Sergei Shtylyov, Sent: Monday, May 18, 2020 5:45 PM > > > > > > On 18.05.2020 7:54, Dirk Behme wrote: > > > > > > > Analyzing [1] it seems there is a race condition where ravb_start_xmit() > > > > can be called from interrupt while tx skbuffs are being torn down in > > > > the scheduled timeout handling. The actual timeout work is done in > > > > ravb_tx_timeout_work() during which the tx skbuffs are torn down via > > > > invocations of ravb_ring_free(). But there seems to be no flag to tell > > > > the driver it is shutting down so it continues to use the ring buffer > > > > when it should not. > > > > > > > > Fix this by disabling the interrupts in the timeout handler. > > > > > > Hm, given that we stop all TX queues prior to tearing down the buffers, > > > it is somewhat strange that you see the driver's send path called... > > > But disabling the interrupts seems the Right Thing anyways... > > > > I didn't reproduce this issue and I didn't know the root cause yet. > > But, I'm feeling to strange. Especially, "ravb_start_xmit() can be called from interrupt". > > I didn't find where ravb_start_xmit() can be called from interrupt for now. > > > > By the way, I'm thinking the following message is related to the issue. > > > > ravb e6800000.ethernet ethernet: failed to switch device to config mode > > > > The ravb_config() output the message if failed. And, ravb_tx_timeout_work() > > calls ravb_config() via ravb_stop_dma() and ravb_dmac_init(). > > --- > > ravb_tx_timeout_work() { > > ravb_stop_dma() // call ravb_config() > > > > ravb_ring_free() // *1 > > <snip> > > ravb_dmac_init() // call ravb_config() > > <snip> > > } > > > > ravb_dmac_init() > > { > > <snip> > > error = ravb_config() // "failed to switch device to config modes" here and -ETIMEDOUT > > if (error) > > return error // *2 > > ravb_ring_init() // *3 > > <snip> > > } > > -- > > > > If ravb_config() failed at the *2, since ravb_ring_init() was not called, > > any descriptors were not allocated and then the issue should happen. > > Note that according to the datasheet, the controller cannot change the > > mode from "Operation" to Configuration" when the controller is doing > > TX or RX. > > I'm afraid but, this scenario seems wrong because if the controller > enters "Configuration" once, the driver needs to change the mode to > "Operation" for starting any transfer again. I realized that ravb_stop_dma() is possible to return without ravb_rcv_snd_disable() and ravb_config() calling if TCCR or CSR register indicates transmit is started. > > I don't know why the first ravb_config() doesn't fail for now. > > But, my scenario is one of functions below enables the TX and RX > > by calling ravb_rcv_snd_enable(): > > - ravb_emac_interrupt_unlocked() ... if ESCR_LCHNG && !no_avb_link && PSR_LMON > > - ravb_adjust_link() ... if no_avb_link && phydev->link > > - ravb_set_rx_csum() ... if enabling NETIF_F_RXCSUM In such the case (ravb_stop_dma() returns without any message), I think my scenario is still alive without any these functions calling because ravb_rcv_snd_disable() is not called. I'll check the TCCR and CSR registers in next week. Best regards, Yoshihiro Shimoda