Hi again, > From: Yoshihiro Shimoda, Sent: Friday, May 22, 2020 2:58 PM > > Hello Sergei-san, > > > From: Sergei Shtylyov, Sent: Monday, May 18, 2020 5:45 PM > > > > On 18.05.2020 7:54, Dirk Behme wrote: > > > > > Analyzing [1] it seems there is a race condition where ravb_start_xmit() > > > can be called from interrupt while tx skbuffs are being torn down in > > > the scheduled timeout handling. The actual timeout work is done in > > > ravb_tx_timeout_work() during which the tx skbuffs are torn down via > > > invocations of ravb_ring_free(). But there seems to be no flag to tell > > > the driver it is shutting down so it continues to use the ring buffer > > > when it should not. > > > > > > Fix this by disabling the interrupts in the timeout handler. > > > > Hm, given that we stop all TX queues prior to tearing down the buffers, > > it is somewhat strange that you see the driver's send path called... > > But disabling the interrupts seems the Right Thing anyways... > > I didn't reproduce this issue and I didn't know the root cause yet. > But, I'm feeling to strange. Especially, "ravb_start_xmit() can be called from interrupt". > I didn't find where ravb_start_xmit() can be called from interrupt for now. > > By the way, I'm thinking the following message is related to the issue. > > > ravb e6800000.ethernet ethernet: failed to switch device to config mode > > The ravb_config() output the message if failed. And, ravb_tx_timeout_work() > calls ravb_config() via ravb_stop_dma() and ravb_dmac_init(). > --- > ravb_tx_timeout_work() { > ravb_stop_dma() // call ravb_config() > > ravb_ring_free() // *1 > <snip> > ravb_dmac_init() // call ravb_config() > <snip> > } > > ravb_dmac_init() > { > <snip> > error = ravb_config() // "failed to switch device to config modes" here and -ETIMEDOUT > if (error) > return error // *2 > ravb_ring_init() // *3 > <snip> > } > -- > > If ravb_config() failed at the *2, since ravb_ring_init() was not called, > any descriptors were not allocated and then the issue should happen. > Note that according to the datasheet, the controller cannot change the > mode from "Operation" to Configuration" when the controller is doing > TX or RX. I'm afraid but, this scenario seems wrong because if the controller enters "Configuration" once, the driver needs to change the mode to "Operation" for starting any transfer again. > I don't know why the first ravb_config() doesn't fail for now. > But, my scenario is one of functions below enables the TX and RX > by calling ravb_rcv_snd_enable(): > - ravb_emac_interrupt_unlocked() ... if ESCR_LCHNG && !no_avb_link && PSR_LMON > - ravb_adjust_link() ... if no_avb_link && phydev->link > - ravb_set_rx_csum() ... if enabling NETIF_F_RXCSUM > > ravb_tx_timeout_work() calls ravb_stop_dma(). And, ravb_stop_dma() calls > ravb_rcv_snd_disable(). So, we assumed ravb_tx_timeout_work() didn't transfer > anymore. However, if one of the above functions was called, this is possible > to enable TX and RX. > > If this scenario is true, I'm thinking we can fix the issue by using > spin_{un}lock_irq{restore,save}() between ravb_stop_dma() to ravb_dmac_init(). This is also wrong because ravb_ring_init() call kcalloc() with GFP_KERNEL. I'll try to reproduce this issue again in next week... Best regards, Yoshihiro Shimoda > # ravb_ptp_init() calls spin_lock API so that we should not spin lock ravb_ptp_{init,stop}(). > > What do you think? > > I'll try to reproduce this issue on my environment again... > # I'm thinking ravb_adjust_link() is possible to cause this issue. > # But, Salvator-X doesn't have renesas,no-ether-link so that I'll try to add it. > > Best regards, > Yoshihiro Shimoda