On 02/07/2018 07:31 AM, Ivan Khoronzhuk wrote: > On Wed, Feb 07, 2018 at 05:03:19AM +0200, Ivan Khoronzhuk wrote: >> On Tue, Feb 06, 2018 at 07:17:06PM -0600, Grygorii Strashko wrote: >>> It was discovered that simple program which indefinitely sends 200b UDP >>> packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network >>> watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog >>> timeout is triggered due to race between cpsw_ndo_start_xmit() and >>> cpsw_tx_handler() [NAPI] >>> >>> cpsw_ndo_start_xmit() >>> if (unlikely(!cpdma_check_free_tx_desc(txch))) { >>> txq = netdev_get_tx_queue(ndev, q_idx); >>> netif_tx_stop_queue(txq); >>> >>> ^^ as per [1] barrier has to be used after set_bit() otherwise new value >>> might not be visible to other cpus >>> } >>> >>> cpsw_tx_handler() >>> if (unlikely(netif_tx_queue_stopped(txq))) >>> netif_tx_wake_queue(txq); >>> >>> and when it happens ndev TX queue became disabled forever while driver's HW >>> TX queue is empty. >> I'm sure it fixes test case somehow but there is some strangeness. >> (I've thought about this some X months ago): >> 1. If no free desc, then there is bunch of descs on the queue ready to be sent >> 2. If one of this desc while this process was missed then next will wake queue, >> because there is bunch of them on the fly. So, if desc on top of the sent queue >> missed to enable the queue, then next one more likely will enable it anyway.. >> then how it could happen? The described race is possible only on last >> descriptor, yes, packets are small the speed is hight, possibility is very small >> .....but then next situation is also possible: >> - packets are sent fast >> - all packets were sent, but no any descriptors are freed now by sw interrupt (NAPI) >> - when interrupt had started NAPI, the queue was enabled, all other next >> interrupts are throttled once NAPI not finished it's work yet. >> - when new packet submitted, no free descs are present yet (NAPI has not freed >> any yet), but all packets are sent, so no one can awake tx queue, as interrupt >> will not arise when NAPI is started to free first descriptor interrupts are >> disabled.....because h/w queue to be sent is empty... >> - how it can happen as submitting packet and handling packet operations is under >> channel lock? Not exactly, a period between handling and freeing the descriptor >> to the pool is not under channel lock, here: >> >> spin_unlock_irqrestore(&chan->lock, flags); >> if (unlikely(status & CPDMA_DESC_TD_COMPLETE)) >> cb_status = -ENOSYS; >> else >> cb_status = status; >> >> __cpdma_chan_free(chan, desc, outlen, cb_status); >> return status; >> >> unlock_ret: >> spin_unlock_irqrestore(&chan->lock, flags); >> return status; >> >> And: >> __cpdma_chan_free(chan, desc, outlen, cb_status); >> -> cpdma_desc_free(pool, desc, 1); >> >> As result, queue deadlock as you've described. >> Just thought, not checked, but theoretically possible. >> What do you think? Yep. And if this happens desc_alloc_fail should be >0 while i usually see it 0 when watchdog triggers. I was able to reproduce this situation once, but with bunch of debug code added which changes timings: Prerequisite: only one free desc available. cpsw_ndo_start_xmit1 cpsw_tx_poll prepare and send packet ->Free desc queue is empty at this moment if (unlikely(!cpdma_check_free_tx_desc(txch))) netif_tx_stop_queue(txq); --- tx queue stopped cpdma_chan_process() spin_lock_irqsave(&chan->lock, flags); chan->count-- spin_unlock_irqrestore(&chan->lock, flags) cpsw_tx_handler() if (unlikely(netif_tx_queue_stopped(txq))) netif_tx_wake_queue(txq); --- tx queue is woken up cpsw_ndo_start_xmit2 prepare packet ret = cpsw_tx_packet_submit(priv, skb, txch); //fail as desc not returned to the pool yet if (unlikely(ret != 0)) { cpsw_err(priv, tx_err, "desc submit failed\n"); goto fail; } cpdma_desc_free() fail: ndev->stats.tx_dropped++; netif_tx_stop_queue(txq); // oops. That's why I added double check and barrier in fail path also > > Better explanation, for rare race: > > start conditions: > - all descs are submitted, except last one, and queue is not stopped > - any desc was returned to the pool yet (but packets can be sent) > > time > || > \/ > > submit process NAPI poll process > -------------------------------------------------------------------------------- > new packet is scheduled for submission > stated that no free descs (with locks) > lock is freed > returned all descs to the pool > and queue is enabled > interrupt enabled, poll exit > queue is disabled > submit exit > > Result: > - all descs are returned to the pool, submission to the queue disabled > - NAPI cannot wake queue, as all desc were handled already > > According to packet size in 200B > Data size, bits: 200B * 63desc * 10 = 128000bit roughly > Time all of them are sent: 128000 / 1Gb = 128us > > That's enough the CPU to be occupied by other process in RT even. > First, in case of RT, it's not a "rare race" as all processing is done in threads and both xmit and cpsw_tx_poll can be kicked out any time - in case of xmit right before netif_tx_stop_queue(). It also better reproducible with low speed as xmit drains Free desc pool faster. Next, on SMP it not only execution race as netif_tx_stop_queue() is just set_bit(), so new value might not be visible to other cpus and due to netif_tx_queue_stopped(txq) check cpsw_tx_handler() might fail to wake up the queue. By the way, I'm not inventing smth new here - I've checked bunch of net drivers and found that a lot of them have exactly the same netif_tx_stop_queue/netif_tx_wake_queue handling. Do you want me to update description or there are other comments? -- regards, -grygorii -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html