On Wed, 6 Nov 2013, Nebojša Ćosić wrote:

> You can try with this patch. I am quite sure that same problem persists
> on all newer kernels (I am using 2.6.33), but never had a time to create
> simple test to prove it.

I'm looking forward to your detailed analysis ....

> Index: net/sched/sch_generic.c
> ===================================================================
> --- net/sched/sch_generic.c (revision 1709)
> +++ net/sched/sch_generic.c (revision 1710)
> @@ -120,16 +120,18 @@
>          int ret = NETDEV_TX_BUSY;
>
>          /* And release qdisc */
> -        spin_unlock(root_lock);
> +/*      spin_unlock(root_lock);
>
>          HARD_TX_LOCK(dev, txq);
> +*/

Do you really think this locking scheme is just for fun? qdisc_lock,
i.e. root_lock, and the netif_tx_lock are mutually exclusive for a
reason. Reading the documentation and comments of code is optional,
right?

> > > When transmit queue is empty, thread wanting to send a message
> > > comes directly to sch_direct_xmit, without changing context. It
> > > then releases spin lock, and than takes another. So far so good.
> > > If this starting thread is of lower priority, it can be
> > > preempted by another thread, while still being in
> > > dev_hard_start_xmit function This thread will check if
> > > HARD_TX_LOCK is taken, and if so, go on and queue its own
> > > message. If there are enough higher priority tasks, tx can be
> > > stalled indefinitely.

[..]

Utter nonsense.

The code (sch_direct_xmit) you are modifying is called from exactly
two places:

  1) __dev_xmit_skb()

  2) __qdisc_run()

Which in turn have two call sites:

  1) __dev_xmit_skb() either directly or via qdisc_run()

  2) net_tx_action() which is the NET_TX softirq action

None of those calls does a trylock on the xmit lock.

So what the heck are you talking about?

Thanks,

        tglx
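
For readers following the thread, the lock handoff being defended above
can be sketched in userspace. This is only an illustration under stated
assumptions: pthread mutexes stand in for the kernel spinlocks, every
fake_* name and both structs are hypothetical, and re-taking the root
lock after the transmit reflects how sch_direct_xmit() is generally
structured rather than anything visible in the quoted hunk. What it
tries to show is that the root lock and the xmit lock are never held at
the same time, and that the xmit lock is taken with a plain blocking
lock, not a trylock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a device tx queue plus its qdisc. */
struct fake_txq {
	pthread_mutex_t root_lock;	/* stands in for the qdisc root lock */
	pthread_mutex_t xmit_lock;	/* stands in for the netif tx lock */
	int sent;			/* protected by xmit_lock */
};

/* Per-caller bookkeeping so the lock invariants can be asserted. */
struct lock_state {
	int root_held;
	int xmit_held;
};

static void fake_txq_init(struct fake_txq *q)
{
	pthread_mutex_init(&q->root_lock, NULL);
	pthread_mutex_init(&q->xmit_lock, NULL);
	q->sent = 0;
}

/* Stand-in for the driver transmit: must run under the xmit lock only. */
static int fake_hard_start_xmit(struct fake_txq *q, const struct lock_state *ls)
{
	assert(ls->xmit_held && !ls->root_held);
	q->sent++;
	return 0;
}

/* Entered with root_lock held, returns with root_lock held. */
static int fake_direct_xmit(struct fake_txq *q, struct lock_state *ls)
{
	int ret;

	assert(ls->root_held && !ls->xmit_held);

	/* And release qdisc */
	pthread_mutex_unlock(&q->root_lock);
	ls->root_held = 0;

	/* Blocking lock, not a trylock: a contending sender waits here. */
	pthread_mutex_lock(&q->xmit_lock);
	ls->xmit_held = 1;
	ret = fake_hard_start_xmit(q, ls);
	ls->xmit_held = 0;
	pthread_mutex_unlock(&q->xmit_lock);

	/* Re-take the root lock before returning to the caller. */
	pthread_mutex_lock(&q->root_lock);
	ls->root_held = 1;
	return ret;
}

/* Stand-in for a caller that holds the root lock around the direct xmit. */
static int fake_enqueue_and_send(struct fake_txq *q)
{
	struct lock_state ls = { 0, 0 };
	int ret;

	pthread_mutex_lock(&q->root_lock);
	ls.root_held = 1;
	ret = fake_direct_xmit(q, &ls);
	ls.root_held = 0;
	pthread_mutex_unlock(&q->root_lock);
	return ret;
}
```

Calling fake_enqueue_and_send() twice on an initialised struct fake_txq
leaves sent at 2 with both locks free. Removing the unlock/lock pair in
fake_direct_xmit(), which is what the quoted patch does to the real
code, makes the driver stand-in run under the root lock and trips its
assertion.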