Re: UDP jitter

Nebojša Ćosić <nebojsa@xxxxxxxx> · Wed, 6 Nov 2013 12:57:09 +0100

> Hello Nebojša,
> 
> I have a similar problem now with 3.2.51-rt72. Did
> you find any solution?
> 
> regards, gerhard
> 
> > -----Original Message-----
> > From: linux-rt-users-owner@xxxxxxxxxxxxxxx
> > [mailto:linux-rt-users-owner@xxxxxxxxxxxxxxx] On Behalf Of
> > Nebojša Cosic
> > Sent: Tuesday, 30 April 2013 19:27
> > To: Carsten Emde
> > Cc: linux-rt-users
> > Subject: Re: UDP jitter
> > 
> > 
> > > Hi Nebojša,
> > Hi Carsten
> > > 
> > > > I am doing some work on a product running kernel 2.6.33.7.2-rt30.
> > > > The applications running on this kernel are somewhat specific: a
> > > > number of threads run at different priorities.
> > > > For several months I was haunted by spurious jitter on UDP
> > > > messages: multicast UDP messages were received on the originating
> > > > node without any delay, but on other nodes a delay in the range of
> > > > tens of milliseconds was detected. It looked as if a message was
> > > > stuck in the kernel before finally being transmitted.
> > > > Finally, thanks to the LTTng tool, I was able to narrow the problem
> > > > down to this piece of code in net/sched/sch_generic.c:
> > > >
> > > > int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
> > > >                      struct net_device *dev, struct netdev_queue *txq,
> > > >                      spinlock_t *root_lock) {
> > > >          int ret = NETDEV_TX_BUSY;
> > > >
> > > >          /* And release qdisc */
> > > >          spin_unlock(root_lock);
> > > >
> > > >          HARD_TX_LOCK(dev, txq);
> > > >
> > > >          if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
> > > >                  ret = dev_hard_start_xmit(skb, dev, txq);
> > > >
> > > >
> > > >          HARD_TX_UNLOCK(dev, txq);
> > > >
> > > >          spin_lock(root_lock);
> > > > ...
> > > >
> > > > When the transmit queue is empty, a thread wanting to send a message
> > > > goes directly into sch_direct_xmit without a context switch. It
> > > > then releases one spin lock and takes another. So far so good.
> > > > If this thread has a lower priority, it can be preempted by another
> > > > thread while it is still inside dev_hard_start_xmit. The preempting
> > > > thread will check whether HARD_TX_LOCK is taken and, if so, go on
> > > > and queue its own message.
> > > > If there are enough higher-priority tasks, tx can be stalled
> > > > indefinitely. [..]
> > > Did you increase the priority of the related sirq-net-tx and
> > > sirq-net-rx kernel threads appropriately? Some more details on
> > > enabling real-time Ethernet are given here ->
> > > https://www.osadl.org/?id=930.
> > Thanks for the link, I was aware of it.
> > I did try raising the priority of sirq-net-tx and sirq-net-rx, even
> > setting tx higher than rx (in case incoming traffic was causing the
> > problem), but it made no difference.
> > I tried to isolate the problem by running iperf, but it worked
> > perfectly well when run on its own. No wonder: it generates traffic
> > from a single priority, and to trigger this behaviour one needs
> > traffic from at least two priority levels and a busy CPU (so that the
> > low-priority thread can be held up in the driver for a noticeable
> > period of time).
> > I suppose that running two iperf processes at different priorities
> > would demonstrate the problem.
> > 
> > > 
> > > 	-Carsten.
> > > --
> > > > To unsubscribe from this list: send the line "unsubscribe
> > > > linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > --
> > Nebojša

You can try this patch. I am fairly sure the same problem persists on
all newer kernels (I am using 2.6.33), but I never had time to write a
simple test to prove it.

Index: net/sched/sch_generic.c
===================================================================

--- net/sched/sch_generic.c	(revision 1709)
+++ net/sched/sch_generic.c	(revision 1710)
@@ -120,16 +120,18 @@
 	int ret = NETDEV_TX_BUSY;
 
 	/* And release qdisc */
-	spin_unlock(root_lock);
+/*	spin_unlock(root_lock);
 
 	HARD_TX_LOCK(dev, txq);
+*/
 	if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
 		ret = dev_hard_start_xmit(skb, dev, txq);
 
+/*
 	HARD_TX_UNLOCK(dev, txq);
 
 	spin_lock(root_lock);
-
+*/
 	if (dev_xmit_complete(ret)) {
 		/* Driver sent out skb successfully or skb was consumed */
 		ret = qdisc_qlen(q);



Another way to work around the problem is to use a user-space daemon
(zeromq, for example) as a network scheduler and allow network
communication only from that daemon (which can run at as high a
priority as you need).
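
The core of such a daemon can be sketched with plain sockets instead of
zeromq. The sketch below assumes that application threads hand their
datagrams to the daemon over a local AF_UNIX datagram socket; relay_one()
and its parameters are hypothetical names chosen for illustration, and
the 1500-byte buffer is an assumption about payload size.

```c
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: forward one datagram from the local hand-off
 * socket `in_fd` to `dst` through the UDP socket `out_fd`.
 * Returns the number of bytes forwarded, or -1 on error. */
static ssize_t relay_one(int in_fd, int out_fd, const struct sockaddr_in *dst)
{
	char buf[1500];	/* assumption: payloads fit in one Ethernet MTU */
	ssize_t n = recv(in_fd, buf, sizeof(buf), 0);

	if (n < 0)
		return -1;

	/* Only the daemon calls sendto() on the network socket, so all
	 * traffic enters the kernel transmit path at one priority. */
	return sendto(out_fd, buf, (size_t)n, 0,
		      (const struct sockaddr *)dst, sizeof(*dst));
}
```

The daemon would raise its own priority once at startup and then call
relay_one() in a loop. Because no lower-priority thread ever enters the
driver, no such thread can be preempted there while holding the tx lock.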

-- 
Nebojša