UDP jitter

Nebojša Ćosić <nebojsa@xxxxxxxx> · Mon, 29 Apr 2013 22:22:38 +0200

Hi,
I am doing some work on a product running kernel 2.6.33.7.2-rt30.
The applications running on this kernel are a bit specific, in that
they have a number of threads running at different priorities.
For several months I was haunted by spurious jitter on UDP
messages: multicast UDP messages were received on the originating
node without any delay, but on other nodes a delay in the range of tens
of milliseconds was detected. Simply put, it looked like a message was
stuck in the kernel before finally being transmitted.
Finally, thanks to the LTTng tool, I was able to narrow the problem down
to this piece of code in net/sched/sch_generic.c:

int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
                    struct net_device *dev, struct netdev_queue *txq,
                    spinlock_t *root_lock)
{
        int ret = NETDEV_TX_BUSY;

        /* And release qdisc */
        spin_unlock(root_lock);

        HARD_TX_LOCK(dev, txq);

        if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
                ret = dev_hard_start_xmit(skb, dev, txq);

        HARD_TX_UNLOCK(dev, txq);

        spin_lock(root_lock);
...

When the transmit queue is empty, a thread wanting to send a message
goes directly to sch_direct_xmit, without a context switch. It then
releases one spin lock (root_lock), and then takes another
(HARD_TX_LOCK). So far so good.
If this first thread has a lower priority, it can be preempted by
another thread while it is still inside dev_hard_start_xmit.
That other thread will check whether HARD_TX_LOCK is taken and, if so,
just queue its own message and carry on.
If there are enough higher-priority tasks, tx can be stalled
indefinitely... Effectively, there is a priority inversion.
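
To make the stall concrete, here is a minimal single-threaded user-space
model of the scenario. It is only a sketch: tx_model, model_send_begin,
model_send_end and model_stall are hypothetical names invented for this
illustration, not kernel APIs, and "preemption" is simulated simply by
not letting the first sender finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_QUEUE_CAP 16

struct tx_model {
        bool   tx_busy;                 /* stands in for the held tx lock */
        int    queue[MODEL_QUEUE_CAP];  /* stands in for the qdisc queue */
        size_t queued;                  /* messages waiting in the queue */
        size_t on_wire;                 /* messages actually transmitted */
};

/* A sender enters the direct path. Returns true if it now owns the
 * transmit path, false if the path was busy and the message was queued. */
static bool model_send_begin(struct tx_model *m, int msg)
{
        if (m->tx_busy) {
                assert(m->queued < MODEL_QUEUE_CAP);
                m->queue[m->queued++] = msg;
                return false;
        }
        m->tx_busy = true;
        return true;
}

/* The owner finishes: its own message goes out, then the backlog drains. */
static void model_send_end(struct tx_model *m)
{
        assert(m->tx_busy);
        m->on_wire += 1 + m->queued;
        m->queued = 0;
        m->tx_busy = false;
}

/* A low-priority sender starts and is "preempted" before finishing, then
 * n_high higher-priority senders run. Returns how many messages reached
 * the wire before the low-priority sender gets to run again. */
static size_t model_stall(struct tx_model *m, size_t n_high)
{
        *m = (struct tx_model){0};
        (void)model_send_begin(m, 0);   /* low priority: owns the path */
        for (size_t i = 0; i < n_high; i++)
                (void)model_send_begin(m, (int)(i + 1)); /* only queued */
        return m->on_wire;
}
```

In this model nothing reaches the wire, however many higher-priority
senders run, until the original sender is scheduled again and calls
model_send_end. That is the inversion described above.
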
My temporary workaround was to simply remove the lock handling from
sch_direct_xmit, getting priority inheritance on root_lock and
effectively disabling queuing... Not a very good solution, but it works
perfectly well for my particular application.
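
As a user-space analogy for the priority inheritance this workaround
relies on: POSIX mutexes can request the same behaviour through the
PTHREAD_PRIO_INHERIT protocol, so that a low-priority owner is boosted
while a higher-priority thread waits on the lock. The sketch below is
only an illustration of that mechanism, not kernel code; pi_mutex_init
is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as a priority-inheritance mutex.
 * Returns 0 on success, or the error number of the failing call. */
static int pi_mutex_init(pthread_mutex_t *m)
{
        pthread_mutexattr_t attr;
        int err;

        assert(m != NULL);

        err = pthread_mutexattr_init(&attr);
        if (err)
                return err;

        /* Ask for priority inheritance: a thread blocked on this mutex
         * lends its priority to the current owner. */
        err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        if (!err)
                err = pthread_mutex_init(m, &attr);

        pthread_mutexattr_destroy(&attr);
        return err;
}
```

A thread blocked in pthread_mutex_lock() on such a mutex lends its
priority to the owner, whereas a sender that merely sees the path busy
and queues its message does not.
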
I see that the same code still exists in the latest kernels, so I suppose
the problem exists there too. It is RT-specific, though.
What would be a proper solution to this problem?
Sending the first message without switching context is a nice
optimization, but it creates problems...

-- 
Regards
Nebojša