On Thu, 14 May 2015, Sebastian Andrzej Siewior wrote:
> * Nathan Sullivan | 2015-05-01 10:04:12 [-0500]:
> >Hello all,
> >
> >We are running 3.14.37-rt on a Xilinx Zynq based board, and have noticed some
> >unfortunate behavior with NAPI polling during heavy incoming traffic. Since,
> >as I understand it, softirqs are scheduled on the thread that caused them in
> >rt, the network RX softirq simply runs over and over on one CPU of the system.
> >The network device never re-enables interrupts; basically NAPI polling runs
> >forever, and weight/budget are irrelevant with preempt-rt on.
> >
> >Since we set IRQ affinity to CPU 0 for everything, this leads to the system
> >live-locking and becoming unusable. With full RT preemption off, things are
> >fine. In addition, 3.2 kernels with RT are fine as well under heavy net load.
> >Is this behavior due to a design tradeoff, or is it a bug?

Hmm. That's interesting. There should be no significant change in the
handling of the softirq between 3.2 and 3.14. I introduced the 'run
softirq from the tail of the network irq thread' mechanism in 3.0. It
might be a different timing behaviour of the network driver or something
else. Might be worthwhile to investigate with tracing what the
difference is.

> The rx-napi softirq runs once the threaded handler is done with its
> work. Compared with vanilla there is a small difference: vanilla
> repeats the softirq a number of times and has a time budget. Once it
> decides that it has run for too long it moves into ksoftirqd. -RT, on
> the other hand, repeats the softirq processing as long as there is work
> to do. Since it is done after the threaded handler completes its work,
> it is done at the priority of the threaded handler - so if your -RT
> task has a higher priority it won't be disturbed by it.
>
> If you put your shell above the threaded handler of the network device
> (or run the handler as SCHED_OTHER) then things should be back to
> normal.
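[Editor's note: Sebastian's suggestion above can be tried with chrt(1). This is a sketch, not from the original mail; the IRQ thread name (irq/NN-eth0) and the default SCHED_FIFO priority of 50 for threaded handlers are assumptions that should be checked on the actual system.]

```shell
# Show the scheduling policy/priority of the current shell:
chrt -p $$

# Threaded IRQ handlers on -RT typically default to SCHED_FIFO priority 50.
# To give an interactive shell a higher priority than the network IRQ
# thread (requires root / CAP_SYS_NICE):
#   chrt -f -p 60 $$
#
# Or demote the network IRQ thread to SCHED_OTHER instead; find its PID
# first (thread name irq/NN-eth0 is an assumption, adjust for your NIC):
#   ps -eo pid,rtprio,comm | grep 'irq/'
#   chrt -o -p 0 <irq-thread-pid>
```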
> I'm not sure if moving network (after a while of processing) to
> ksoftirqd is a good idea, because we lose the priority then.

Yes, it's tricky. The idea was to run the softirq in the context of the
interrupt thread which raised it, to avoid the extra overhead of
scheduling. And it really gave us a network performance boost vs. the
old variant. Though, if the network is faster than the softirq can
shuffle away the packets, we run into the situation Nathan described.

I have no immediate good idea how to deal with that. Offloading the
softirq to a different (SCHED_OTHER) thread might work, but it's not
pretty. Another variant would be to temporarily lower the priority of
the thread which runs the softirq, but that's going to be a mess all
over the place.

A very simplistic solution, which might not be the worst after all,
would be to record the runtime of the softirq handler and, if it raises
the softirq again within a very short timespan, let the thread sleep for
exactly that amount of runtime.

Not sure yet. That needs some experimentation, but the latter is
probably quick to implement and will handle the above problem
gracefully.

Thanks,

	tglx