On Tue, 9 Jan 2018 10:58:30 -0800
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> So I really think "you can use up 90% of CPU time with a UDP packet
> flood from the same network" is very very very different - and
> honestly not at all as important - as "you want to be able to use a
> USB DVB receiver and watch/record TV".
>
> Because that whole "UDP packet flood from the same network" really is
> something you _fundamentally_ have other mitigations for.
>
> I bet that whole commit was introduced because of a benchmark test,
> rather than real life. No?

I believe this has happened in real life, in the form of DNS servers
not being able to recover after a long outage: the DNS TTLs had timed
out, causing legitimate traffic to overload the DNS servers. The
goodput (answers/sec) from the DNS servers was too low when bringing
them online again. (Based on a talk over beer at NetDevConf with a guy
claiming they ran DNS for AWS.)

The commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") tries to
address a fundamental problem that the network stack has when
interacting with softirq in overload situations. (Maybe we can come up
with a better solution?)

Before this commit, when the application runs on the same CPU as
softirq, the kernel has a bad "drop off cliff" behavior once the load
goes above the saturation point. This is confirmed in the CloudFlare
blogpost[1], which used a kernel that predates this commit.

From [1], section "A note on NUMA performance". Quote:

 "1. Run receiver on another CPU, but on the same NUMA node as the RX
     queue. The performance as we saw above is around 360kpps.

  2. With receiver on exactly same CPU as the RX queue we can get up to
     ~430kpps. But it creates high variability. The performance drops
     down to zero if the NIC is overwhelmed with packets."

The behavior problem here is "performance drops down to zero if the NIC
is overwhelmed with packets". That is a bad way to handle overload.
Not only when attacked, but also when bringing a service online after
an outage.

What essentially happens is that:

 1. softirq NAPI enqueues 64 packets into the socket,
 2. the application dequeues 1 packet and invokes local_bh_enable(),
 3. causing softirq to run in the app's timeslice, again enqueuing 64
    packets,
 4. so the app only sees a goodput of 1/128 of the packets.

That is essentially what Eric solved with his commit: it avoids having
local_bh_enable() in (3) invoke softirq if ksoftirqd is already running.

Maybe we can come up with a better solution?
(as I do agree this was too big a hammer, affecting other use-cases)

[1] https://blog.cloudflare.com/how-to-receive-a-million-packets/

p.s. Regarding quote[1] point "1.": after Paolo Abeni optimized the UDP
code, that statement is no longer true. It is now (significantly) faster
to run/pin your UDP application to another CPU than the RX-CPU.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
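
For illustration only, the arithmetic behind steps 1-4 above can be
sketched as a toy model. This is not kernel code: the function name,
the `queue_limit` parameter (a stand-in for a finite socket queue) and
the second, "contrast" call are assumptions made up for the example.
Only the 64-packet batch and the one-dequeue-per-two-softirq-runs
pattern come from the description above.

```python
NAPI_BUDGET = 64  # packets enqueued per softirq run, as in step 1 above


def goodput_fraction(cycles, softirq_runs_per_cycle, app_dequeues_per_cycle,
                     queue_limit=NAPI_BUDGET):
    """Fraction of packets handled by softirq that the application reads.

    queue_limit is a hypothetical socket queue depth; packets that do not
    fit are still counted as processed by softirq, but are dropped.
    """
    queued = processed = delivered = 0
    for _ in range(cycles):
        for _ in range(softirq_runs_per_cycle):
            processed += NAPI_BUDGET
            queued = min(queue_limit, queued + NAPI_BUDGET)
        taken = min(queued, app_dequeues_per_cycle)
        queued -= taken
        delivered += taken
    return delivered / processed


# Steps 1-4: two softirq runs (64 + 64 packets) for every single dequeue.
overloaded = goodput_fraction(cycles=1000, softirq_runs_per_cycle=2,
                              app_dequeues_per_cycle=1)
print(overloaded)  # 0.0078125, i.e. 1/128

# Made-up contrast: the app gets to drain a full batch between softirq runs.
drained = goodput_fraction(cycles=1000, softirq_runs_per_cycle=1,
                           app_dequeues_per_cycle=NAPI_BUDGET)
print(drained)  # 1.0
```

The contrast case is not a measurement of the kernel after commit
4cd13c21b207; it only shows how the ratio in the model moves once the
application is allowed to dequeue more than one packet per softirq run.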