> >> Long, does this patch make any difference?
> >
> > Sagi,
> >
> > Sorry it took a while to bring my system back online.
> >
> > With the patch, the IOPS drop is about the same as with the 1st patch.
> > I think the excessive context switches are causing the drop in IOPS.
> >
> > The following was captured by "perf sched record" for 30 seconds
> > during the tests.
> >
> > "perf sched latency"
> > With patch:
> >   fio:(82) |  937632.706 ms | 1782255 | avg: 0.209 ms | max: 63.123 ms | max at: 768.274023 s
> >
> > Without patch:
> >   fio:(82) | 2348323.432 ms |   18848 | avg: 0.295 ms | max: 28.446 ms | max at: 6447.310255 s
>
> Without patch means the proposed hard-irq patch?

It means the current upstream code without any patch. But it's prone to
soft lockup.

Ming's proposed hard-irq patch gets similar results to "without patch";
however, it fixes the soft lockup.

> If we are context switching too much, it means the soft-irq operation is
> not efficient, not necessarily the fact that the completion path is
> running in soft-irq..
>
> Is your kernel compiled with full preemption or voluntary preemption?

The tests are based on the Ubuntu 18.04 kernel configuration. Here are the
parameters:

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

> > Looking closer at each CPU, we can see ksoftirqd is competing for CPU
> > with fio (and effectively throttling other fio processes). (Captured in
> > /sys/kernel/debug/tracing, echo sched:* >set_event)
> >
> > On CPU1 with patch: (note that the prev_state for fio is "R", it's
> > preemptively scheduled)
> >   <...>-4077 [001] d... 66456.805062: sched_switch: prev_comm=fio prev_pid=4077 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/1 next_pid=17 next_prio=120
> >   <...>-17 [001] d... 66456.805859: sched_switch: prev_comm=ksoftirqd/1 prev_pid=17 prev_prio=120 prev_state=S ==> next_comm=fio next_pid=4077 next_prio=120
> >   <...>-4077 [001] d... 66456.844049: sched_switch: prev_comm=fio prev_pid=4077 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/1 next_pid=17 next_prio=120
> >   <...>-17 [001] d... 66456.844607: sched_switch: prev_comm=ksoftirqd/1 prev_pid=17 prev_prio=120 prev_state=S ==> next_comm=fio next_pid=4077 next_prio=120
> >
> > On CPU1 without patch: (the prev_state for fio is "S", it's voluntarily
> > scheduled)
> >   <idle>-0 [001] d... 6725.392308: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=fio next_pid=14342 next_prio=120
> >   fio-14342 [001] d... 6725.392332: sched_switch: prev_comm=fio prev_pid=14342 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
> >   <idle>-0 [001] d... 6725.392356: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=fio next_pid=14342 next_prio=120
> >   fio-14342 [001] d... 6725.392425: sched_switch: prev_comm=fio prev_pid=14342 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=12
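
For anyone who wants to repeat the comparison on their own traces, here is a rough Python sketch (not part of the test setup described above) that counts how often a task was switched out while still runnable (prev_state=R, i.e. preempted) versus after going to sleep (prev_state=S, i.e. voluntarily). The regular expression assumes the sched_switch text layout shown in the traces above. The helper names are made up for illustration, and treating the third "perf sched latency" column as the switch count over the 30 second capture is an assumption.

```python
import re
from collections import Counter

# Matches the sched_switch text format seen in the traces quoted above.
SWITCH_RE = re.compile(
    r"sched_switch: prev_comm=(?P<prev_comm>\S+) prev_pid=(?P<prev_pid>\d+) "
    r"prev_prio=\d+ prev_state=(?P<prev_state>\S+) ==> "
    r"next_comm=(?P<next_comm>\S+) next_pid=(?P<next_pid>\d+)"
)


def classify_switches(lines, comm="fio"):
    """Count switch-outs of `comm` as preempted (R) or voluntary (anything else)."""
    counts = Counter()
    for line in lines:
        m = SWITCH_RE.search(line)
        if m is None or m.group("prev_comm") != comm:
            continue  # not a sched_switch line, or another task was switched out
        state = m.group("prev_state")
        counts["preempted" if state.startswith("R") else "voluntary"] += 1
    return counts


def switches_per_second(switches, seconds=30.0):
    """Average switch rate, assuming the count covers the whole capture window."""
    return switches / seconds


# Three events copied from the traces above.
sample = [
    "<...>-4077 [001] d... 66456.805062: sched_switch: prev_comm=fio prev_pid=4077 "
    "prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/1 next_pid=17 next_prio=120",
    "<...>-17 [001] d... 66456.805859: sched_switch: prev_comm=ksoftirqd/1 prev_pid=17 "
    "prev_prio=120 prev_state=S ==> next_comm=fio next_pid=4077 next_prio=120",
    "fio-14342 [001] d... 6725.392332: sched_switch: prev_comm=fio prev_pid=14342 "
    "prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120",
]

if __name__ == "__main__":
    print(dict(classify_switches(sample)))
    print("with patch:    %.1f switches/s" % switches_per_second(1782255))
    print("without patch: %.1f switches/s" % switches_per_second(18848))
```

Under those assumptions, the "perf sched latency" numbers above work out to roughly 59,000 switches per second with the patch versus roughly 630 without it, about a 95x difference.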