Hi Steven, hi Ingo,

I recently tried to analyze the performance of Samba using io-uring.

I was using Ubuntu 20.04 with the 5.10.0-1023-oem kernel, which is based on v5.10.25, see:
https://kernel.ubuntu.com/git/kernel-ppa/mirror/ubuntu-oem-5.10-focal.git/log/?h=oem-5.10-prep
trace-cmd is at version 2.8.3-4build1.

In order to find the bottleneck I tried to use:

  trace-cmd record -e all -P ${pid_of_io_uring_worker}

As a result the server was completely dead immediately.

I tried to reproduce this in a virtual machine (inside VirtualBox). I used a modified 'io_uring-cp' that loops forever, see:
https://github.com/metze-samba/liburing/commit/5e98efed053baf03521692e786c1c55690b04d8e

When I run './io_uring-cp-forever link-cp.c file', I see an 'io_wq_manager' and an 'io_wqe_worker-0' kernel thread, and './io_uring-cp-forever link-cp.c file' as well as 'io_wqe_worker-0' consume about 25% cpu each.

When I run 'trace-cmd record -e all -P $pid' for 'io_wqe_worker-0' or 'io_wq_manager', I can reproduce the problem. Then I found that the same also seems to happen for other kernel threads, e.g. '[kworker/1:1-events]'; it seems to happen for all kernel threads that are not completely idle.

From 'top':

 1341 root      20   0    2512    576    508 S  33,4   0,1   0:10.39 io_uring-cp-for
 1343 root      20   0       0      0      0 R  29,8   0,0   0:08.43 io_wqe_worker-0
    7 root      20   0       0      0      0 I   0,3   0,0   0:00.31 kworker/0:1-events

PID 5 is [kworker/0:0-ata_sff]:

 # trace-cmd record -e all -P 5
 Hit Ctrl^C to stop recording
 ^CCPU0 data recorded at offset=0x7b8000
     0 bytes in size
 CPU1 data recorded at offset=0x7b8000
     69632 bytes in size
 #

But:

 # trace-cmd record -e all -P 7
 => machine unresponsive (no blinking cursor on the console anymore)

On the host, 'top' shows that the VirtualBoxVM cpu emulator thread 'EMT-1' uses 100% cpu, so I guess the guest kernel is in something like an endless recursion loop. Maybe a trace function recursing into itself?
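For reference, the steps above boil down to the following sketch. The pid is a placeholder you would take from 'top' or pstree, and on the affected kernels the last command makes the guest unresponsive, so only run it inside a disposable VM; with DRY_RUN=1 (the default) the script just prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch of the reproduction; DRY_RUN=1 (default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Start the endless copy so that io_wqe_worker-0 stays busy
# (io_uring-cp-forever is the modified liburing example linked above).
run ./io_uring-cp-forever link-cp.c file &

# Placeholder pid: in practice take the pid of io_wqe_worker-0 (or of any
# non-idle kernel thread such as kworker/1:1-events) from 'top'.
busy_pid=1343

# On the affected kernels this hangs the machine immediately.
run trace-cmd record -e all -P "$busy_pid"
```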
On the same VM I tried a 5.12-rc8 kernel and there I can also reproduce the problem.

I also managed to reproduce the problem without io-uring, just using:

  while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done

in order to keep some kernel threads moving.

This happens with 5.12-rc8 and 5.10.0-1023-oem, but I was not able to reproduce any of this using the 5.8.0-50-generic kernel, see:
https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/log/?h=Ubuntu-hwe-5.8-5.8.0-50.56_20.04.1

I was also able to reproduce this with an Ubuntu 21.04 VM using the 5.11.0-14-generic kernel, see:
https://kernel.ubuntu.com/git/ubuntu/ubuntu-hirsute.git/log/?h=Ubuntu-5.11.0-14.15
On this one I only managed to reproduce the problem with './io_uring-cp-forever link-cp.c file', but not with 'while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done'.

So it seems the problem was introduced after 5.8 and is not really related to io-uring. And it may not be purely related to kernel threads.

With this on 5.12-rc8 (again):

  └─tmux: server,903
      ├─bash,904
      │   └─io_uring-cp-for,925 link-cp.c file
      │       ├─{iou-mgr-925},926
      │       └─{iou-wrk-925},927
      └─bash,929
          └─pstree,938 -a -t -p

I was able to trace once:

 root@ub1704-166:~# trace-cmd record -e all -P 925
 Hit Ctrl^C to stop recording
 ^CCPU0 data recorded at offset=0x7b8000
     10842112 bytes in size
 CPU1 data recorded at offset=0x120f000
     36450304 bytes in size

But the 2nd run reproduced the problem:

 root@ub1704-166:~# trace-cmd record -e all -P 925

I was also able to reproduce it with:

  while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done

and:

  pidofcat=$(pidof cat); echo $pidofcat; trace-cmd record -e all -P $pidofcat

So it seems any busy thread (user or kernel) triggers the problem.

Any ideas what has changed after 5.8?

Thanks!
metze