2022-06-01 6:05 GMT+09:00, David Howells <dhowells@xxxxxxxxxx>:
> Hi Namjae,
>
> Steve says I should show this to you.
>
> My server box that I'm using to do cifs-over-RDMA testing is running really
> slowly because it has about 30 ksmbd threads hogging the cpus:
>
>     PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM     TIME+ COMMAND
>   19993 root  20   0     0    0    0 R  14.3   0.0 910:06.02 ksmbd:r5445
>   20048 root  20   0     0    0    0 R  14.3   0.0 896:19.22 ksmbd:r5445
>   20052 root  20   0     0    0    0 R  14.3   0.0 901:51.52 ksmbd:r5445
>   20053 root  20   0     0    0    0 R  14.3   0.0 904:20.84 ksmbd:r5445
>   20056 root  20   0     0    0    0 R  14.3   0.0 910:39.38 ksmbd:r5445
>   20095 root  20   0     0    0    0 R  14.3   0.0 901:28.48 ksmbd:r5445
>   20097 root  20   0     0    0    0 R  14.3   0.0 910:02.19 ksmbd:r5445
>   20103 root  20   0     0    0    0 R  14.3   0.0 912:13.18 ksmbd:r5445
>   20105 root  20   0     0    0    0 R  14.3   0.0 908:46.76 ksmbd:r5445
>   ...
>
> I tried to shut them down with "ksmbd.control -s", but that just hung and
> the threads are still running.  I captured a stack trace from one of them
> through /proc:
>
>   [root@carina ~]# cat /proc/20052/stack
>   [<0>] ksmbd_conn_handler_loop+0x181/0x200 [ksmbd]
>   [<0>] kthread+0xe8/0x110
>   [<0>] ret_from_fork+0x22/0x30
>
> Note that nothing is currently mounted from the server and it is getting no
> incoming packets.

Okay. How do you reproduce this problem? Did you run xfstests against
ksmbd RDMA?

> Looking at the loop in ksmbd_conn_handler_loop(), it seems to be
> busy-waiting - unless kernel_recvmsg() is doing that?  In the TCP
> transport, if kernel_recvmsg() isn't waiting, but returns -EAGAIN, it will
> sleep for 1-2ms and then go round again... and again... and again - and
> all 30 threads would be doing that.

Okay, we need to add a maximum retry count for that case. But when I
check the kernel thread names in your top output, they are RDMA
connections, so smb_direct_read() is what ksmbd_conn_handler_loop() uses
here, not the TCP path. I'd like to reproduce the problem to figure out
where the problem is.
Can I try to reproduce it with soft-iWARP and xfstests?

> Btw in:
>
>         ret = kernel_accept(iface->ksmbd_socket, &client_sk,
>                             O_NONBLOCK);
>
> that should be SOCK_NONBLOCK, I think.

Ah, I found that SOCK_NONBLOCK is normally the same as O_NONBLOCK, but it
has a different value on some architectures. I will change it. Thanks
for pointing it out :)

/include/linux/net.h
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK   O_NONBLOCK
#endif

/arch/alpha/include/asm/socket.h
#define SOCK_NONBLOCK   0x40000000

> Also:
>
>   [root@carina ~]# ksmbd.control --shutdown
>   Usage: ksmbd.control
>           -s | --shutdown
>   ...
>
> that looks like it doesn't handle the advertised long parameters.

I will fix it :)

Thanks!
>
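
For the long-parameter handling, here is a minimal userspace sketch of how getopt_long() can map --shutdown onto the same case as -s. This is not the ksmbd-tools source: parse_ctl_args() and the ctl_action enum are hypothetical names used only for illustration, and the real tool has more options than the single one shown.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical action codes for this sketch; not taken from ksmbd-tools. */
enum ctl_action { CTL_NONE, CTL_SHUTDOWN, CTL_USAGE };

/*
 * Parse the command line and report which action was requested.
 * Both "-s" and "--shutdown" resolve to CTL_SHUTDOWN because the
 * long option's val field is set to the short option character.
 */
static enum ctl_action parse_ctl_args(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "shutdown", no_argument, NULL, 's' },
		{ NULL, 0, NULL, 0 },	/* table terminator */
	};
	enum ctl_action action = CTL_NONE;
	int c;

	optind = 1;	/* restart scanning so the function can be called again */
	opterr = 0;	/* let the caller print its own usage text */

	while ((c = getopt_long(argc, argv, "s", long_opts, NULL)) != -1) {
		switch (c) {
		case 's':
			action = CTL_SHUTDOWN;
			break;
		default:	/* '?' for an unrecognised option */
			return CTL_USAGE;
		}
	}
	return action;
}
```

With a table like this, every long name printed in the usage text has a matching entry, so "ksmbd.control --shutdown" and "ksmbd.control -s" take the same path, and an unknown option falls through to the usage case.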