On 07/11/2020 16:28, Pavel Begunkov wrote:
> On 07/11/2020 14:09, Josef wrote:
>>> I haven't got the first email. Is it a "kernel NULL pointer dereference"
>>> as in the subject, or just a freeze?
>>
>> That's weird, probably the attached log file was too big.
>> Here is the dmesg log file:
>> https://gist.github.com/1Jo1/3d0bcefc18f097265f0dc1ef054a87c0
>
> That's much better with the log, thanks! I'll take a look later.

Ok, we get into fget_many() without ->files, and it's clear how that may
happen. I'll write up a patch.

>
>>
>>> - Did you locate which test hangs it? If so, what does it use? e.g. SQPOLL
>>> sharing, IOPOLL, etc.
>>
>> Yes, it uses SQPOLL without sharing, IOPOLL is not enabled, and the async
>> flag is enabled.
>>
>>> - Is it send/recvmsg or send/recv you use? Any others?
>>
>> No, the tests that trigger the error use these operations: OP_READ,
>> OP_WRITE, OP_POLL_ADD, OP_POLL_REMOVE, OP_CLOSE, OP_ACCEPT, OP_TIMEOUT
>> (the async flag is enabled for OP_READ, OP_WRITE and OP_CLOSE).
>>
>>> - Does this happen often?
>>
>> Yeah, quite often.
>>
>>> - You may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
>>> (funcgraph is from bpftools). Or catch that with some other tools.
>>
>> I'm not quite familiar with these tools (or kernel debugging in general).
>> I'll take a look tomorrow.
>

-- 
Pavel Begunkov