On 2018/06/15 18:19, Dmitry Vyukov wrote:
> On Thu, Jun 14, 2018 at 12:33 PM, Tetsuo Handa
> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>> On 2018/06/11 16:39, Dmitry Vyukov wrote:
>>> On Mon, Jun 11, 2018 at 9:30 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>> On Sun, Jun 10, 2018 at 11:47:56PM +0900, Tetsuo Handa wrote:
>>>>
>>>>> This looks quite strange that nobody is holding percpu_rw_semaphore for
>>>>> write but everybody is stuck trying to hold it for read. (Since there
>>>>> is no "X locks held by ..." line without followup "#0:" line, there is
>>>>> no possibility that somebody is in TASK_RUNNING state while holding
>>>>> percpu_rw_semaphore for write.)
>>>>>
>>>>> I feel that either API has a bug or API usage is wrong.
>>>>> Any idea for debugging this?
>>>>
>>>> Look at percpu_rwsem_release() and usage. The whole fs freezer thing is
>>>> magic.
>>>
>>> Do you mean that we froze fs? We tried to never-ever issue
>>> ioctl(FIFREEZE) during fuzzing. Are there other ways to do this?
>>>
>>
>> Dmitry, can you try this patch? If you can get
>
> I've tried replying 5 logs with this patch, but I don't see that we
> return to user-space with locks held, nor deadlock reports.

Did you succeed in reproducing khungtaskd messages with this patch?
If yes, was one of sb_writers#X/sb_pagefaults/sb_internal printed there?

If not, we would want a git tree for testing under syzbot.

>
> What I've noticed is that all these logs contain lots of error
> messages around block subsystem. Perhaps if we can identify the common
> denominator across error messages in different logs, we can find the
> one responsible for hangs.

While there are lots of error messages around the block subsystem, how can
down_read() fail to continue unless up_write() somehow failed to wake up
waiters sleeping at down_read(), assuming that khungtaskd says that none of
sb_writers#X/sb_pagefaults/sb_internal was held?

Hmm, might there be other locations calling percpu_rwsem_release()?
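
To make the percpu_rwsem_release() question concrete, here is a toy userspace model of how a "locks held" report and the real lock state could diverge. It is only a sketch, not kernel code: every name in it (struct toy_rwsem, toy_down_write() and so on) is hypothetical, and it assumes that a release-style annotation drops only the lock tracker's ownership record while the semaphore itself stays write-locked.

```c
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
struct toy_rwsem {
	bool write_locked;     /* real state: a writer owns the semaphore */
	bool tracker_record;   /* what a "locks held by ..." report would list */
	int blocked_readers;   /* readers that would sleep in down_read() */
};

/* Writer takes the lock; the tracker records the ownership. */
void toy_down_write(struct toy_rwsem *s)
{
	s->write_locked = true;
	s->tracker_record = true;
}

/*
 * Assumed effect of a release-style annotation: only the tracker's
 * record is dropped. The semaphore itself remains write-locked.
 */
void toy_tracker_release(struct toy_rwsem *s)
{
	s->tracker_record = false;
}

/* Writer really drops the lock and wakes all waiting readers. */
void toy_up_write(struct toy_rwsem *s)
{
	s->write_locked = false;
	s->tracker_record = false;
	s->blocked_readers = 0;
}

/* Returns true if the reader gets the lock, false if it would block. */
bool toy_down_read(struct toy_rwsem *s)
{
	if (s->write_locked) {
		s->blocked_readers++;
		return false;
	}
	return true;
}

/* True only if a hung-task style report would show the writer. */
bool toy_report_shows_writer(const struct toy_rwsem *s)
{
	return s->write_locked && s->tracker_record;
}
```

In this model, calling toy_down_write() followed by toy_tracker_release() leaves every toy_down_read() caller blocked while toy_report_shows_writer() returns false. That matches the symptom described above (readers stuck, no writer listed), which is why auditing every caller of percpu_rwsem_release() seems worthwhile.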