Hi all,

If it helps troubleshoot the issue, I found another host with the same symptoms. I ran the sysrq commands and attached the kernel log file.

Thanks
Olivier

On Mon, May 6, 2019 at 8:54 PM Olivier Chapelliere <olivier.chapelliere@xxxxxxxxxxx> wrote:
>
> Jan and all,
>
> It finally took a month to happen again: the Python processes watching a
> directory are stuck in inotify_release.
> I ran the sysrq commands as you requested and attached the result.
>
> Thanks for your help
>
> On Thu, Mar 28, 2019 at 10:52 AM Jan Kara <jack@xxxxxxx> wrote:
> >
> > Hello,
> >
> > On Thu 28-03-19 09:26:45, Olivier Chapelliere wrote:
> > > From what I read on the internet, you seem to be the right person to get
> > > in touch with when one has problems with inotify.
> >
> > Yes, there's also the linux-fsdevel@xxxxxxxxxxxxxxx mailing list, which we use
> > (added to CC).
> >
> > > We are monitoring several directories in Python processes through inotify.
> > > But after a few days, all processes are stuck in a call to inotify_release.
> > > Once I detected the problem, I dumped info to dmesg with sysrq-trigger
> > > (dmesg content attached):
> > > echo w > /proc/sysrq-trigger
> >
> > Looking through the stack traces, all of them wait in fput() ->
> > inotify_release() -> ... -> fsnotify_wait_marks_destroyed() ->
> > flush_delayed_work(&reaper_work). So they wait for the worker process to
> > destroy all marks for the group. However, that worker (kworker/u8:4) is
> > stuck in:
> >
> > fsnotify_mark_destroy_workfn() -> synchronize_srcu(&fsnotify_mark_srcu)
> >
> > So the question is who is holding fsnotify_mark_srcu so that SRCU cannot
> > declare a new grace period. I don't see any such process among the processes
> > you've shown in the dump (but it should be there), so it's a bit of a
> > mystery.
> >
> > > Our production env is Ubuntu 18.04, kernel 4.15, fs ext4.
> > > This problem appears on a weekly basis, so I will be able to run additional
> > > commands to track down the issue if needed.
> >
> > So when this happens again, try grabbing the output of sysrq-l and sysrq-t
> > to see if we can find the task holding fsnotify_mark_srcu.
> >
> > Honza
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
> >
>
> --
> Olivier Chapelliere

--
Olivier Chapelliere
Attachment: kern.log.tar.gz
Description: application/gzip