Jan and all,

It finally took a month to happen again: python processes watching a
directory are stuck in inotify_release.
I ran the sysrq commands as you requested and attached the result.

Thanks for your help

On Thu, Mar 28, 2019 at 10:52 AM Jan Kara <jack@xxxxxxx> wrote:
>
> Hello,
>
> On Thu 28-03-19 09:26:45, Olivier Chapelliere wrote:
> > According to what I read on internet you seem to be the right person to get
> > in touch with when one has problems with inotify.
>
> Yes, there's also linux-fsdevel@xxxxxxxxxxxxxxx mailing list which we use
> (added to CC).
>
> > We are monitoring several directories in python processes through inotify.
> > But after few days all processes are stuck in a call to inotify_release.
> > Once I detected the problem, I dumped info to dmesg with sysrq-trigger
> > (dmesg content attached):
> > echo w > /proc/sysrq-trigger
>
> Looking through the stack traces, all of them wait in fput() ->
> inotify_release() -> ... -> fsnotify_wait_marks_destroyed() ->
> flush_delayed_work(&reaper_work). So they wait for worker process to
> destroy all marks for the group. However that worker (kworker/u8:4) is
> stuck in:
>
> fsnotify_mark_destroy_workfn() -> synchronize_srcu(&fsnotify_mark_srcu)
>
> So the question is who is holding fsnotify_mark_srcu so that SRCU cannot
> declare new grace period. I don't see any such process among the processes
> you've shown in the dump (but it should be there) so it's a bit of a
> mystery.
>
> > Our production env is ubuntu 18.04 kernel 4.15 fs ext4
> > This problem appears on a weekly basis so I will be able to run additional
> > commands to track down the issue if needed.
>
> So when this happens again, try grabbing output of sysrq-l and sysrq-t if
> we can find the task holding fsnotify_mark_srcu.
>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR

--
Olivier Chapelliere
Attachment:
kern.log.tar.gz
Description: application/gzip
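
For reference, the sysrq-w, sysrq-l and sysrq-t dumps discussed in the thread could be collected with a small helper along these lines. This is only a sketch, not what was actually run to produce the attachment: the function names `trigger_sysrq` and `capture_kernel_log` and the output file name are hypothetical, it assumes root privileges, a kernel with sysrq enabled, and the `dmesg` tool from util-linux on the PATH.

```python
import subprocess
from pathlib import Path

# Standard procfs location of the sysrq trigger file.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(keys, trigger=SYSRQ_TRIGGER):
    """Send each sysrq command key with its own write.

    Equivalent to running `echo <key> > /proc/sysrq-trigger` once per key.
    The `trigger` parameter exists only so the path can be overridden
    (for example in a dry run against a scratch file).
    """
    for key in keys:
        Path(trigger).write_text(key)


def capture_kernel_log(outfile):
    """Save the current kernel ring buffer to `outfile` (hypothetical helper).

    Returns the number of characters written.
    """
    result = subprocess.run(
        ["dmesg"], check=True, capture_output=True, text=True
    )
    Path(outfile).write_text(result.stdout)
    return len(result.stdout)
```

As root, something like `trigger_sysrq("wlt")` followed by `capture_kernel_log("sysrq-dump.txt")` would request the blocked-task dump (w), the per-CPU backtraces (l) and the full task list (t), then save the result. On a busy machine the sysrq-t output can be large enough to overflow the ring buffer, so the persistent kernel log (such as kern.log) may be the more complete source.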