Re: stuck in inotify_release


Hi!

On Tue 14-05-19 16:35:29, mathieu lacage wrote:
> We are going to set up a new Ubuntu 16.04 server, build a vanilla 5.0
> kernel on it and run a fraction of our production workload there. Is
> this OK for you? If so, I will let you know as soon as we observe the
> problem on this server again.

Yes, that should rule out any Ubuntu-specific problems. Thanks!

								Honza

> 
> Mathieu
> 
> On Tue, May 14, 2019 at 15:22, Olivier Chapelliere <
> olivier.chapelliere@xxxxxxxxxxx> wrote:
> 
> > ---------- Forwarded message ---------
> > From: Jan Kara <jack@xxxxxxx>
> > Date: Tue, May 14, 2019 at 11:25 AM
> > Subject: Re: stuck in inotify_release
> > To: Olivier Chapelliere <olivier.chapelliere@xxxxxxxxxxx>
> > Cc: Jan Kara <jack@xxxxxxx>, <linux-fsdevel@xxxxxxxxxxxxxxx>
> >
> >
> > Hello!
> >
> > On Mon 06-05-19 20:54:24, Olivier Chapelliere wrote:
> > > It finally took a month to happen again: Python processes watching a
> > > directory are stuck in inotify_release.
> > > I ran the sysrq commands as you requested and attached the result.
> >
> > Thanks. I was looking into these traces but the situation is the same as
> > before. Everyone is blocked waiting for the inotify group to shut down. That
> > is blocked waiting for the worker to finish destroying notification marks,
> > and the worker is blocked in synchronize_srcu() waiting for the SRCU grace
> > period to end. Now, I didn't find any process that would be holding the SRCU
> > lock, so it seems that someone exited the SRCU locked section without
> > releasing the lock. I've checked the 4.15 kernel your Ubuntu kernel is based
> > on and I don't see how that would be possible. It is possible, though, that
> > the problem was introduced by some Ubuntu-specific backports. Would it be
> > possible for you to run some vanilla kernel (i.e., without Ubuntu
> > modifications)?
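To make the failure mode described above concrete, here is a toy user-space model of SRCU in Python. It is a simplified sketch under stated assumptions, not the kernel's implementation (real SRCU uses per-CPU counters and memory barriers); the class `ToySrcu` and its methods are hypothetical names chosen to mirror `srcu_read_lock()`, `srcu_read_unlock()` and `synchronize_srcu()`. It shows why a single reader that leaves its read-side section without unlocking makes every later grace-period wait hang.

```python
import threading


class ToySrcu:
    """Toy two-epoch model of SRCU; NOT the kernel's implementation."""

    def __init__(self):
        self._cond = threading.Condition()
        self._sync_lock = threading.Lock()  # one grace period at a time
        self._readers = [0, 0]  # active readers per epoch index
        self._idx = 0           # epoch index that new readers join

    def read_lock(self):
        """Enter a read-side section; returns the epoch index to unlock."""
        with self._cond:
            idx = self._idx
            self._readers[idx] += 1
            return idx

    def read_unlock(self, idx):
        """Leave a read-side section entered in epoch `idx`."""
        with self._cond:
            self._readers[idx] -= 1
            self._cond.notify_all()

    def synchronize(self, timeout=None):
        """Wait for all pre-existing readers; False if a phase times out."""
        with self._sync_lock:
            for _ in range(2):  # flip twice so both epochs are drained
                with self._cond:
                    old = self._idx
                    self._idx ^= 1  # new readers now join the other epoch
                    drained = self._cond.wait_for(
                        lambda: self._readers[old] == 0, timeout=timeout)
                if not drained:
                    return False
            return True


if __name__ == "__main__":
    srcu = ToySrcu()
    idx = srcu.read_lock()
    srcu.read_unlock(idx)
    print(srcu.synchronize(timeout=0.1))  # True: the reader released its lock
    srcu.read_lock()                      # reader "exits" without unlocking
    print(srcu.synchronize(timeout=0.1))  # False: the grace period never ends
```

The timeout exists only so the demo terminates; the kernel's synchronize_srcu() has no timeout, which is consistent with the worker simply staying blocked.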
> >
> >                                                                 Honza
> >
> > > On Thu, Mar 28, 2019 at 10:52 AM Jan Kara <jack@xxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Thu 28-03-19 09:26:45, Olivier Chapelliere wrote:
> > > > > According to what I read on the internet, you seem to be the right
> > > > > person to get in touch with when one has problems with inotify.
> > > >
> > > > Yes, there's also the linux-fsdevel@xxxxxxxxxxxxxxx mailing list which
> > > > we use (added to CC).
> > > >
> > > > > We are monitoring several directories in Python processes through
> > > > > inotify. But after a few days, all processes are stuck in a call to
> > > > > inotify_release. Once I detected the problem, I dumped info to dmesg
> > > > > with sysrq-trigger (dmesg content attached):
> > > > > echo w > /proc/sysrq-trigger
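`echo w` asks the kernel to dump tasks in uninterruptible (blocked, state `D`) sleep. As a cheaper first check that a monitoring script can usually run without root, a minimal Python sketch could list `D`-state processes from `/proc/<pid>/stat`. The function names are hypothetical; the sketch reads only one entry per process (individual threads live under `/proc/<pid>/task/`) and, unlike sysrq-w, gives no kernel stack traces.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm sits in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # single state letter after ") "
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for each process in uninterruptible sleep ('D')."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```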
> > > >
> > > > Looking through the stack traces, all of them wait in fput() ->
> > > > inotify_release() -> ... -> fsnotify_wait_marks_destroyed() ->
> > > > flush_delayed_work(&reaper_work). So they wait for the worker process to
> > > > destroy all marks for the group. However, that worker (kworker/u8:4) is
> > > > stuck in:
> > > >
> > > > fsnotify_mark_destroy_workfn() -> synchronize_srcu(&fsnotify_mark_srcu)
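The user-space side of this path is small. Below is a minimal Python sketch, assuming Linux and glibc reached through `ctypes` (Python's standard library has no inotify wrapper), of what a watching process does: create an inotify instance, add a watch on a directory, read one event, and close the descriptor. The final `close()` is what sends the kernel down `fput()` -> `inotify_release()`, the call in which the reported processes hang. `watch_one_create` and `_check` are hypothetical helper names, not part of any library.

```python
import ctypes
import ctypes.util
import os
import struct
import tempfile

IN_CREATE = 0x00000100             # from <sys/inotify.h>
EVENT_HDR = struct.Struct("iIII")  # struct inotify_event: wd, mask, cookie, len

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.inotify_init1.argtypes = [ctypes.c_int]
libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]


def _check(ret):
    """Raise OSError from errno when a libc call returned -1."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def watch_one_create(directory):
    """Watch `directory`, create a file in it, return (mask, name) of the event."""
    fd = _check(libc.inotify_init1(0))
    try:
        _check(libc.inotify_add_watch(fd, os.fsencode(directory), IN_CREATE))
        open(os.path.join(directory, "probe"), "w").close()  # generates IN_CREATE
        buf = os.read(fd, 4096)  # blocks until at least one event is queued
        _wd, mask, _cookie, length = EVENT_HDR.unpack_from(buf)
        raw = buf[EVENT_HDR.size:EVENT_HDR.size + length]
        return mask, raw.rstrip(b"\0").decode()
    finally:
        # Closing the last reference to the inotify fd is what makes the
        # kernel tear the instance down via fput() -> inotify_release().
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mask, name = watch_one_create(d)
        print(hex(mask), name)  # 0x100 probe
```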
> > > >
> > > > So the question is who is holding fsnotify_mark_srcu so that SRCU cannot
> > > > declare a new grace period. I don't see any such process among the
> > > > processes you've shown in the dump (but it should be there), so it's a
> > > > bit of a mystery.
> > > >
> > > > > Our production environment is Ubuntu 18.04, kernel 4.15, ext4.
> > > > > This problem appears on a weekly basis, so I will be able to run
> > > > > additional commands to track down the issue if needed.
> > > >
> > > > So when this happens again, try grabbing the output of sysrq-l and
> > > > sysrq-t to see if we can find the task holding fsnotify_mark_srcu.
> > > >
> > > >                                                                 Honza
> > > > --
> > > > Jan Kara <jack@xxxxxxxx>
> > > > SUSE Labs, CR
> > >
> > >
> > >
> > > --
> > > Olivier Chapelliere
> >
> >
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
> >
> >
> > --
> > Olivier Chapelliere
> >
> 
> 
> -- 
> Mathieu Lacage <mathieu.lacage@xxxxxxxxxxx>
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


