On Wed 16-10-19 10:26:16, Eric Sandeen wrote:
> On 10/16/19 9:39 AM, Eric Sandeen wrote:
> > On 10/16/19 8:49 AM, Jan Kara wrote:
> >> On Wed 16-10-19 08:23:51, Eric Sandeen wrote:
> >>> On 10/16/19 4:42 AM, Jan Kara wrote:
> >>>> On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
> >>>>> On 10/15/19 2:37 AM, Jan Kara wrote:
> >>>>>> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
> >>>>>>> Anything that walks all inodes on sb->s_inodes list without rescheduling
> >>>>>>> risks softlockups.
> >>>>>>>
> >>>>>>> Previous efforts were made in 2 functions, see:
> >>>>>>>
> >>>>>>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> >>>>>>> ac05fbb inode: don't softlockup when evicting inodes
> >>>>>>>
> >>>>>>> but there hasn't been an audit of all walkers, so do that now. This
> >>>>>>> also consistently moves the cond_resched() calls to the bottom of each
> >>>>>>> loop in cases where it already exists.
> >>>>>>>
> >>>>>>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
> >>>>>>> to deal with that one w/o taking the i_lock.
> >>>>>>>
> >>>>>>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> >>>>>>
> >>>>>> Thanks Eric. The patch looks good to me. You can add:
> >>>>>>
> >>>>>> Reviewed-by: Jan Kara <jack@xxxxxxx>
> >>>>>
> >>>>> thanks
> >>>>>
> >>>>>> BTW, I suppose you need to add Al to pickup the patch?
> >>>>>
> >>>>> Yeah (cc'd now)
> >>>>>
> >>>>> But it was just pointed out to me that if/when the majority of inodes
> >>>>> at umount time have i_count == 0, we'll never hit the resched in
> >>>>> fsnotify_unmount_inodes() and may still have an issue ...
> >>>>
> >>>> Yeah, that's a good point. So that loop will need some further tweaking
> >>>> (like doing iget-iput dance in need_resched() case like in some other
> >>>> places).
> >>>
> >>> Well, it's already got an iget/iput for anything with i_count > 0. But
> >>> as the comment says (and I think it's right...) doing an iget/iput
> >>> on i_count == 0 inodes at this point would be without SB_ACTIVE and the final
> >>> iput here would actually start evicting inodes in /this/ loop, right?
> >>
> >> Yes, it would but since this is just before calling evict_inodes(), I have
> >> currently hard time remembering why evicting inodes like that would be an
> >> issue.
> >
> > Probably just weird to effectively evict all inodes prior to evict_inodes() ;)
> >
> >>> I think we could (ab)use the lru list to construct a "dispose" list for
> >>> fsnotify processing as was done in evict_inodes...
> >
> > [narrator: Eric's idea here is dumb and it won't work]
> >
> >>> or maybe the two should be merged, and fsnotify watches could be handled
> >>> directly in evict_inodes. But that doesn't feel quite right.
> >>
> >> Merging the two would be possible (and faster!) as well but I agree it
> >> feels a bit dirty :)
> >
> > It's starting to look like maybe the only option...
> >
> > I'll see if Al is willing to merge this patch as is for the simple "schedule
> > the big loops" and see about a 2nd patch on top to do more surgery for this
> > case.
>
> Sorry for thinking out loud in public but I'm not too familiar with fsnotify, so
> I'm being timid. However, since fsnotify_sb_delete() and evict_inodes() are working
> on orthogonal sets of inodes (fsnotify_sb_delete only cares about nonzero refcount,
> and evict_inodes only cares about zero refcount), I think we can just swap the order
> of the calls. The fsnotify call will then have a much smaller list to walk
> (any refcounted inodes) as well.
>
> I'll try to give this a test.

Yes, this should make the softlockup impossible to trigger in practice. So
agreed.

								Honza

>
> diff --git a/fs/super.c b/fs/super.c
> index cfadab2cbf35..cd352530eca9 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -448,10 +448,12 @@ void generic_shutdown_super(struct super_block *sb)
>  	sync_filesystem(sb);
>  	sb->s_flags &= ~SB_ACTIVE;
>
> -	fsnotify_sb_delete(sb);
>  	cgroup_writeback_umount();
>
> +	/* evict all inodes with zero refcount */
>  	evict_inodes(sb);
> +	/* only nonzero refcount inodes can have marks */
> +	fsnotify_sb_delete(sb);
>
>  	if (sb->s_dio_done_wq) {
>  		destroy_workqueue(sb->s_dio_done_wq);
>
>
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
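
For anyone following along, the argument for swapping the two calls can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: struct toy_inode and the toy_*() functions are hypothetical stand-ins, not kernel code, and they model nothing beyond the refcount split described in the thread (the evict pass touches only i_count == 0 inodes, the fsnotify pass only i_count > 0 ones). Each function returns how many list entries its walk had to visit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inode on sb->s_inodes. NOT the kernel's struct inode. */
struct toy_inode {
	int i_count;      /* reference count */
	bool evicted;     /* true once the toy evict pass removed it from the list */
	bool marks_gone;  /* true once the toy fsnotify pass handled it */
};

/* Hypothetical helper: first `referenced` inodes get i_count == 1, the rest 0. */
void toy_fill(struct toy_inode *inodes, size_t n, size_t referenced)
{
	assert(referenced <= n);
	for (size_t i = 0; i < n; i++) {
		inodes[i].i_count = (i < referenced) ? 1 : 0;
		inodes[i].evicted = false;
		inodes[i].marks_gone = false;
	}
}

/*
 * Models the evict pass: only zero-refcount inodes are disposed of.
 * Returns the number of list entries the walk visited.
 */
size_t toy_evict_inodes(struct toy_inode *inodes, size_t n)
{
	size_t walked = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].evicted)
			continue;	/* already off the list */
		walked++;
		if (inodes[i].i_count == 0)
			inodes[i].evicted = true;
	}
	return walked;
}

/*
 * Models the fsnotify pass: only nonzero-refcount inodes are handled.
 * Returns the number of list entries the walk visited.
 */
size_t toy_fsnotify_sb_delete(struct toy_inode *inodes, size_t n)
{
	size_t walked = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].evicted)
			continue;	/* already off the list */
		walked++;
		if (inodes[i].i_count > 0)
			inodes[i].marks_gone = true;
	}
	return walked;
}
```

In this model, with 1000 inodes of which 10 are referenced, the fsnotify walk visits 1000 entries when it runs first and 10 entries when it runs after the evict pass, while the final state of every inode is the same in both orders. Whether the real list shrinks the same way depends on details the toy ignores.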