Re: [PATCH V2] fs: avoid softlockups in s_inodes iterators

On 10/16/19 9:39 AM, Eric Sandeen wrote:
> On 10/16/19 8:49 AM, Jan Kara wrote:
>> On Wed 16-10-19 08:23:51, Eric Sandeen wrote:
>>> On 10/16/19 4:42 AM, Jan Kara wrote:
>>>> On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
>>>>> On 10/15/19 2:37 AM, Jan Kara wrote:
>>>>>> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
>>>>>>> Anything that walks all inodes on sb->s_inodes list without rescheduling
>>>>>>> risks softlockups.
>>>>>>>
>>>>>>> Previous efforts were made in 2 functions, see:
>>>>>>>
>>>>>>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
>>>>>>> ac05fbb inode: don't softlockup when evicting inodes
>>>>>>>
>>>>>>> but there hasn't been an audit of all walkers, so do that now.  This
>>>>>>> also consistently moves the cond_resched() calls to the bottom of each
>>>>>>> loop in cases where it already exists.
>>>>>>>
>>>>>>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
>>>>>>> to deal with that one w/o taking the i_lock.
>>>>>>>
>>>>>>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
>>>>>>
>>>>>> Thanks Eric. The patch looks good to me. You can add:
>>>>>>
>>>>>> Reviewed-by: Jan Kara <jack@xxxxxxx>
>>>>>
>>>>> thanks
>>>>>
>>>>>> BTW, I suppose you need to add Al to pickup the patch?
>>>>>
>>>>> Yeah (cc'd now)
>>>>>
>>>>> But it was just pointed out to me that if/when the majority of inodes
>>>>> at umount time have i_count == 0, we'll never hit the resched in 
>>>>> fsnotify_unmount_inodes() and may still have an issue ...
>>>>
>>>> Yeah, that's a good point. So that loop will need some further tweaking
>>>> (like doing iget-iput dance in need_resched() case like in some other
>>>> places).
>>>
>>> Well, it's already got an iget/iput for anything with i_count > 0.  But
>>> as the comment says (and I think it's right...) doing an iget/iput
>>> on i_count == 0 inodes at this point would be without SB_ACTIVE and the final
>>> iput here would actually start evicting inodes in /this/ loop, right?
>>
>> Yes, it would but since this is just before calling evict_inodes(), I have
>> currently hard time remembering why evicting inodes like that would be an
>> issue.
> 
> Probably just weird to effectively evict all inodes prior to evict_inodes() ;)
> 
>>> I think we could (ab)use the lru list to construct a "dispose" list for
>>> fsnotify processing as was done in evict_inodes...
> 
> [narrator: Eric's idea here is dumb and it won't work]
> 
>>> or maybe the two should be merged, and fsnotify watches could be handled
>>> directly in evict_inodes.  But that doesn't feel quite right.
>>
>> Merging the two would be possible (and faster!) as well but I agree it
>> feels a bit dirty :)
> 
> It's starting to look like maybe the only option...
> 
> I'll see if Al is willing to merge this patch as is for the simple "schedule
> the big loops" and see about a 2nd patch on top to do more surgery for this
> case.

Sorry for thinking out loud in public, but I'm not too familiar with fsnotify, so
I'm being timid.  However, since fsnotify_sb_delete() and evict_inodes() operate on
disjoint sets of inodes (fsnotify_sb_delete() only cares about inodes with a nonzero
refcount, and evict_inodes() only cares about inodes with a zero refcount), I think
we can simply swap the order of the calls.  As a bonus, the fsnotify call then has a
much smaller list to walk, since evict_inodes() will have already dropped all of the
zero-refcount inodes from s_inodes.

I'll try to give this a test.

diff --git a/fs/super.c b/fs/super.c
index cfadab2cbf35..cd352530eca9 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -448,10 +448,12 @@ void generic_shutdown_super(struct super_block *sb)
 		sync_filesystem(sb);
 		sb->s_flags &= ~SB_ACTIVE;
 
-		fsnotify_sb_delete(sb);
 		cgroup_writeback_umount();
 
+		/* evict all inodes with zero refcount */
 		evict_inodes(sb);
+		/* only nonzero refcount inodes can have marks */
+		fsnotify_sb_delete(sb);
 
 		if (sb->s_dio_done_wq) {
 			destroy_workqueue(sb->s_dio_done_wq);
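
For reference, the __iget()/iput() + cond_resched() pattern we keep talking
about for the s_inodes walkers looks roughly like this.  It's just a sketch
modeled on the drop_pagecache_sb()-style loop (made-up function name, not a
hunk from this patch):

#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

/*
 * Walk all inodes on sb->s_inodes: pin each interesting inode, drop the
 * list lock, do the per-inode work, reschedule, then retake the lock.
 * The held reference keeps our list position valid across the unlock.
 */
static void walk_sb_inodes_example(struct super_block *sb)
{
	struct inode *inode, *toput_inode = NULL;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		spin_lock(&inode->i_lock);
		/* skip inodes that are going away or not yet set up */
		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&sb->s_inode_list_lock);

		/* ... per-inode work goes here ... */

		iput(toput_inode);
		toput_inode = inode;

		/* give the scheduler a chance before retaking the lock */
		cond_resched();
		spin_lock(&sb->s_inode_list_lock);
	}
	spin_unlock(&sb->s_inode_list_lock);
	iput(toput_inode);
}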
