On Tue 29-01-19 16:56:36, Andrew Morton wrote: > On Mon, 14 Jan 2019 09:53:43 +0100 Jan Kara <jack@xxxxxxx> wrote: > > > When superblock has lots of inodes without any pagecache (like is the > > case for /proc), drop_pagecache_sb() will iterate through all of them > > without dropping sb->s_inode_list_lock which can lead to softlockups > > (one of our customers hit this). > > > > Fix the problem by going to the slow path and doing cond_resched() in > > case the process needs rescheduling. > > > > ... > > > > --- a/fs/drop_caches.c > > +++ b/fs/drop_caches.c > > @@ -21,8 +21,13 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused) > > spin_lock(&sb->s_inode_list_lock); > > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { > > spin_lock(&inode->i_lock); > > + /* > > + * We must skip inodes in unusual state. We may also skip > > + * inodes without pages but we deliberately won't in case > > + * we need to reschedule to avoid softlockups. > > + */ > > if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) || > > - (inode->i_mapping->nrpages == 0)) { > > + (inode->i_mapping->nrpages == 0 && !need_resched())) { > > spin_unlock(&inode->i_lock); > > continue; > > } > > @@ -30,6 +35,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused) > > spin_unlock(&inode->i_lock); > > spin_unlock(&sb->s_inode_list_lock); > > > > + cond_resched(); > > invalidate_mapping_pages(inode->i_mapping, 0, -1); > > iput(toput_inode); > > toput_inode = inode; > > Are we sure there's no situation in which a large number of inodes can > be in the "unusual state", leading to the same issue? No, we cannot be really sure that there aren't many such inodes (although I'd be surprised if there were). But the problem with "unusual state" inodes is that you cannot just __iget() them (well, you could but it's breaking the rules and would lead to use-after-free issues ;) and so there's no easy way to drop the spinlock and then continue the iteration after cond_resched(). So overall it's too complex to deal with this until someone actually hits it. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR