On Tue 01-04-08 17:23:34, Erez Zadok wrote: <snip> > > From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001 > > From: Jan Kara <jack@xxxxxxx> > > Date: Tue, 18 Mar 2008 14:38:06 +0100 > > Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under > > inode_lock. > > > > Signed-off-by: Jan Kara <jack@xxxxxxx> > > --- > > fs/drop_caches.c | 8 +++++++- > > 1 files changed, 7 insertions(+), 1 deletions(-) > > > > diff --git a/fs/drop_caches.c b/fs/drop_caches.c > > index 59375ef..f5aae26 100644 > > --- a/fs/drop_caches.c > > +++ b/fs/drop_caches.c > > @@ -14,15 +14,21 @@ int sysctl_drop_caches; > > > > static void drop_pagecache_sb(struct super_block *sb) > > { > > - struct inode *inode; > > + struct inode *inode, *toput_inode = NULL; > > > > spin_lock(&inode_lock); > > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { > > if (inode->i_state & (I_FREEING|I_WILL_FREE)) > > continue; > > + __iget(inode); > > + spin_unlock(&inode_lock); > > __invalidate_mapping_pages(inode->i_mapping, 0, -1, true); > > + iput(toput_inode); > > + toput_inode = inode; > > + spin_lock(&inode_lock); > > } > > spin_unlock(&inode_lock); > > + iput(toput_inode); > > } > > Jan, I'll be happy to test this, but I don't understand two things about > this patch: > > 1. Is it safe to unlock and re-lock inode_lock temporarily within the loop? > > 2. What's the motivation behind having the second toput_inode pointer? It > appears that the first iteration through the loop, toput_inode will be > NULL, so we'll be iput'ing a NULL pointer (which is ok). So you're > trying to iput the previous inode pointer that the list iterated over, > right? Is that intended? I'll try to explain the locking here: 1) We are not allowed to call into __invalidate_mapping_pages() with inode_lock held - that it the bug lockdep is complaining about. Moreover it leads to rather long waiting times for inode_lock (quite possibly several seconds). 2) When we release inode_lock, we need to protect from inode going away, thus we hold reference to it - that guarantees us inode stays in the list. 3) When we continue scanning of the list we must get inode_lock before we put the inode reference to avoid races. But we cannot do iput() when we hold inode_lock. Thus we save pointer to inode and do iput() next time we have released the inode_lock... > Peter's post: > > http://lkml.org/lkml/2008/3/23/202 > > includes a reference to a mail by Andrew which implies that the fix may be > much more involved than what you outlined above, no? Definitely we can do a more involved fix ;) What Andrew proposes would have some other benefits as well. But until somebody gets to that, this slight hack should work fine (Andrew has already merged my patch in -mm so I guess he agrees ;). Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html