Hi Masayoshi, On Tue, Mar 24, 2009 at 03:06:45PM +0800, Masayoshi MIZUMA wrote: > Hi, Fengguang > > On Mon, 23 Mar 2009 18:38:46 +0800 > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote: > > > Masasyoshi, > > > > On Wed, Mar 18, 2009 at 05:13:35PM +0900, Masasyoshi MIZUMA wrote: > > > I create the patch which fixes lack of mutex_lock in drop_pagecache_sb(). > > > Please check the bug and the patch (below). > > > Thank you for your comment, and I apologize to you for my lack > of explanation. > > > Is this a real producible bug or a theory one? > This is a real bug. > > > IMHO the I_FREEING flag should avoid the race. > I supplement the explanation for this problem. > > clear_inode() is called by dispose_list(), and sets the inode's > i_state to I_CLEAR. Therefore, the following conditional expression > doesn't match for the inode: > "if (inode->i_state & (I_FREEING|I_WILL_FREE)) continue;" > As the result, this problem can happen. > > > > > > ---------------------------------------------------------------------- > > > > > > When drop_pagecache_sb() frees inodes, it doesn't get mutex_lock of > > > iprune_mutex. Therefore, if it races the process which frees inodes > > > (ex. prune_icache()), OS panic may happen. > > > > > > An example of the panic flow is the following: > > > ---------------------------------------------------------------------- > > > [process A] | [process B] > > > | | > > > | shrink_icache_memory() | > > > | | | > > > | V | > > > | prune_icache() | drop_pagecache() > > > | mutex_lock(&iprune_mutex) | | > > > | spin_lock(&inode_lock) | | > > > | | | V > > > | | | drop_pagecache_sb() > > > | | | | > > > > inode->i_state |= I_FREEING; > > > > > | V | V > > > | spin_unlock(&inode_lock) | spin_lock(&inode_lock) > > > | | | | > > > > if (inode->i_state & (I_FREEING|I_WILL_FREE)) > > continue; > > > > > | | | | > > > | V | V > > > | dispose_list() | __iget() > > > | list_del() | | > > > | | | | > > > | V | V > > > | spin_lock(&inode_lock) | list_move() <----- PANIC !! > > > | | > > > V | > > > (time) > > > ---------------------------------------------------------------------- > > > If the inode which Process B do list_move() with is the same as the one which > > > Process A did list_del() with, OS may panic. > > I applied your comment and then modified the panic flow figure. > Please check below: > ---------------------------------------------------------------------- > [process A] | [process B] > | | > | shrink_icache_memory() | > | | | > | V | > | prune_icache() | drop_pagecache() > | mutex_lock(&iprune_mutex) | | > | spin_lock(&inode_lock) | | > | | | V > | | | drop_pagecache_sb() > | | | | > | V | | > | inode->i_state |= I_FREEING; | | > | | | | > | V | V > | spin_unlock(&inode_lock) | spin_lock(&inode_lock) > | | | | > | | | | > | V | | > | dispose_list() | | > | list_del() | | > | | | | > | V | | > | clear_inode() | | > | inode->i_state = I_CLEAR | | > | | | | > | | | V > | | | if (inode->i_state & (I_FREEING|I_WILL_FREE)) > | | | continue; <---- NOT MATCH > | | | | > | | | V > | | | __iget() > | | | | > | V | V > | spin_lock(&inode_lock) | list_move() <----- PANIC !! > | | > V | > (time) > ---------------------------------------------------------------------- Ah thanks for the explanation! How about this lightweight fix? Since s_umount is already taken in drop_pagecache(), it's not necessary to take iprune_mutex again. Thanks, Fengguang --- fs/drop_caches.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- mm.orig/fs/drop_caches.c +++ mm/fs/drop_caches.c @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup spin_lock(&inode_lock); list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { - if (inode->i_state & (I_FREEING|I_WILL_FREE)) + if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) continue; if (inode->i_mapping->nrpages == 0) continue; -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html