On Tue 24-03-09 15:44:57, Wu Fengguang wrote: > Hi Masayoshi, > > On Tue, Mar 24, 2009 at 03:06:45PM +0800, Masayoshi MIZUMA wrote: > > Hi, Fengguang > > > > On Mon, 23 Mar 2009 18:38:46 +0800 > > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote: > > > > > Masasyoshi, > > > > > > On Wed, Mar 18, 2009 at 05:13:35PM +0900, Masasyoshi MIZUMA wrote: > > > > I create the patch which fixes lack of mutex_lock in drop_pagecache_sb(). > > > > Please check the bug and the patch (below). > > > > > Thank you for your comment, and I apologize to you for my lack > > of explanation. > > > > > Is this a real producible bug or a theory one? > > This is a real bug. > > > > > IMHO the I_FREEING flag should avoid the race. > > I supplement the explanation for this problem. > > > > clear_inode() is called by dispose_list(), and sets the inode's > > i_state to I_CLEAR. Therefore, the following conditional expression > > doesn't match for the inode: > > "if (inode->i_state & (I_FREEING|I_WILL_FREE)) continue;" > > As the result, this problem can happen. > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > When drop_pagecache_sb() frees inodes, it doesn't get mutex_lock of > > > > iprune_mutex. Therefore, if it races the process which frees inodes > > > > (ex. prune_icache()), OS panic may happen. > > > > > > > > An example of the panic flow is the following: > > > > ---------------------------------------------------------------------- > > > > [process A] | [process B] > > > > | | > > > > | shrink_icache_memory() | > > > > | | | > > > > | V | > > > > | prune_icache() | drop_pagecache() > > > > | mutex_lock(&iprune_mutex) | | > > > > | spin_lock(&inode_lock) | | > > > > | | | V > > > > | | | drop_pagecache_sb() > > > > | | | | > > > > > > inode->i_state |= I_FREEING; > > > > > > > | V | V > > > > | spin_unlock(&inode_lock) | spin_lock(&inode_lock) > > > > | | | | > > > > > > if (inode->i_state & (I_FREEING|I_WILL_FREE)) > > > continue; > > > > > > > | | | | > > > > | V | V > > > > | dispose_list() | __iget() > > > > | list_del() | | > > > > | | | | > > > > | V | V > > > > | spin_lock(&inode_lock) | list_move() <----- PANIC !! > > > > | | > > > > V | > > > > (time) > > > > ---------------------------------------------------------------------- > > > > If the inode which Process B do list_move() with is the same as the one which > > > > Process A did list_del() with, OS may panic. > > > > I applied your comment and then modified the panic flow figure. > > Please check below: > > ---------------------------------------------------------------------- > > [process A] | [process B] > > | | > > | shrink_icache_memory() | > > | | | > > | V | > > | prune_icache() | drop_pagecache() > > | mutex_lock(&iprune_mutex) | | > > | spin_lock(&inode_lock) | | > > | | | V > > | | | drop_pagecache_sb() > > | | | | > > | V | | > > | inode->i_state |= I_FREEING; | | > > | | | | > > | V | V > > | spin_unlock(&inode_lock) | spin_lock(&inode_lock) > > | | | | > > | | | | > > | V | | > > | dispose_list() | | > > | list_del() | | > > | | | | > > | V | | > > | clear_inode() | | > > | inode->i_state = I_CLEAR | | > > | | | | > > | | | V > > | | | if (inode->i_state & (I_FREEING|I_WILL_FREE)) > > | | | continue; <---- NOT MATCH > > | | | | > > | | | V > > | | | __iget() > > | | | | > > | V | V > > | spin_lock(&inode_lock) | list_move() <----- PANIC !! > > | | > > V | > > (time) > > ---------------------------------------------------------------------- > > Ah thanks for the explanation! > > How about this lightweight fix? Since s_umount is already taken in > drop_pagecache(), it's not necessary to take iprune_mutex again. Yes, the patch looks fine like this. But I believe there are more places which might miss similar check. Probably dquot.c: add_dquot_ref() needs the same check and fs-writeback.c: generic_sync_sb_inodes() also needs adding I_CLEAR to the test, doesn't it? Honza > --- mm.orig/fs/drop_caches.c > +++ mm/fs/drop_caches.c > @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup > > spin_lock(&inode_lock); > list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { > - if (inode->i_state & (I_FREEING|I_WILL_FREE)) > + if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) > continue; > if (inode->i_mapping->nrpages == 0) > continue; -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html