On Thu, May 14, 2009 at 05:23:25PM -0400, Theodore Tso wrote: > On Fri, May 15, 2009 at 06:32:45AM +0930, Kevin Shanahan wrote: > > Okay, so now I've booted into 2.6.29.3 + check_block_validity patch + > > short circuit i_cached_extent patch, mounted the fs without > > nodelalloc. I was able to run the full exchange backup without > > triggering the check_block_validity error. > > Great! > > So here's the final fix (it replaces the short circuit i_cached_extent > patch) which I plan to push to Linus. It should be much less of a > performance hit than simply short-circuiting i_cached_extent... > > Thanks so much for helping to find track this down!!! If ever someone > deserved an "Ext4 Baker Street Irregulars" T-shirt, it would be > you.... > > - Ted > > commit 039ed7a483fdcb2dbbc29f00cd0d74c101ab14c5 > Author: Theodore Ts'o <tytso@xxxxxxx> > Date: Thu May 14 17:09:37 2009 -0400 > > ext4: Fix race in ext4_inode_info.i_cached_extent > > If one CPU is reading from a file while another CPU is writing to the > same file different locations, there is nothing protecting the > i_cached_extent structure from being used and updated at the same > time. This could potentially cause the wrong location on disk to be > read or written to, including potentially causing the corruption of > the block group descriptors and/or inode table. It should be multiple readers. We don't allow read/write or multiple writers via ext4_ext_get_blocks. &EXT4_I(inode)->i_data_sem is supposed to protect read/write and multiple writers. What it allowed was multiple readers(get_block call with create = 0). And readers did cache the extent information which it read from the disk. So the fix is correct, but we need to update the commit message. Reviewed-by:Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx> > > Many thanks to Ken Shannah for helping to track down this problem. > > Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx> > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 172656c..e3a55eb 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -1841,11 +1841,13 @@ ext4_ext_put_in_cache(struct inode *inode, ext4_lblk_t block, > { > struct ext4_ext_cache *cex; > BUG_ON(len == 0); > + spin_lock(&EXT4_I(inode)->i_block_reservation_lock); > cex = &EXT4_I(inode)->i_cached_extent; > cex->ec_type = type; > cex->ec_block = block; > cex->ec_len = len; > cex->ec_start = start; > + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); > } > > /* > @@ -1902,12 +1904,17 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block, > struct ext4_extent *ex) > { > struct ext4_ext_cache *cex; > + int ret = EXT4_EXT_CACHE_NO; > > + /* > + * We borrow i_block_reservation_lock to protect i_cached_extent > + */ > + spin_lock(&EXT4_I(inode)->i_block_reservation_lock); > cex = &EXT4_I(inode)->i_cached_extent; > > /* has cache valid data? */ > if (cex->ec_type == EXT4_EXT_CACHE_NO) > - return EXT4_EXT_CACHE_NO; > + goto errout; > > BUG_ON(cex->ec_type != EXT4_EXT_CACHE_GAP && > cex->ec_type != EXT4_EXT_CACHE_EXTENT); > @@ -1918,11 +1925,11 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block, > ext_debug("%u cached by %u:%u:%llu\n", > block, > cex->ec_block, cex->ec_len, cex->ec_start); > - return cex->ec_type; > + ret = cex->ec_type; > } > - > - /* not in cache */ > - return EXT4_EXT_CACHE_NO; > +errout: > + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); > + return ret; > } > > /* > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html