On Fri, Nov 25, 2011 at 9:08 AM, Theodore Ts'o <tytso@xxxxxxx> wrote: > We use an separate flag in buffer head to determine whether the bitmap > has been valid. This is distinct from it being uptodate, due to the > uninit_bg feature. More details about the rationale for this flag can > be found in commit 2ccb5fb9f1. We set this bitmap_uptodate bit before > issuing the read request, so if another CPU attempts to load the same > block or inode bitmap, since ext4_read_{block,inode}_bitmap() checks > the bitmap_uptodate flag without locking the buffer head, hilarity > ensues. > > This result of this bug is that occasionally a block or inode gets > allocated twice, which gets noticed when the second user of the block > gets deleted, or when an directory suddenly becomes a regular file or > a symlink. I'm *really* surprised this doesn't happen more often; but > in actual practice the fact that we tend to search for a zero bit in > the bitmap without taking a lock, and then taking the block group lock > and double checking to see if we actually got the allocation tends to > protect us. It is true for inode bitmap, but block bitmap is another story, blocks are allocated from buddy allocator and mb_load_buddy does not call block_read_bitmap at all. block_read_bitmap is called in mark_space_used, free_blocks and free_inode_pa. So I am guessing if mb_load_buddy calls block_read_bitmap, the bug will reproduced easily. BTW: It seems that we should factor code reading bitmaps. Now there is one function for each every bitmap. and we can pass read_block/inode_bitmap a flag indicates sync or async read and a flag indicates which kind of bitmap. Thus mb_load_buddy can use read_bimap as well and the code will be much more maintainable. ext4-snapshot has exclude bitmap, so if we read all bitmaps via only one function, it would be much better. Any opinion? Yongqiang. > > This bug was introduced in commit 2ccb5fb9f1, which dates back to > January 2009 and 2.6.29. So this bug has been around for a *long* > time. (We've seen it for over a year, but rarely enough that it we > could never find a repro case so we could study it in controlled > circumstances.) > > Google-Bug-Id: 2828254 > Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx> > Cc: stable@xxxxxxxxxx > --- > fs/ext4/balloc.c | 12 ++++++------ > fs/ext4/ialloc.c | 12 ++++++------ > 2 files changed, 12 insertions(+), 12 deletions(-) > > diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c > index 12ccacd..4501aab 100644 > --- a/fs/ext4/balloc.c > +++ b/fs/ext4/balloc.c > @@ -372,7 +372,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group) > ext4_unlock_group(sb, block_group); > if (buffer_uptodate(bh)) { > /* > - * if not uninit if bh is uptodate, > + * if not uninit && bh is uptodate, > * bitmap is also uptodate > */ > set_bitmap_uptodate(bh); > @@ -380,13 +380,12 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group) > return bh; > } > /* > - * submit the buffer_head for read. We can > - * safely mark the bitmap as uptodate now. > - * We do it here so the bitmap uptodate bit > - * get set with buffer lock held. > + * submit the buffer_head for read. It's important that we > + * *not* mark the bitmap up to date until the read is > + * completed, since we check bitmap_update() above without > + * locking the buffer for speed reasons. > */ > trace_ext4_read_block_bitmap_load(sb, block_group); > - set_bitmap_uptodate(bh); > if (bh_submit_read(bh) < 0) { > put_bh(bh); > ext4_error(sb, "Cannot read block bitmap - " > @@ -394,6 +393,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group) > block_group, bitmap_blk); > return NULL; > } > + set_bitmap_uptodate(bh); > ext4_valid_block_bitmap(sb, desc, block_group, bh); > /* > * file system mounted not to panic on error, > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c > index 00beb4f..6fbae6d 100644 > --- a/fs/ext4/ialloc.c > +++ b/fs/ext4/ialloc.c > @@ -139,7 +139,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group) > > if (buffer_uptodate(bh)) { > /* > - * if not uninit if bh is uptodate, > + * if not uninit && bh is uptodate, > * bitmap is also uptodate > */ > set_bitmap_uptodate(bh); > @@ -147,13 +147,12 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group) > return bh; > } > /* > - * submit the buffer_head for read. We can > - * safely mark the bitmap as uptodate now. > - * We do it here so the bitmap uptodate bit > - * get set with buffer lock held. > + * submit the buffer_head for read. It's important that we > + * *not* mark the bitmap up to date until the read is > + * completed, since we check bitmap_update() above without > + * locking the buffer for speed reasons. > */ > trace_ext4_load_inode_bitmap(sb, block_group); > - set_bitmap_uptodate(bh); > if (bh_submit_read(bh) < 0) { > put_bh(bh); > ext4_error(sb, "Cannot read inode bitmap - " > @@ -161,6 +160,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group) > block_group, bitmap_blk); > return NULL; > } > + set_bitmap_uptodate(bh); > return bh; > } > > -- > 1.7.4.1.22.gec8e1.dirty > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Best Wishes Yongqiang Yang -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html