Re: [PATCH -V2 3/5] ext4: Fix the race between read_block_bitmap and mark_diskspace_used

This looks even more strange, IMHO. Do I understand correctly that two
processes doing allocation in the same group can each do the initialization?
What if one process has just allocated block(s) but has not cleared the
UNINIT bit yet?

thanks, Alex
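
The interleaving asked about above can be sketched in a small userspace
model. This is only an illustration under stated assumptions: every name
prefixed with model_ is hypothetical, the bitmap is one byte per block, and
no kernel locking is modeled, so it shows the ordering concern and not
whether the kernel code can actually hit it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
#define MODEL_BITS     16
#define MODEL_RESERVED 3	/* pretend the first 3 blocks hold group metadata */

struct model_group {
	bool uninit;				/* stands in for EXT4_BG_BLOCK_UNINIT */
	unsigned char bitmap[MODEL_BITS];	/* one byte per block, 1 = in use */
};

/* Stands in for ext4_init_block_bitmap(): mark only the metadata blocks. */
static void model_init_bitmap(struct model_group *g)
{
	memset(g->bitmap, 0, sizeof(g->bitmap));
	memset(g->bitmap, 1, MODEL_RESERVED);
}

/* Stands in for a bitmap read that re-initializes whenever UNINIT is set. */
static void model_read_bitmap(struct model_group *g)
{
	if (g->uninit)
		model_init_bitmap(g);
}

/* Step 1 of an allocation: set the bit in the in-memory bitmap. */
static void model_mark_bit(struct model_group *g, int blk)
{
	assert(blk >= 0 && blk < MODEL_BITS);
	g->bitmap[blk] = 1;
}

/* Step 2 of an allocation: clear UNINIT in the descriptor. */
static void model_clear_uninit(struct model_group *g)
{
	g->uninit = false;
}

/* Returns true if the allocation of block 5 survives the interleaving. */
static bool model_allocation_survives(bool reader_runs_between_steps)
{
	struct model_group g = { .uninit = true };
	const int blk = 5;

	model_read_bitmap(&g);		/* process A reads (and inits) the bitmap */
	model_mark_bit(&g, blk);	/* A sets the bit for block 5 */
	if (reader_runs_between_steps)
		model_read_bitmap(&g);	/* process B reads while UNINIT is still set */
	model_clear_uninit(&g);		/* A finally clears the flag */
	return g.bitmap[blk] == 1;
}
```

In this model, model_allocation_survives(false) keeps block 5 marked, while
model_allocation_survives(true) loses it because the second read
re-initializes the bitmap before the flag is cleared.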



Aneesh Kumar K.V wrote:
On Mon, Nov 24, 2008 at 09:17:53PM +0300, Alex Zhuravlev wrote:
Aneesh Kumar K.V wrote:
With commit c806e68f we do an init_bitmap every time we do a
read_block_bitmap.
Can you explain why we need to init it every time?


The commit message explains it well. It is because the buffer_head
can be marked uptodate by a read from userspace. In that case we would
skip doing an init_bitmap on the uninit group during resize.
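
As a rough illustration of the point above, here is a userspace model of
the two fast-path checks. The struct and function names are hypothetical
stand-ins, MODEL_BG_BLOCK_UNINIT only plays the role of
EXT4_BG_BLOCK_UNINIT, and the locking done by bh_uptodate_or_lock() is
not modeled; only the "can we return early?" decision is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types and names are hypothetical stand-ins. */
#define MODEL_BG_BLOCK_UNINIT 0x0002	/* plays the role of EXT4_BG_BLOCK_UNINIT */

struct model_bh { bool uptodate; };
struct model_desc { unsigned short bg_flags; };

/* Old fast path: return early whenever the buffer is uptodate. */
static bool old_takes_fast_path(const struct model_bh *bh)
{
	assert(bh != NULL);
	return bh->uptodate;
}

/* New fast path: return early only if uptodate AND the group is initialized. */
static bool new_takes_fast_path(const struct model_bh *bh,
				const struct model_desc *desc)
{
	assert(bh != NULL && desc != NULL);
	return bh->uptodate && !(desc->bg_flags & MODEL_BG_BLOCK_UNINIT);
}
```

With a buffer that userspace has already made uptodate and a descriptor
that still carries the UNINIT flag, the old check returns early (so the
bitmap is never initialized), while the new check falls through to the
initialization path.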

commit c806e68f5647109350ec546fee5b526962970fd2
Author: Frederic Bohe <frederic.bohe@xxxxxxxx>
Date:   Fri Oct 10 08:09:18 2008 -0400

    ext4: fix initialization of UNINIT bitmap blocks

    This fixes a bug which caused on-line resizing of filesystems with a
    1k blocksize to fail.  The root cause of this bug was the fact that if
    an uninitialized bitmap block gets read in by userspace (which
    e2fsprogs does try to avoid, but can happen when the blocksize is less
    than the pagesize and an adjacent block is read into memory)
    ext4_read_block_bitmap() was erroneously depending on the buffer
    uptodate flag to decide whether it needed to initialize the bitmap
    block in memory --- i.e., to set the standard set of blocks in use by
    a block group (superblock, bitmaps, inode table, etc.).  Essentially,
    ext4_read_block_bitmap() assumed it was the only routine that might
    try to read a block containing a block bitmap, which is simply not
    true.

    To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
    must always initialize uninitialized bitmap blocks.  Once a block or
    inode is allocated out of that bitmap, it will be marked as
    initialized in the block group descriptor, so in general this won't
    result in any extra unnecessary work.

    Signed-off-by: Frederic Bohe <frederic.bohe@xxxxxxxx>
    Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx>

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 59566c0..bd2ece2 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -319,9 +319,11 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
-	if (bh_uptodate_or_lock(bh))
+	if (buffer_uptodate(bh) &&
+	    !(desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
 		return bh;
+	lock_buffer(bh);
 	spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 		ext4_init_block_bitmap(sb, bh, block_group, desc);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 1343bf1..fe34d74 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -115,9 +115,11 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
-	if (bh_uptodate_or_lock(bh))
+	if (buffer_uptodate(bh) &&
+	    !(desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)))
 		return bh;
+	lock_buffer(bh);
 	spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 	if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 		ext4_init_inode_bitmap(sb, bh, block_group, desc);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 335faee..b580714 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -782,9 +782,11 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 		if (bh[i] == NULL)
 			goto out;
-		if (bh_uptodate_or_lock(bh[i]))
+		if (buffer_uptodate(bh[i]) &&
+		    !(desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
 			continue;
+		lock_buffer(bh[i]);
 		spin_lock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
 		if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 			ext4_init_block_bitmap(sb, bh[i],
