Hi Andreas:

On Thu, Apr 23, 2009 at 12:08 PM, Andreas Dilger <adilger@xxxxxxx> wrote:
> On Apr 23, 2009 09:41 -0700, Curt Wohlgemuth wrote:
>> I'm seeing a performance problem on ext4 vs ext2, and in trying to
>> narrow it down, I've got a question about block allocation in ext4
>> that I'm having trouble figuring out.
>>
>> Using dd, I created (in this order) two 4GB files and a 10GB file in
>> the mount directory.
>>
>> The extent blocks are reasonably close together for the two 4GB files,
>> but the extents for the 10GB file show a huge gap, which seems to hurt
>> the random read performance pretty substantially. Here's the output
>> from debugfs:
>>
>> BLOCKS:
>> (IND):8396832, (0-106495):8282112-8388607,
>> (106496-399359):11241472-11534335, (399360-888831):20482048-20971519,
>> (888832-1116159):23889920-24117247,
>> (1116160-1277951):71665664-71827455,
>> (1277952-1767423):78678016-79167487,
>> (1767424-2125823):102402048-102760447,
>> (2125824-2148351):102768672-102791199,
>> (2148352-2621439):102793216-103266303
>> TOTAL: 2621441
>>
>> Note the gap between blocks 79167487 and 102402048.
>
> Well, there are other even larger gaps for other chunks of the file.

Really? Not that it's important, but I'm not seeing them...

>> I was lucky enough to capture the mb_history from this 10GB create:
>>
>> 29109 14 735/30720/32758@1114112 735/30720/2048@1114112 735/30720/2048@1114112 1 0 0 1568 M 0 0
>> 29109 14 736/0/32758@1116160 736/0/2048@1116160 2187/2048/2048@1116160 1 1 0 1568 0 0
>> 29109 14 2187/4096/32758@1118208 2187/4096/2048@1118208 2187/4096/2048@1118208 1 0 0 1568 M 2048 4096
>>
>> I've been staring at ext4_mb_regular_allocator() trying to understand
>> why an allocation with a goal block of 736 ends up with a best found
>> extent group of 2187, and I'm stuck -- at least without a lot of
>> printk messages. It seems to me that we just cycle through the block
>> groups starting with the goal group until we find a group that fits.
>> Again, according to dumpe2fs, block groups 737, 738, 739, ... all have
>> 32768 free blocks. So why we end up with a best fit group of 2187 is
>> a mystery to me.
>
> This is likely the "uninit_bg" feature that is causing the allocations
> to skip groups which are marked BLOCK_UNINIT. In some sense the benefit
> of skipping the block bitmap read during e2fsck is probably not at all
> beneficial compared to the cost of the extra seeking during IO. As the
> filesystem gets more full, the BLOCK_UNINIT flags would be cleared anyways,
> so we might as well just keep the early allocations contiguous.

Ah, thanks! That's what I was missing. Yes, I sort of skipped over
the "is this a good group?" question.

> A simple change to verify this would be something like the following,
> but it hasn't actually been tested.

Tell you what: I'll try this out and see if it helps out my test case.

Thanks,
Curt

>
> --- ./fs/ext4/mballoc.c.uninit 2009-04-08 19:13:13.000000000 -0600
> +++ ./fs/ext4/mballoc.c 2009-04-23 13:02:22.000000000 -0600
> @@ -1742,10 +1723,6 @@ static int ext4_mb_good_group(struct ext
>          switch (cr) {
>          case 0:
>                  BUG_ON(ac->ac_2order == 0);
> -                /* If this group is uninitialized, skip it initially */
> -                desc = ext4_get_group_desc(ac->ac_sb, group, NULL);
> -                if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
> -                        return 0;
>
>                  bits = ac->ac_sb->s_blocksize_bits + 1;
>                  for (i = ac->ac_2order; i <= bits; i++)
> @@ -2039,9 +2035,7 @@ repeat:
>                          ac->ac_groups_scanned++;
>                          desc = ext4_get_group_desc(sb, group, NULL);
> -                        if (cr == 0 || (desc->bg_flags &
> -                                cpu_to_le16(EXT4_BG_BLOCK_UNINIT) &&
> -                                ac->ac_2order != 0))
> +                        if (cr == 0)
>                                  ext4_mb_simple_scan_group(ac, &e4b);
>                          else if (cr == 1 &&
>                                          ac->ac_g_ex.fe_len == sbi->s_stripe)
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
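
The question of which gaps in the debugfs listing are largest can be checked mechanically. What follows is only a minimal Python sketch, not anything posted in the thread. It assumes 32768 blocks per group (consistent with the dumpe2fs free-block counts quoted above, i.e. 4 KiB blocks) and a first data block of 0. The helper names parse_extents, physical_gaps and group_to_block are made up for illustration.

```python
import re

# Assumption: 4 KiB blocks, so 32768 blocks per group and first data block 0.
BLOCKS_PER_GROUP = 32768

# Extent list copied from the debugfs output quoted in the message.
DEBUGFS_BLOCKS = """
(IND):8396832, (0-106495):8282112-8388607,
(106496-399359):11241472-11534335, (399360-888831):20482048-20971519,
(888832-1116159):23889920-24117247, (1116160-1277951):71665664-71827455,
(1277952-1767423):78678016-79167487,
(1767424-2125823):102402048-102760447,
(2125824-2148351):102768672-102791199,
(2148352-2621439):102793216-103266303
"""

# Matches "(logical_start-logical_end):phys_start-phys_end"; "(IND):..." is skipped.
EXTENT_RE = re.compile(r"\((\d+)-(\d+)\):(\d+)-(\d+)")


def parse_extents(text):
    """Return (logical_start, logical_end, phys_start, phys_end) tuples."""
    return [tuple(int(n) for n in m.groups()) for m in EXTENT_RE.finditer(text)]


def physical_gaps(extents):
    """Return (prev_phys_end, next_phys_start, skipped_blocks) per adjacent pair."""
    return [
        (prev[3], nxt[2], nxt[2] - prev[3] - 1)
        for prev, nxt in zip(extents, extents[1:])
    ]


def group_to_block(group, offset, blocks_per_group=BLOCKS_PER_GROUP):
    """Map an mb_history group/offset pair to a physical block number."""
    return group * blocks_per_group + offset


if __name__ == "__main__":
    extents = parse_extents(DEBUGFS_BLOCKS)
    # Largest gaps first.
    for end, start, skipped in sorted(physical_gaps(extents), key=lambda g: -g[2]):
        print(f"{end:>10} -> {start:>10}: {skipped:>9} blocks skipped")
    # Where the mb_history results land on disk.
    print("735/30720 ->", group_to_block(735, 30720))
    print("2187/2048 ->", group_to_block(2187, 2048))
```

Under those assumptions, the largest jump in the listing is the one from block 24117247 to 71665664 (47548416 blocks skipped), ahead of the 79167487 to 102402048 jump (23234560 blocks). The first of these lines up with the mb_history record whose goal was 736/0 and whose result was 2187/2048: group 736 starts at block 24117248, and 2187/2048 maps to block 71665664.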