Ah, good catch. We don't notice this on Lustre, since we always use at most
1MB writes from the network and always configure with 1MB stripe size.

On 2010-07-14, at 15:10, Eric Sandeen wrote:
> For some reason, today mballoc only allocates IOs which are exactly
> stripe-sized on a stripe boundary. If you have a multiple (say, a
> 128k IO on a 64k stripe) you may end up unaligned.
>
> It seems to me that a simple change to align stripe-multiple IOs
> on stripe boundaries would be a very good idea, unless this breaks
> some other mballoc heuristic for some reason...
>
> Reported-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> ---
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 12b3bc0..f64a439 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1821,8 +1821,7 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
>
>  /*
>   * This is a special case for storages like raid5
> - * we try to find stripe-aligned chunks for stripe-size requests
> - * XXX should do so at least for multiples of stripe size as well
> + * we try to find stripe-aligned chunks for stripe-size-multiple requests
>   */
>  static noinline_for_stack
>  void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
> @@ -2094,8 +2093,8 @@ repeat:
>  			ac->ac_groups_scanned++;
>  			if (cr == 0)
>  				ext4_mb_simple_scan_group(ac, &e4b);
> -			else if (cr == 1 &&
> -					ac->ac_g_ex.fe_len == sbi->s_stripe)
> +			else if (cr == 1 && sbi->s_stripe &&
> +					!(ac->ac_g_ex.fe_len % sbi->s_stripe))
>  				ext4_mb_scan_aligned(ac, &e4b);
>  			else
>  				ext4_mb_complex_scan_group(ac, &e4b);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
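
For illustration only, the condition change in the quoted patch can be sketched as two standalone C predicates. This is not kernel code: the function names aligned_scan_old and aligned_scan_new are hypothetical, and plain unsigned integers stand in for ac->ac_g_ex.fe_len and sbi->s_stripe, both assumed to be counted in filesystem blocks.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the cr == 1 test in the quoted patch.
 * len    plays the role of ac->ac_g_ex.fe_len (request length, in blocks)
 * stripe plays the role of sbi->s_stripe      (stripe size, in blocks)
 */

/* Old test: only a request exactly one stripe long takes the aligned scan. */
static bool aligned_scan_old(unsigned int len, unsigned int stripe)
{
	return len == stripe;
}

/*
 * New test: any request whose length is a whole multiple of the stripe
 * takes the aligned scan. The stripe != 0 check mirrors the added
 * "sbi->s_stripe &&" term and keeps the modulo from dividing by zero
 * when no stripe is configured.
 */
static bool aligned_scan_new(unsigned int len, unsigned int stripe)
{
	return stripe != 0 && len % stripe == 0;
}
```

Assuming 4k blocks, the 128k IO on a 64k stripe from the description is len = 32, stripe = 16: the old test rejects it and the new test accepts it. A request of exactly one stripe (16, 16) passes both, a non-multiple such as (24, 16) passes neither, and with stripe = 0 the new test is always false.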