On Tue, Apr 5, 2011 at 11:10, Andreas Dilger <adilger@xxxxxxxxx> wrote: > On 2011-04-04, at 9:11 AM, Eric Sandeen wrote: >> Block devices may set minimum or optimal IO hints equal to >> blocksize; in this case there is really nothing for ext4 >> to do with this information (i.e. search for a block-aligned >> allocation?) so don't set fs geometry with single-block >> values. >> >> Zeev also reported that with a block-sized stripe, the >> ext4 allocator spends time spinning in ext4_mb_scan_aligned(), >> oddly enough. >> >> Reported-by: Zeev Tarantov <zeev.tarantov@xxxxxxxxx> >> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx> >> --- >> >> diff --git a/misc/mke2fs.c b/misc/mke2fs.c >> index 9798b88..74b838c 100644 >> --- a/misc/mke2fs.c >> +++ b/misc/mke2fs.c >> @@ -1135,8 +1135,11 @@ static int get_device_geometry(const char *file, >> if ((opt_io == 0) && (psector_size > blocksize)) >> opt_io = psector_size; >> >> - fs_param->s_raid_stride = min_io / blocksize; >> - fs_param->s_raid_stripe_width = opt_io / blocksize; >> + /* setting stripe/stride to blocksize is pointless */ >> + if (min_io > blocksize) >> + fs_param->s_raid_stride = min_io / blocksize; >> + if (opt_io > blocksize) >> + fs_param->s_raid_stripe_width = opt_io / blocksize; > > I don't think it is harmful to specify an mballoc alignment that is an even multiple of the underlying device IO size (e.g. at least 256kB or 512kB). > > If the underlying device (e.g. zram) is reporting 16kB or 64kB opt_io size because that is PAGE_SIZE, but blocksize is 4kB, then we will have the same performance problem again. I think I'm not understanding you. Are you objecting to the patch? If so, why? If io block is an even multiple of fs blocks, then mballoc really does need to use the fancier aligning code (though for a ratio that is a power of two, it shouldn't need to be slow). In your example, If PAGE_SIZE is 16k and zram reports min and opt io size as 16k but fs block is 4k, then Eric Sandeen's new code will set stride and stripe to 4, as it should. Then mballoc will really need to align blocks in groups of 4 and spending time doing that is not a performance problem. This patch is only meant to _not_ set stripe and stride to 1, which cannot help block allocation and does in fact cause the kernel to spend more cpu time because it does not detect the degenerate case. Making the allocator faster in the case of a ratio that is a power of two is the work of a different patch someone may want to write. Using larger fs blocks in case the underlying block device prefers larger io blocks is the work of the admin, I think. > Cheers, Andreas -Z.T. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html