Hi Alex,

A couple of comments. First, please split this patch so that these two
pieces of functionality can be reviewed and tested separately:

> 1) mballoc tries too hard to find the best chunk which is
>    counterproductive - it makes sense to limit this process
> 2) during scanning the bitmaps are loaded one by one, synchronously
>    - it makes sense to prefetch few groups at once

As far as the prefetch is concerned, please note that the bitmap is
first read into the buffer cache via read_block_bitmap_nowait(), but
then it needs to be copied into the buddy bitmap pages, where it is
cached alongside the buddy bitmap. (The copy in the buddy bitmap pages
is a combination of the on-disk block allocation bitmap plus any
outstanding preallocations.) From that copy of the block bitmap, we
then generate the buddy bitmap and, as a side effect, initialize the
statistics (grp->bb_first_free, grp->bb_largest_free_order,
grp->bb_counters[]).

It is these statistics that we need in order to make allocation
decisions for a particular block group. So perhaps we should drive the
readahead of the bitmaps from ext4_mb_init_group() /
ext4_mb_init_cache(), and make sure that we actually initialize the
ext4_group_info structure, and not just read the bitmap into the buffer
cache and hope it gets used before memory pressure pushes it out of the
buffer cache.

Andreas has suggested going even farther, and perhaps storing this
derived information from the allocation bitmaps someplace convenient on
disk. This is an on-disk format change, so we would want to think very
carefully before going down that path, especially since, if we're going
to go this far, perhaps we should consider using an on-disk b-tree to
store the allocation information, which could be more efficient than
using allocation bitmaps plus buddy bitmaps.

Cheers,

					- Ted
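
P.S. To make the point about derived statistics concrete, here is a
minimal userspace sketch of how per-group statistics can be derived
from a block bitmap in a single scan. It is not the kernel code:
struct group_stats, generate_stats(), count_free_extent() and MAX_ORDER
are hypothetical names chosen for illustration, and the fields are only
loosely modeled on the grp->bb_* members mentioned above. It assumes a
set bit means "block in use" and that each free extent is counted as
naturally aligned power-of-two chunks, buddy style.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ORDER 13	/* hypothetical cap on chunk order */

/* Hypothetical stand-in for the statistics kept per block group. */
struct group_stats {
	uint32_t free;			/* total free blocks */
	uint32_t fragments;		/* number of free extents */
	int32_t first_free;		/* first free block, -1 if none */
	int largest_free_order;		/* highest non-empty order, -1 if none */
	uint32_t counters[MAX_ORDER + 1]; /* free chunks per order */
};

static int bit_is_set(const uint8_t *bitmap, uint32_t bit)
{
	return (bitmap[bit >> 3] >> (bit & 7)) & 1;
}

/* Count the free extent [first, first + len) as aligned 2^order chunks. */
static void count_free_extent(struct group_stats *st, uint32_t first,
			      uint32_t len)
{
	while (len > 0) {
		int order = 0;

		/* Grow the chunk while it stays aligned and still fits. */
		while (order < MAX_ORDER &&
		       (first & ((2u << order) - 1)) == 0 &&
		       (2u << order) <= len)
			order++;
		st->counters[order]++;
		first += 1u << order;
		len -= 1u << order;
	}
}

/* Scan the bitmap once and fill in all the statistics. */
static void generate_stats(const uint8_t *bitmap, uint32_t nbits,
			   struct group_stats *st)
{
	uint32_t i = 0;

	assert(bitmap != NULL && st != NULL);
	memset(st, 0, sizeof(*st));
	st->first_free = -1;
	st->largest_free_order = -1;

	while (i < nbits) {
		uint32_t start;

		if (bit_is_set(bitmap, i)) {
			i++;
			continue;
		}
		start = i;
		while (i < nbits && !bit_is_set(bitmap, i))
			i++;
		if (st->first_free < 0)
			st->first_free = (int32_t)start;
		st->free += i - start;
		st->fragments++;
		count_free_extent(st, start, i - start);
	}

	for (int order = MAX_ORDER; order >= 0; order--) {
		if (st->counters[order]) {
			st->largest_free_order = order;
			break;
		}
	}
}
```

The sketch shows why reading the bitmap alone is not enough: none of
these numbers are available until the scan has been done, so a prefetch
that stops at the buffer cache leaves that work on the allocation path.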