On 2011-06-10, at 11:45 AM, Phillip Susi wrote:
> On 6/10/2011 1:29 PM, Andreas Dilger wrote:
>> On 2011-06-10, at 11:14 AM, Phillip Susi wrote:
>>> On 6/10/2011 12:19 PM, Andreas Dilger wrote:
>>>> I think in the presence of flex_bg this issue is moot.
>>>
>>> What is the issue without flex_bg?
>>
>> No "issue" really, just that the block/inode bitmaps are spread all over
>> the filesystem. The original discussion was about whether there could be
>> "larger bitmaps that addressed more than 32768 blocks", which is essentially
>> what the flex_bg feature provides. With flex_bg the bitmaps for different
>> groups will be allocated adjacent to each other on disk, and allow addressing
>> more than 32768 blocks without any seeking.
>>
>> On large filesystems without flex_bg, the distribution of the bitmaps without
>> flex_bg means that a seek is needed to read each one, and given that spinning
>> disks have stayed at about 100 seeks/sec for decades it means 10+ minutes just
>> to read all of the bitmaps.
>>
>> On my 2TB 5400 RPM SATA drive, e2fsck time went from ~20 minutes to ~3 minutes
>> by copying the data to a new ext4 filesystem with flex_bg + extents. For a
>> fair comparison, I then reformatted the original (identical) disk without
>> flex_bg or extents and copied the data back, so that there wasn't any unfair
>> comparison between the newly-formatted filesystem and the old fragmented one.
>
> I know what flex_bg is; what I don't understand is what it has to do with the limit on the size of a block group. Whether the block bitmaps are stored in their native block group, or clustered up with flex_bg does not seem to have anything to do with whether or not the size of the bitmap can exceed 32k blocks.

I hope it is obvious that a single bitmap block can only address the number
of bits (==blocks) that fit within that block. To address more blocks the
block bitmap needs to be larger than a single block in size.
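
To make that arithmetic concrete, here is a rough sketch (not code from
e2fsprogs or the kernel; the function names are made up for illustration,
and the 4096-byte block size is an assumption chosen because it matches
the 32768-block figure discussed above):

```python
# Hypothetical helpers for illustration only; these names do not exist
# in e2fsprogs or the kernel.

def blocks_per_bitmap_block(block_size: int) -> int:
    """One bit tracks one block, so a bitmap block of block_size bytes
    can address block_size * 8 blocks."""
    return block_size * 8


def bitmap_blocks_needed(total_blocks: int, block_size: int) -> int:
    """Number of bitmap blocks required to address total_blocks,
    rounded up to a whole bitmap block."""
    per_bitmap = blocks_per_bitmap_block(block_size)
    return (total_blocks + per_bitmap - 1) // per_bitmap


if __name__ == "__main__":
    block_size = 4096  # assumed block size in bytes
    print(blocks_per_bitmap_block(block_size))
    print(bitmap_blocks_needed(4 * 32768, block_size))
```

With a 4096-byte block this gives 32768 blocks per bitmap block, and
anything beyond that needs a second bitmap block.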

One possible way to do this (discussed early on for ext4) would be to have
N block bitmap blocks per group. That raises issues of how to address those
blocks for each "block group", and what the meaning of a "block group"
really is.

The other (very similar, but not identical) approach is to essentially merge
N adjacent "block groups" into a single "large block group" that has N block
bitmaps, and addresses N * blocksize * 8 blocks per "large block group".

In this case "N" is the flex_bg factor (constrained to 2^n), and the "large
block group" is called a "flex group". It achieves exactly the same thing
as having N block bitmaps per group, with the only difference that there are
N group descriptors that point to the bitmaps, and they no longer have to be
located within the groups themselves.

There is virtually no difference between "larger bitmap" and "flex_bg":

"b"=block bitmap, "i"=inode bitmap, "."=data block

Non-flex_bg configuration for 4 groups * 32768 blocks:

bi...{32760}...bi...{32760}...bi...{32760}...bi...{32760}...

Each block bitmap addresses 32768 blocks in total (including itself).

flex_bg configuration for the same 4 groups * 32768 blocks:

bbbbiiii.....................{131020}.......................

If you treat the four "bbbb" blocks as a single block bitmap, and "iiii"
as a single inode bitmap, and the contiguous range of free blocks as a
single group, it is exactly what you are asking for - a larger bitmap.

Cheers, Andreas