Re: [PATCH RESEND 4/8] ext4: add the gdt block of meta_bg to system_zone

An extra 20-30MB of RAM for mounting a 1PB filesystem isn't
a huge deal. We already need 512MB for just the 8M group descriptors,
and we have a 1GB journal.
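
For reference, the descriptor figure above can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes 4KB blocks (so 128MB block groups, as in the quoted message) and 64-byte group descriptors, which is what the 64bit feature uses; the variable names are just for illustration.

```python
# Rough sizing of the group descriptor table for a 1 PB ext4 file system.
# Assumptions: 4 KiB blocks and 64-byte descriptors (64bit feature).
PIB = 1 << 50
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE            # one bitmap block maps 32768 blocks
GROUP_BYTES = BLOCKS_PER_GROUP * BLOCK_SIZE  # 128 MiB per block group
DESC_SIZE = 64                               # bytes per group descriptor (assumed)

groups = PIB // GROUP_BYTES
desc_bytes = groups * DESC_SIZE

print(f"{groups} block groups, {desc_bytes >> 20} MiB of group descriptors")
```

Under those assumptions this comes out to 8M block groups and 512MB of descriptors.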

I haven't heard of any specific performance issues with block_validity,
but it may be newer than the 3.10 kernels we are currently using on
our servers.

Cheers, Andreas

> On Dec 15, 2020, at 13:13, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> 
> You did your test on an 80T file system, but that's not where someone
> would be using meta_bg.  Meta_bg gets used for much larger file systems
> than that!  With meta_bg, we have 3 block group descriptors every 64
> block groups.  Each block group describes 128M of disk space.  So that
> means we are going to have 3 entries in the system zone tree for every
> 8GB of file system space, or 393,216 entries for every PB.  Given that
> each entry is 40 bytes, that means that the block_validity entries
> will consume 15 megabytes per PB.
> 
> Now, one third of these entries overlap with the flex_bg entries
> (meta_bg group descriptors are in the first, second, and last block
> group of each meta_bg, which is 64 block groups in 4k file systems),
> and of course, the default flex_bg size of 16 block groups means that
> there are 524,288 entries per PB.  So if we include all backup sb and
> block group descriptors, there will be roughly 786,432 entries in a
> 1 PB file system.  (I'm ignoring the entries for the backup
> superblocks, but that's only about 20 or so extra entries.)
> 
> So for a flex_bg 1PB file system, the amount of memory for a
> block_validity data structure is roughly 20M, and including all backup
> descriptors for meta_bg on a flex_bg + meta_bg setup is roughly 30M.
> 
> I agree with you that for a non-meta_bg file system, including all of
> the backup superblock and block group descriptors is not going to be
> large.  But while protecting the meta_bg group descriptors is
> worthwhile, protecting the backup meta_bg's is not free, and will
> increase the size of the tree by 33%.
> 
> I'm also wondering whether or not Lustre (where they do have some file
> systems that are in the PB range) has run into overhead issues with
> block_validity.
> 
> What do folks think?
> 
>                       - Ted
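
The entry counts in the quoted message can be checked with a short calculation. This is a sketch of the arithmetic only, not of how the kernel builds the system zone tree; it takes the 40-byte entry size, the 64-group meta_bg, and the 16-group flex_bg straight from the message, and the variable names are made up for illustration.

```python
# Back-of-the-envelope count of system zone entries for a 1 PB file system,
# using the figures from the quoted message (not measured values).
PIB = 1 << 50
GROUP_BYTES = 128 << 20        # 128 MiB per block group (4 KiB blocks)
GROUPS_PER_META_BG = 64        # block groups per meta_bg
COPIES_PER_META_BG = 3         # descriptor block in first, second, last group
FLEX_BG_SIZE = 16              # default flex_bg size, in block groups
ENTRY_BYTES = 40               # per-entry size quoted in the message

groups = PIB // GROUP_BYTES
meta_bg_entries = groups // GROUPS_PER_META_BG * COPIES_PER_META_BG
flex_bg_entries = groups // FLEX_BG_SIZE
# One of the three meta_bg copies per meta_bg overlaps a flex_bg entry.
overlapping = meta_bg_entries // COPIES_PER_META_BG
combined_entries = flex_bg_entries + meta_bg_entries - overlapping

for label, count in [("meta_bg only", meta_bg_entries),
                     ("flex_bg only", flex_bg_entries),
                     ("flex_bg + meta_bg backups", combined_entries)]:
    print(f"{label}: {count} entries, {count * ENTRY_BYTES / (1 << 20):.0f} MiB")
```

With those inputs the three cases work out to 393,216, 524,288, and 786,432 entries, or 15, 20, and 30 MiB respectively.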



