Resize of `-O bigalloc` filesystem uses a computation with O(n**2) complexity

We are validating the use of `-O bigalloc` functionality and have run into problems with it.

Reproducing this issue requires a large device. On very large filesystems created with `-O bigalloc -C 16k`, resize takes progressively longer as the filesystem grows. For comparison, here is the time it takes to resize a filesystem created with the default 4K blocks:
============
Size of logical volume dbdata01/lvdbdata01 changed from 37.79 TiB (9906476 extents) to 38.80 TiB (10171493 extents).

real    0m13.604s
user    0m0.013s
sys     0m9.549s
============
vs. the time it takes with 16K clusters:
============
Size of logical volume dbdata01/lvdbdata01 changed from 36.98 TiB (9694245 extents) to <38.71 TiB (10147477 extents).

real    178m22.590s
user    0m0.005s
sys     178m4.027s
============
We have tracked this down to code in fs/ext4/super.c that performs O(number_of_block_groups ** 2) iterations over the block groups while computing the metadata overhead. Here is an extract of the code that shows this:
```
3926 int ext4_calculate_overhead(struct super_block *sb)
3927 {
         [...]
3953     for (i = 0; i < ngroups; i++) {
             [...]
3956         blks = count_overhead(sb, i, buf);      <<<<<
             [...]
3961     }
         [...]
3984 } 
[...]
3865 static int count_overhead(struct super_block *sb, ext4_group_t grp,
3866          char *buf)
3867 {
         [...]
3874     if (!ext4_has_feature_bigalloc(sb))         <<<<< non-bigalloc exits here!
3875         return (ext4_bg_has_super(sb, grp) + ext4_bg_num_gdb(sb, grp) +
3876             sbi->s_itb_per_group + 2);
         [...]
3881     for (i = 0; i < ngroups; i++) {             <<<<<
             [...]
3916     }
        [...]
3921 }
``` 
The comment on `count_overhead()` makes it clear that the authors of this code knew it would be an O(n**2) computation.
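For scale, here is a back-of-the-envelope model of the iteration counts (our own userspace toy, not kernel code; it assumes the default 4 KiB block size, so one cluster-bitmap block covers 32768 clusters per group, i.e. 512 MiB per group at 16 KiB clusters):
```
#include <stdio.h>

int main(void)
{
	unsigned long long fs_bytes = 38ULL << 40;		/* ~38 TiB */
	unsigned long long group_bytes = 32768ULL * 16 * 1024;	/* 512 MiB */
	unsigned long long ngroups = fs_bytes / group_bytes;	/* 77824 */

	/* Without bigalloc, count_overhead() returns early, so the loop
	 * in ext4_calculate_overhead() does O(ngroups) work in total. */
	printf("linear path:   %llu iterations\n", ngroups);

	/* With bigalloc, count_overhead() itself walks all ngroups for
	 * every group it is asked about, giving O(ngroups**2) total. */
	printf("bigalloc path: %llu iterations\n", ngroups * ngroups);
	return 0;
}
```
Roughly 78,000 groups means ~6 x 10^9 inner iterations, each doing non-trivial bitmap marking, which is consistent with the three hours of system time shown above.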
 
We are looking for ways to reduce the computation time, perhaps by skipping the pre-existing groups in the outer loop: their overhead has already been computed, and they are unlikely to have any of their metadata allocated in the groups being added. A sketch of the idea follows.
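As a strawman, the untested sketch below (which would live in fs/ext4/super.c next to `count_overhead()`) shows roughly what we have in mind. The function name `ext4_update_overhead_after_resize()` and its `old_ngroups` parameter are hypothetical, and the approach assumes the metadata of pre-existing groups does not move into the added groups (flex_bg/meta_bg placement would need auditing before relying on that):
```
/*
 * Hypothetical, untested sketch: after growing the filesystem from
 * old_ngroups to the current group count, scan only the newly added
 * groups and fold their overhead into the cached sbi->s_overhead,
 * instead of rescanning every group from scratch.
 */
int ext4_update_overhead_after_resize(struct super_block *sb,
				      ext4_group_t old_ngroups)
{
	struct ext4_sb_info *sbi = EXT4_SB(sb);
	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
	ext4_fsblk_t overhead = 0;
	char *buf = (char *) get_zeroed_page(GFP_NOFS);

	if (!buf)
		return -ENOMEM;
	for (i = old_ngroups; i < ngroups; i++) {
		int blks = count_overhead(sb, i, buf);

		/* Same per-group accounting as ext4_calculate_overhead() */
		overhead += blks;
		if (blks)
			memset(buf, 0, PAGE_SIZE);
		cond_resched();
	}
	sbi->s_overhead += overhead;
	free_page((unsigned long) buf);
	return 0;
}
```
Note this is still O(added_groups * total_groups), since `count_overhead()` walks every group for each group it is passed, but for a grow that adds a few thousand groups out of ~78,000 it should cut the 178 minutes by well over an order of magnitude.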




