On Thu, Sep 26, 2024 at 10:50 AM Baokun Li <libaokun1@xxxxxxxxxx> wrote: > > On 2024/9/26 0:17, Aleksandr Mikhalitsyn wrote: > > On Wed, Sep 25, 2024 at 5:57 PM Jan Kara <jack@xxxxxxx> wrote: > >> On Wed 25-09-24 16:33:24, Alexander Mikhalitsyn wrote: > >>> [ 33.882936] EXT4-fs (dm-5): mounted filesystem 8aaf41b2-6ac0-4fa8-b92b-77d10e1d16ca r/w with ordered data mode. Quota mode: none. > >>> [ 33.888365] EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks > >>> [ 33.888740] ------------[ cut here ]------------ > >>> [ 33.888742] kernel BUG at fs/ext4/resize.c:324! > >> Ah, I was staring at this for a while before I understood what's going on > >> (it would be great to explain this in the changelog BTW). As far as I > >> understand commit 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation > >> in alloc_flex_gd()") can actually make flex_gd->resize_bg larger than > >> flexbg_size (for example when ogroup = flexbg_size, ngroup = 2*flexbg_size > >> - 1) which then confuses things. I think that was not really intended and > > Hi Jan, > > > > First of all, thanks for your reaction/review on this one ;-) > > > > You are absolutely right, have just checked with our reproducer and > > this modification: > > > > diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c > > index e04eb08b9060..530a918f0cab 100644 > > --- a/fs/ext4/resize.c > > +++ b/fs/ext4/resize.c > > @@ -258,6 +258,8 @@ static struct ext4_new_flex_group_data > > *alloc_flex_gd(unsigned int flexbg_size, > > flex_gd->resize_bg = 1 << max(fls(last_group - o_group + 1), > > fls(n_group - last_group)); > > > > + BUG_ON(flex_gd->resize_bg > flexbg_size); > > + > > flex_gd->groups = kmalloc_array(flex_gd->resize_bg, > > sizeof(struct ext4_new_group_data), > > GFP_NOFS); > > > > and yes, it crashes on this BUG_ON. So it looks like instead of making > > flex_gd->resize_bg to be smaller > > than flexbg_size in most cases we can actually have an opposite effect > > here. I guess we really need to fix alloc_flex_gd() too. > > > >> instead of fixing up ext4_alloc_group_tables() we should really change > >> the logic in alloc_flex_gd() to make sure flex_gd->resize_bg never exceeds > >> flexbg size. Baokun? > > At the same time, if I understand the code right, as we can have > > flex_gd->resize_bg != flexbg_size after > > 5d1935ac02ca5a ("ext4: avoid online resizing failures due to oversized > > flex bg") and > > 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") > > we should always refer to flex_gd->resize_bg value which means that > > ext4_alloc_group_tables() fix is needed too. > > Am I correct in my understanding? > > Hi Alex, Hi Baokun, > > These two are not exactly equivalent. > > The flex_gd->resize_bg is only used to determine how many block groups we > allocate memory to, i.e., the maximum number of block groups per resize. > And the flexbg_size is used to make some judgement on flexible block > groups, for example, the BUG_ON triggered in the issue is to make sure > src_group and last_group must be in the same flexible block group. Huge thanks for explaining this! Then I guess it's better if you send a patch with your fix. Feel free to add my Tested-by tag. Question to you and Jan. Do you guys think that it makes sense to try to create a minimal reproducer for this problem without Incus/LXD involved? (only e2fsprogs, lvm tools, etc) I guess this test can be put in the xfstests test suite, right? Kind regards, Alex > > > Regards, > Baokun >