On Tue, Nov 29, 2022 at 08:07:50PM +0530, Ojaswin Mujoo wrote: > On Mon, Nov 28, 2022 at 10:06:34PM -0500, Theodore Ts'o wrote: > > Looking at the stack trace it looks like we're hitting this BUG_ON: > > > > spin_lock(&tmp_pa->pa_lock); > > if (tmp_pa->pa_deleted == 0) > > BUG_ON(!(start >= tmp_pa_end || end <= tmp_pa_start)); > > spin_unlock(&tmp_pa->pa_lock); > > > > ... in the inline function ext4_mb_pa_assert_overlap(), called from > > ext4_mb_pa_adjust_overlap(). > > > > The generic/269 test runs fstress with an ENOSPC hitter as an > > antogonist process. The ext3 configuration disables delayed > > allocation, which means that fstress is going to be allocating blocks > > at write time (instead of dirty page writeback time). > > > > Could you take a look? Thanks! > Hi Ted, > > Thanks for pointing this out, I'll have a look into this. > > PS: I'm on vacation so might be a bit slow to update on this. > > Regards, > Ojaswin > > > > - Ted Hi Ted, Apologies for the delay on this due to new years/replication issues. I was not able to replicate it at my end for some time before ultimately replicating this in gce-xfstests. I have sent a v3 [1] that fixes the issue by introducing an optimization related to delted PAs found in rbtree traversal in functions like ext4_mb_adjust_overlap() and ext4_mb_use_preallocated(). Let me quote the commit message which explains the issue and the fix we proposed: ---- snip ---- In ext4_mb_adjust_overlap(), it is possible for an allocating thread to come across a PA that is marked deleted via ext4_mb_discard_group_preallocations() but not yet removed from the inode PA rbtree. In such a case, we ignore any overlaps with this PA node and simply move forward in the rbtree by comparing logical start of to-be-inserted PA and the deleted PA node. Although this usually doesn't cause an issue and we can move forward in the tree, in one special case, i.e if range of deleted PA lies completely inside the normalized range, we might require to travese both the sub-trees under such a deleted PA. To simplify this special scenario and also as an optimization, undelete the PA If the allocating thread determines that this PA might be needed in the near future. This results in the following changes: - ext4_mb_use_preallocated(): Use a deleted PA if original request lies in it. - ext4_mb_pa_adjust_overlap(): Undelete a PA if it is deleted but there is an overlap with the normalized range. - ext4_mb_discard_group_preallocations(): Rollback delete operation if allocation path undeletes a PA before it is erased from inode PA list. Since this covers the special case we discussed above, we will always un-delete the PA when we encounter the special case and we can then adjust for overlap and traverse the PA rbtree without any issues. ---- snip ---- The above optimization is now straight forward as we completely lock the rbtree during traversals and modifications. Earlier in case of list, the locking would have been tricky due to usage of RCU. [1] https://lore.kernel.org/linux-ext4/20230116080216.249195-1-ojaswin@xxxxxxxxxxxxx/