On Fri, Aug 20, 2021 at 12:45:56PM -0400, Eric Whitney wrote: > * Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx>: > > When ext4_es_insert_delayed_block() returns error, e.g., ENOMEM, > > previously reserved space is not released as the error handling, > > in which case @s_dirtyclusters_counter is left over. Since this delayed > > extent failes to be inserted into extent status tree, when inode is > > written back, the extra @s_dirtyclusters_counter won't be subtracted and > > remains there forever. > > > > This can leads to /sys/fs/ext4/<dev>/delayed_allocation_blocks remains > > non-zero even when syncfs is executed on the filesystem. > > > > Hi: > > I think the fix below looks fine. However, this comment doesn't look right > to me. Are you really seeing delayed_allocation_blocks values that remain > incorrectly elevated across last closes (or across file system unmounts and > remounts)? s_dirtyclusters_counter isn't written out to stable storage - > it's an in-memory only variable that's created when a file is first opened > and destroyed on last close. hmmm.... Let me explain a bit about this. It can be reproduced easily by fault injection with the code modified below: diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index 9a3a8996aacf..29dc0da5960c 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -794,6 +794,9 @@ static int __es_insert_extent(struct inode *inode, struct extent_status *newes) } } + if (!(ktime_get_ns() % 3)) { + return -ENOMEM; + } es = ext4_es_alloc_extent(inode, newes->es_lblk, newes->es_len, newes->es_pblk); if (!es) and then run a loop while true; do dd if=/dev/zero of=aaa bs=8192 count=10000; sync; rm -rf aaa; done After "Cannot allocate memory reported" is shown, s_dirtyclusters_counter was already leaked. It can cause df and free space counting incorrect in this mount. If my understanging is correct, in priciple, we should also check with "WARN_ON(ei->i_reserved_data_blocks)" in the inode evict path since it should be considered as 0. Thanks, Gao Xiang > > Eric > > > Fixes: 51865fda28e5 ("ext4: let ext4 maintain extent status tree") > > Cc: <stable@xxxxxxxxxxxxxxx> > > Reported-by: Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> > > Signed-off-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx> > > --- > > fs/ext4/inode.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 82087657860b..7f15da370281 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -1650,6 +1650,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk) > > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); > > int ret; > > bool allocated = false; > > + bool reserved = false; > > > > /* > > * If the cluster containing lblk is shared with a delayed, > > @@ -1666,6 +1667,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk) > > ret = ext4_da_reserve_space(inode); > > if (ret != 0) /* ENOSPC */ > > goto errout; > > + reserved = true; > > } else { /* bigalloc */ > > if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) { > > if (!ext4_es_scan_clu(inode, > > @@ -1678,6 +1680,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk) > > ret = ext4_da_reserve_space(inode); > > if (ret != 0) /* ENOSPC */ > > goto errout; > > + reserved = true; > > } else { > > allocated = true; > > } > > @@ -1688,6 +1691,8 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk) > > } > > > > ret = ext4_es_insert_delayed_block(inode, lblk, allocated); > > + if (ret && reserved) > > + ext4_da_release_space(inode, 1); > > > > errout: > > return ret; > > -- > > 2.27.0 > >