On 8/21/21 12:45 AM, Eric Whitney wrote:
> * Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx>:
>> When ext4_es_insert_delayed_block() returns error, e.g., ENOMEM,
>> previously reserved space is not released as the error handling,
>> in which case @s_dirtyclusters_counter is left over. Since this delayed
>> extent fails to be inserted into extent status tree, when inode is
>> written back, the extra @s_dirtyclusters_counter won't be subtracted and
>> remains there forever.
>>
>> This can lead to /sys/fs/ext4/<dev>/delayed_allocation_blocks remaining
>> non-zero even when syncfs is executed on the filesystem.
>>
>
> Hi:
>
> I think the fix below looks fine. However, this comment doesn't look right
> to me. Are you really seeing delayed_allocation_blocks values that remain
> incorrectly elevated across last closes (or across file system unmounts and
> remounts)? s_dirtyclusters_counter isn't written out to stable storage -
> it's an in-memory only variable that's created when a file is first opened
> and destroyed on last close.
>

Actually, we've encountered a real case in our production environment, in
which about 20G of space was lost (df - du = ~20G). After some investigation,
we confirmed that it was caused by a leaked s_dirtyclusters_counter (~5M),
which remains elevated even after a manual sync.

Since there were no error messages, we checked all the logic around
s_dirtyclusters_counter and found this. We can also manually inject an error
and reproduce the leaked s_dirtyclusters_counter.

Thanks,
Joseph
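
For illustration only, here is a minimal user-space sketch of the leak pattern described above. It is not the kernel code: every toy_* name is hypothetical, the global variable only stands in for s_dirtyclusters_counter, and the size check assumes a 4 KiB cluster and reads "~5M" as roughly 5 million clusters, neither of which is stated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for the in-memory s_dirtyclusters_counter (assumption). */
static int64_t dirty_clusters;

/* Hypothetical helpers: reserve or release one cluster of delayed space. */
static void toy_reserve(void)
{
	dirty_clusters++;
}

static void toy_release(void)
{
	assert(dirty_clusters > 0);	/* never release more than reserved */
	dirty_clusters--;
}

/*
 * Hypothetical stand-in for ext4_es_insert_delayed_block():
 * a non-zero 'fail' simulates an injected ENOMEM.
 */
static int toy_es_insert(int fail)
{
	return fail ? -ENOMEM : 0;
}

/*
 * Reserve space, then try to insert the delayed extent.
 * With release_on_error == 0 a failed insert leaves the reservation
 * behind (the leak); with release_on_error != 0 it is given back.
 */
static int toy_insert_delayed_block(int fail, int release_on_error)
{
	int ret;

	toy_reserve();
	ret = toy_es_insert(fail);
	if (ret && release_on_error)
		toy_release();
	return ret;
}

/* Space pinned by leaked clusters, in bytes. */
static uint64_t leaked_bytes(uint64_t clusters, uint64_t cluster_size)
{
	return clusters * cluster_size;
}
```

Under those assumptions, calling toy_insert_delayed_block(1, 0) leaves the counter at 1 with no extent to account for it at writeback, while toy_insert_delayed_block(1, 1) returns it to 0. leaked_bytes(5000000, 4096) gives 20480000000 bytes, which is in line with the ~20G gap between df and du.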