On Thu, Aug 04, 2022 at 12:25:31PM +0200, Emmanouil Vamvakopoulos wrote:
> hello Carlos and Dave
>
> thank you for the replies
>
> a) for the mismatch in alignment between xfs and the underlying raid volume I have to re-check
> but from preliminary tests, when I mount the partition with a static allocsize (e.g. allocsize=256k)
> we have large files with a large number of extents (up to 40) but the sizes from du were comparable.

As expected - fixing the post-EOF speculative preallocation to 256kB means
almost no consumed space beyond EOF, so du and ls will always be close (but
not identical) for a non-sparse, non-shared file.

But that raises the question: why are you concerned about large files
consuming slightly more space than expected for a short period of time?
We've been doing this since commit 055388a3188f ("xfs: dynamic speculative
EOF preallocation"), which was committed in January 2011 - over a decade
ago - and it has been well known for a couple of decades before that that
ls and du cannot be relied upon to match on any filesystem that supports
sparse files.

And these days, with deduplication/reflink sharing extents between files,
du is even less useful: it can be correct for every individual file and
yet still report that more blocks are in use than the filesystem has
capacity to store, because it counts shared blocks multiple times...

So why do you care that du and ls are different?

> b) for the speculative preallocation beyond EOF of my files, as I
> understood I have to run xfs_fsr to get the space back.

No, you don't need to do anything, and you *most definitely* do *not*
want to run xfs_fsr to remove it.

If you really must remove speculative prealloc, then run:

# xfs_spaceman -c "prealloc -m 0" <mntpt>

That will remove all speculative preallocation currently held on all
in-memory inodes via an immediate blockgc pass.

If you just want to remove post-EOF blocks on a single file, then find
out the file size with stat and truncate the file to that same size.
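As a minimal sketch of that last step (the file name and sizes here are
illustrative, not from your setup):

```shell
# Create a 256kB demo file. While it is being written, the filesystem
# may hold speculative preallocation beyond EOF.
f=/tmp/eof_demo_file
dd if=/dev/zero of="$f" bs=64k count=4 status=none

# Truncate the file to its *current* size: the data is untouched, but
# any blocks allocated beyond EOF are released.
size=$(stat -c %s "$f")
truncate -s "$size" "$f"

echo "$size"    # the file size is unchanged by the truncate: 262144
```

Comparing `du -k` before and after would show the freed post-EOF blocks,
but whether any appear at all depends on the kernel and write pattern;
the only invariant above is the unchanged file size.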
The truncate won't change the file size, but it will remove all blocks
beyond EOF.

*However*, you should never need to do this, as there are several
automated triggers that remove it, all firing when the filesystem detects
there is no active modification of the file in progress. One trigger is
the last close of a file descriptor on the file, another is the periodic
background blockgc worker, and another is memory reclaim evicting the
inode from memory. In all cases, these triggers indicate that the file is
not currently being written to, and hence the speculative prealloc is no
longer needed and can be removed. So you should never have to remove it
manually.

> but why do the inodes of those files remain dirty for at least 300 sec
> after the closing of the file and lose the automatic removal of the
> preallocation?

What do you mean by "dirty"? A file with post-EOF preallocation is not
dirty in any way once the data in the file has been written back (usually
within 30s).

> we are running on CentOS Stream release 8 with 4.18.0-383.el8.x86_64
>
> but we never saw anything similar on CentOS Linux release 7.9.2009 (Core)
> with 3.10.0-1160.45.1.el7.x86_64
> (for a similar pattern of file sizes, but admittedly with a different
> distributed storage application)

RHEL 7/CentOS 7 had this same behaviour - it was introduced in 2.6.38.
All your observation means is that the application running on RHEL 7 was
writing files in a way that didn't trigger speculative prealloc beyond
EOF, not that speculative prealloc beyond EOF didn't exist....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx