Re: s_bmap and flags explanation


 



On Thu, Aug 04, 2022 at 12:25:31PM +0200, Emmanouil Vamvakopoulos wrote:
> hello Carlos and Dave 
> 
> thank you for the replies
> 
> a) for the mismatch in alignment between xfs and the underlying raid volume, I have to re-check,
> but from preliminary tests, when I mount the partition with a static allocsize (e.g. allocsize=256k)
> we have large files with a large number of extents (up to 40), but the sizes from du were comparable.

As expected - fixing the post-EOF speculative preallocation to
256kB means almost no space is consumed beyond EOF, so the du and ls
sizes will always be close (but not identical) for a non-sparse,
non-shared file.

But that begs the question: why are you concerned about large files
consuming slightly more space than expected for a short period of
time?

We've been doing this since commit 055388a3188f ("xfs: dynamic
speculative EOF preallocation") which was committed in January 2011
- over a decade ago - and it's been well known for a couple of
decades before that that ls and du cannot be relied on to match on
any filesystem that supports sparse files.
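
The mismatch is easy to reproduce with a sparse file, with no
preallocation involved at all. A minimal sketch, assuming GNU
coreutils (truncate, stat) and a filesystem that supports sparse
files; the temporary file and variable names are only illustrative:

```shell
#!/bin/sh
# Create a 1 MiB file that consists entirely of a hole.
f=$(mktemp)
truncate -s 1M "$f"

# Apparent size in bytes: this is what ls -l reports.
apparent=$(stat -c %s "$f")

# Allocated space in bytes: this is what du counts
# (%b blocks, each of %B bytes).
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))

echo "apparent=$apparent allocated=$allocated"
rm -f "$f"
```

Post-EOF preallocation skews the comparison the other way: the
allocated figure can be larger than the apparent size until the
preallocation is removed.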

And these days with deduplication/reflink that share extents between
files, it's even less useful because du can be correct for every
individual file, but then still report that more blocks are being
used than the filesystem has capacity to store because it reports
shared blocks multiple times...

So why do you care that du and ls are different?

> b) for the speculative preallocation beyond EOF of my files, as I understand it, I have to run xfs_fsr to get the space back.

No, you don't need to do anything, and you *most definitely* do
*not* want to run xfs_fsr to remove it. If you really must remove
speculative prealloc, then run:

# xfs_spaceman -c "prealloc -m 0" <mntpt>

And that will remove all speculative preallocation that is currently
on all in-memory inodes via an immediate blockgc pass.

If you just want to remove post-eof blocks on a single file, then
find out the file size with stat and truncate it to the same size.
The truncate won't change the file size, but it will remove all
blocks beyond EOF.
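
The stat-then-truncate step can be sketched as a small shell
function. This assumes GNU stat and truncate; trim_post_eof is a
hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# trim_post_eof FILE: truncate FILE to its current size.
# Hypothetical helper; assumes GNU coreutils stat and truncate.
# The file size is unchanged; per the explanation above, on XFS the
# truncate also drops any blocks allocated beyond EOF.
trim_post_eof() {
    size=$(stat -c %s "$1") || return 1
    truncate -s "$size" "$1"
}
```

Usage would be "trim_post_eof /path/to/file". Only use something like
this on a file that nothing is writing to: if the file grows between
the stat and the truncate, the new data is cut off.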

*However*

You should never need to do this, as there are several automated
triggers to remove it, all of which fire when the filesystem detects
there is no active modification of the file being performed. One
trigger is the last close of a file descriptor, another is the
periodic background blockgc worker, and another is memory reclaim
removing the inode from memory.

In all cases, these are triggers that indicate that the file is not
currently being written to, and hence the speculative prealloc is
not needed anymore and so can be removed.

So you should never have to remove it manually.

> but why do the inodes of those files remain dirty for at least 300 sec after the file is closed, and miss the automatic removal of the preallocation?

What do you mean by "dirty"? A file with post-eof preallocation is
not dirty in any way once the data in the file has been written
back (usually within 30s).

> we are running CentOS Stream release 8 with 4.18.0-383.el8.x86_64
> 
> but we never saw anything similar on CentOS Linux release 7.9.2009 (Core) with 3.10.0-1160.45.1.el7.x86_64
> (for a similar pattern of file sizes, but admittedly with a different distributed storage application)

RHEL 7/CentOS 7 had this same behaviour - it was introduced in
2.6.38. All your observation means is that the application running
on RHEL 7 was writing the files in a way that didn't trigger
speculative prealloc beyond EOF, not that speculative prealloc
beyond EOF didn't exist....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


