Re: xfs: Temporary extra disk space consumption?

On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I found that running a sample program shown below on xfs filesystem
> results in consuming extra disk space until close() is called.
> Is this expected result?

Yes. It's an anti-fragmentation mechanism that is intended to
prevent excessive fragmentation when many files are being written at
once.

> I don't care if temporarily consumed extra disk space is trivial. But since
> this amount as of returning from fsync() is as much as amount of written data,
> I worry that there might be some bug.
> 
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[])
> {
> 	static char buffer[1048576];
> 	const char *filename = "my_testfile";
> 	const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> 	int i;

Truncate to zero length - all writes will be sequential extending
EOF.

> 
> 	if (fd == EOF)
> 		return 1;
> 	printf("Before write().\n");
> 	system("/bin/df -m .");
> 	for (i = 0; i < 1024; i++)
> 		if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> 			return 1;

And then wrote 1GB of sequential data. Without looking yet at your
results, I would expect between about 1.5 and 2GB of space was
allocated.

> 	if (fsync(fd))
> 		return 1;

This will allocate it all as a single unwritten extent if possible,
then write the 1GB of data to it converting that range to written.

Check your file size here - it will be 1GB. You can't read beyond
EOF, so the extra allocation is not accessible. It's also unwritten,
so even if you could read beyond EOF, you couldn't read any data from
the range because reads of unwritten extents return zeros.

> 	printf("Before close().\n");
> 	system("/bin/df -m .");
> 	if (close(fd))
> 		return 1;

This will run ->release() which will remove any extra allocation
we do at write() and result in just the written data up to EOF
remaining allocated on disk.

> 	printf("Before unlink().\n");
> 	system("/bin/df -m .");
> 	if (unlink(filename))
> 		return 1;
> 	printf("After unlink().\n");
> 	system("/bin/df -m .");
> 	return 0;
> }
> ---------- my_write_unlink.c ----------
> 
> ----------
> $ uname -r
> 5.17.0
> $ ./my_write_unlink
> Before write().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /
> Before close().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 132443    123432  52% /

Yup, 2GB of space allocated.

> Before unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 131416    124459  52% /

and ->release trims extra allocation beyond EOF and now you are
back to just the 1GB the file consumes.

> After unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /

And now it's all gone.

> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
> 
> ----------
> $ uname -r
> 4.18.0-365.el8.x86_64

Same.

> ----------
> $ uname -r
> 3.10.0-1160.59.1.el7.x86_64

Same.

Looks like speculative preallocation for sequential writes is
behaving exactly as designed....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


