On Wed, Mar 23, 2022 at 08:21:52PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I found that running a sample program shown below on xfs filesystem
> results in consuming extra disk space until close() is called.
> Is this expected result?

Yes. It's an anti-fragmentation mechanism that is intended to prevent
excessive fragmentation when many files are being written at once.

> I don't care if temporarily consumed extra disk space is trivial. But since
> this amount as of returning from fsync() is as much as amount of written data,
> I worry that there might be some bug.
> 
> ---------- my_write_unlink.c ----------
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[])
> {
> 	static char buffer[1048576];
> 	const char *filename = "my_testfile";
> 	const int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
> 	int i;

Truncate to zero length - all writes will be sequential extending EOF.

> 
> 	if (fd == EOF)
> 		return 1;
> 	printf("Before write().\n");
> 	system("/bin/df -m .");
> 	for (i = 0; i < 1024; i++)
> 		if (write(fd, buffer, sizeof(buffer)) != sizeof(buffer))
> 			return 1;

And then wrote 1GB of sequential data. Without looking yet at your
results, I would expect between about 1.5 and 2GB of space was
allocated.

> 	if (fsync(fd))
> 		return 1;

This will allocate it all as a single unwritten extent if possible,
then write the 1GB of data to it, converting that range to written.

Check your file size here - it will be 1GB. You can't read beyond EOF,
so the extra allocation is not accessible. It's also unwritten, so even
if you could read beyond EOF, you can't read any data from the range
because reads of unwritten extents return zeros.
> 	printf("Before close().\n");
> 	system("/bin/df -m .");
> 	if (close(fd))
> 		return 1;

This will run ->release() which will remove any extra allocation we do
at write() and result in just the written data up to EOF remaining
allocated on disk.

> 	printf("Before unlink().\n");
> 	system("/bin/df -m .");
> 	if (unlink(filename))
> 		return 1;
> 	printf("After unlink().\n");
> 	system("/bin/df -m .");
> 	return 0;
> }
> ---------- my_write_unlink.c ----------
> 
> ----------
> $ uname -r
> 5.17.0
> $ ./my_write_unlink
> Before write().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /
> Before close().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 132443    123432  52% /

Yup, 2GB of space allocated.

> Before unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 131416    124459  52% /

And ->release trims the extra allocation beyond EOF, and now you are
back to just the 1GB the file consumes.

> After unlink().
> Filesystem     1M-blocks   Used Available Use% Mounted on
> /dev/sda1         255875 130392    125483  51% /

And now it's all gone.

> $ grep sda /proc/mounts
> /dev/sda1 / xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
> ----------
> 
> ----------
> $ uname -r
> 4.18.0-365.el8.x86_64

Same.

> ----------
> $ uname -r
> 3.10.0-1160.59.1.el7.x86_64

Same.

Looks like speculative preallocation for sequential writes is behaving
exactly as designed....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx