On Thu, Sep 26, 2019 at 12:57:49PM +0200, Max Reitz wrote: > Hi, > > I’ve noticed that fallocating some range on XFS sometimes does not > include the last block covered by the range, when the start offset is > unaligned. > > (Tested on 5.3.0-gf41def397.) > > This happens whenever ceil((offset + len) / block_size) - floor(offset / > block_size) > ceil(len / block_size), for example: > > Let block_size be 4096. Then (on XFS): > > $ fallocate -o 2048 -l 4096 foo # Range [2048, 6144) > $ xfs_bmap foo > foo: > 0: [0..7]: 80..87 > 1: [8..15]: hole > > There should not be a hole there. Both of the first two blocks should > be allocated. XFS will do that if I just let the range start one byte > sooner and increase the length by one byte: > > $ rm -f foo > $ fallocate -o 2047 -l 4097 foo # Range [2047, 6144) > $ xfs_bmap foo > foo: > 0: [0..15]: 88..103 > > > (See [1] for a more extensive reasoning why this is a bug.) > > > The problem is (as far as I can see) that xfs_alloc_file_space() rounds > count (which equals len) independently of the offset. So in the > examples above, 4096 is rounded to one block and 4097 is rounded to two; > even though the first example actually touches two blocks because of the > misaligned offset. > > Therefore, this should fix the problem (and does fix it for me): > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > index 0910cb75b..4f4437030 100644 > --- a/fs/xfs/xfs_bmap_util.c > +++ b/fs/xfs/xfs_bmap_util.c > @@ -864,6 +864,7 @@ xfs_alloc_file_space( > xfs_filblks_t allocatesize_fsb; > xfs_extlen_t extsz, temp; > xfs_fileoff_t startoffset_fsb; > + xfs_fileoff_t endoffset_fsb; > int nimaps; > int quota_flag; > int rt; > @@ -891,7 +892,8 @@ xfs_alloc_file_space( > imapp = &imaps[0]; > nimaps = 1; > startoffset_fsb = XFS_B_TO_FSBT(mp, offset); > - allocatesize_fsb = XFS_B_TO_FSB(mp, count); > + endoffset_fsb = XFS_B_TO_FSB(mp, offset + count); > + allocatesize_fsb = endoffset_fsb - startoffset_fsb; > > /* > * Allocate file space until done or until there is an error > That looks like a reasonable fix to me and it's in the spirit of how xfs_free_file_space() works as well (outside of the obvious difference in how unaligned boundary blocks are handled). Care to send a proper patch? Brian > > Thanks and kind regards, > > Max > > > [1] That this is a bug can be proven as follows: > > 1. The fallocate(2) man page states "subsequent writes into the range > specified by offset and len are guaranteed not to fail because of lack > of disk space." > > 2. Run this test (anywhere, e.g. tmpfs): > > $ truncate -s $((4096 * 4096)) test_fs > $ mkfs.xfs -b size=4096 test_fs > [Success-indicating output, I hope] > > $ mkdir mount_point > $ sudo mount -o loop test_fs mount_point > $ sudo chmod go+rwx mount_point > $ cd mount_point > > $ free_blocks=$(df -B4k . | tail -n 1 \ > | awk '{ split($0, f); print f[4] }') > > $ falloc_length=$((free_blocks * 4096)) > > $ while true; do \ > fallocate -o 2048 -l $falloc_length test_file && break; \ > falloc_length=$((falloc_length - 4096)); \ > done > fallocate: fallocate failed: No space left on device > fallocate: fallocate failed: No space left on device > fallocate: fallocate failed: No space left on device > fallocate: fallocate failed: No space left on device > > # Now we have a test_file with an fallocated range of > # [2048, 2048 + $falloc_length) > # So we should be able to write anywhere in that area without > # encountering ENOSPC; but that is what happens when we write > # to the last block covered by the range: > > $ dd if=/dev/zero of=test_file bs=1 conv=notrunc \ > seek=$falloc_length count=2048 > dd: error writing 'test_file': No space left on device > 1+0 records in > 0+0 records out > 0 bytes copied, 0.000164691 s, 0.0 kB/s > > > When I apply the diff shown above, I get one more “No space left on > device” line (indicating that fallocate consistently takes one > additional block), and then: > > $ uname -sr > Linux 5.3.0-gf41def397-dirty > > $ dd if=/dev/zero of=test_file bs=1 conv=notrunc \ > seek=$falloc_length count=2048 > 2048+0 records in > 2048+0 records out > 2048 bytes (2.0 kB, 2.0 KiB) copied, 0.0121903 s, 168 kB/s > > (i.e., what I’d expect)