Re: xfs_alloc_file_space() rounds len independently of offset

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 26 Sep 2019 08:59:28 -0400

On Thu, Sep 26, 2019 at 12:57:49PM +0200, Max Reitz wrote:
> Hi,
> 
> I’ve noticed that fallocating some range on XFS sometimes does not
> include the last block covered by the range, when the start offset is
> unaligned.
> 
> (Tested on 5.3.0-gf41def397.)
> 
> This happens whenever ceil((offset + len) / block_size) - floor(offset /
> block_size) > ceil(len / block_size), for example:
> 
> Let block_size be 4096.  Then (on XFS):
> 
> $ fallocate -o 2048 -l 4096 foo   # Range [2048, 6144)
> $ xfs_bmap foo
> foo:
>         0: [0..7]: 80..87
>         1: [8..15]: hole
> 
> There should not be a hole there.  Both of the first two blocks should
> be allocated.  XFS will do that if I just let the range start one byte
> sooner and increase the length by one byte:
> 
> $ rm -f foo
> $ fallocate -o 2047 -l 4097 foo   # Range [2047, 6144)
> $ xfs_bmap foo
> foo:
>         0: [0..15]: 88..103
> 
> 
> (See [1] for a more extensive reasoning why this is a bug.)
> 
> 
> The problem is (as far as I can see) that xfs_alloc_file_space() rounds
> count (which equals len) independently of the offset.  So in the
> examples above, 4096 is rounded to one block and 4097 is rounded to two;
> even though the first example actually touches two blocks because of the
> misaligned offset.
> 
> Therefore, this should fix the problem (and does fix it for me):
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 0910cb75b..4f4437030 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -864,6 +864,7 @@ xfs_alloc_file_space(
>  	xfs_filblks_t		allocatesize_fsb;
>  	xfs_extlen_t		extsz, temp;
>  	xfs_fileoff_t		startoffset_fsb;
> +	xfs_fileoff_t		endoffset_fsb;
>  	int			nimaps;
>  	int			quota_flag;
>  	int			rt;
> @@ -891,7 +892,8 @@ xfs_alloc_file_space(
>  	imapp = &imaps[0];
>  	nimaps = 1;
>  	startoffset_fsb	= XFS_B_TO_FSBT(mp, offset);
> -	allocatesize_fsb = XFS_B_TO_FSB(mp, count);
> +	endoffset_fsb = XFS_B_TO_FSB(mp, offset + count);
> +	allocatesize_fsb = endoffset_fsb - startoffset_fsb;
> 
>  	/*
>  	 * Allocate file space until done or until there is an error
> 

That looks like a reasonable fix to me and it's in the spirit of how
xfs_free_file_space() works as well (outside of the obvious difference
in how unaligned boundary blocks are handled). Care to send a proper
patch?

Brian

> 
> Thanks and kind regards,
> 
> Max
> 
> 
> [1] That this is a bug can be proven as follows:
> 
> 1. The fallocate(2) man page states "subsequent writes into the range
> specified by offset and len are guaranteed not to fail because of lack
> of disk space."
> 
> 2. Run this test (anywhere, e.g. tmpfs):
> 
> $ truncate -s $((4096 * 4096)) test_fs
> $ mkfs.xfs -b size=4096 test_fs
> [Success-indicating output, I hope]
> 
> $ mkdir mount_point
> $ sudo mount -o loop test_fs mount_point
> $ sudo chmod go+rwx mount_point
> $ cd mount_point
> 
> $ free_blocks=$(df -B4k . | tail -n 1 \
>       | awk '{ split($0, f); print f[4] }')
> 
> $ falloc_length=$((free_blocks * 4096))
> 
> $ while true; do \
>      fallocate -o 2048 -l $falloc_length test_file && break; \
>      falloc_length=$((falloc_length - 4096)); \
> done
> fallocate: fallocate failed: No space left on device
> fallocate: fallocate failed: No space left on device
> fallocate: fallocate failed: No space left on device
> fallocate: fallocate failed: No space left on device
> 
>   # Now we have a test_file with an fallocated range of
>   # [2048, 2048 + $falloc_length)
>   # So we should be able to write anywhere in that area without
>   # encountering ENOSPC; but that is what happens when we write
>   # to the last block covered by the range:
> 
> $ dd if=/dev/zero of=test_file bs=1 conv=notrunc \
>     seek=$falloc_length count=2048
> dd: error writing 'test_file': No space left on device
> 1+0 records in
> 0+0 records out
> 0 bytes copied, 0.000164691 s, 0.0 kB/s
> 
> 
> When I apply the diff shown above, I get one more “No space left on
> device” line (indicating that fallocate consistently takes one
> additional block), and then:
> 
> $ uname -sr
> Linux 5.3.0-gf41def397-dirty
> 
> $ dd if=/dev/zero of=test_file bs=1 conv=notrunc \
>     seek=$falloc_length count=2048
> 2048+0 records in
> 2048+0 records out
> 2048 bytes (2.0 kB, 2.0 KiB) copied, 0.0121903 s, 168 kB/s
> 
> (i.e., what I’d expect)