Re: [PATCH] xfs_io: fix exitcode handling (was Re: generic/399 and xfs_io pwrite command)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/06/2017 02:26 AM, Dave Chinner wrote:
On Wed, Dec 06, 2017 at 08:18:50AM +1100, Dave Chinner wrote:
On Tue, Dec 05, 2017 at 04:23:43PM +0200, Ari Sundholm wrote:
On 12/05/2017 12:28 AM, Dave Chinner wrote:
On Thu, Nov 30, 2017 at 04:21:47PM +0200, Ari Sundholm wrote:
Hi!

While debugging an issue, we found out that generic/399 seems to
rely on a behavior that is specific to ext4 in the following section
of code:
----------------8<--------snip--------8<------------------------------
#
# Write files of 1 MB of all the same byte until we hit ENOSPC.
Note that we
# must not create sparse files, since the contents of sparse files are not
# stored on-disk.  Also, we create multiple files rather than one big file
# because we want to test for reuse of per-file keys.
#
total_file_size=0
i=1
while true; do
	file=$SCRATCH_MNT/encrypted_dir/file$i
	if ! $XFS_IO_PROG -f $file -c 'pwrite 0 1M' &> $tmp.out; then
		if ! grep -q 'No space left on device' $tmp.out; then
			echo "FAIL: unexpected pwrite failure"
			cat $tmp.out
		elif [ -e $file ]; then
			total_file_size=$((total_file_size + $(stat -c %s $file)))
		fi
		break
	fi
	total_file_size=$((total_file_size + $(stat -c %s $file)))
	i=$((i + 1))
	if [ $i -gt $fs_size_in_mb ]; then
		echo "FAIL: filesystem never filled up!"
		break
	fi
done
----------------8<--------snip--------8<------------------------------

What happens with ext4 is that the xfs_io command gives a nonzero
exit value not when the pwrite command fails with ENOSPC but during
the *next* iteration when opening the file fails with ENOSPC. Turns
out the pwrite command failing does not cause xfs_io to give a
nonzero exit value.

That implies ext4 is returning zero bytes written to the pwrite()
call rather than ENOSPC. i.e.:

                 bytes = do_pwrite(file->fd, off, cnt, buffersize,
                                 pwritev2_flags);
                 if (bytes == 0)
                         break;
                 if (bytes < 0) {
                         perror("pwrite");
                         return -1;
                 }

So if it's exiting with no error, then we can't have got an error
>from ext4 at ENOSPC. If that's the case, it probably should be
considered an ext4 bug, not an issue with xfs_io...


No, according to what we've observed, that is not what happens. The
pwrite() call does fail and errno is ENOSPC after the call. The
immediate problem is that xfs_io does not reflect this failure in
its exit value and thus the check in generic/399 does not work in
this case. Only when open() fails during the next iteration does
xfs_io give a nonzero exit value and cause the check in the test
case to allow the test case to end successfully.

Ah, io/pwrite.c fails to set exitcode on failure iappropriately
before returning. Looks like there is a bunch of xfs_io commands
that fail to do this properly.

What is specific to ext4 here is, as stated in my original message,
that open() fails.

Because there are no more inodes to allocate (i.e. inode table is
full) and so attempts to create more fail with ENOSPC. The xfs_io
open command set exitcode appropriately, and that's why it finally
triggers a test abort.

Looks like xfs_io does need fixing to set exitcode appropriately on
all failures...

Try the patch below.


Thanks, I'll try it. Sorry for the late response (was on vacation).

Cheers,

Dave.


--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux