Re: file write that exceeds thin device capacity

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 14 Nov 2018 09:10:04 +1100

On Tue, Nov 13, 2018 at 02:57:18PM -0500, Todd Gill wrote:
> Hi,
> 
> This script creates a 1 TB thin device (device mapper)  backed by 1 GB
> of physical space.  The script then writes more than 1 GB via
> $BLOCK_SIZE files to XFS.  I'm testing to see if recovery can be
> automated.
> 
> https://paste.fedoraproject.org/paste/ropelNyOQWCjk3hfK0jltA
> 
> When the $BLOCK_SIZE passed to dd is 4k - dd gets an error on the file
> write that exceeds the physical capacity that backs the thin device.
> XFS doesn't indicate any problems.

That's a user data write error. It is returned to the application,
which is why dd sees it and XFS logs nothing.

> If I set the $BLOCK_SIZE to 32k - I see entries in the system log that
> indicate XFS loops retrying the writes.
> 
> Is that expected?  Is it just more likely to happen with larger block
> sizes?
> 
> I’m looking to understand how to recover when a thin device runs out of
> space under XFS.
> 
> Example system log entries:
> 
> [  +5.048997] XFS (dm-3): metadata I/O error: block 0xf0000
> ("xfs_buf_iodone_callback_error") error 28 numblks 32
> [  +1.376913] XFS: Failing async write: 1164 callbacks suppressed
> [  +0.000004] XFS (dm-3): Failing async write on buffer block 0xf0020.
> Retrying async write.

That's a filesystem metadata write error (error 28 is ENOSPC). XFS is
configured to retry these by default. Failing the write would shut
down the filesystem, as a lost metadata write is a corruption vector.
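
To see what retry behaviour is currently in effect, something like the
following should work. It is a minimal sketch: it assumes the
per-filesystem error knobs live under error/metadata/<class>/ as
described in the kernel's XFS admin guide, and XFS_SYSFS is a
hypothetical override (not a real XFS setting) added only so the
function can be pointed at a scratch directory.

```shell
#!/bin/sh
# Root of the XFS sysfs tree. XFS_SYSFS is a hypothetical override for
# testing; on a real system the tree is /sys/fs/xfs.
XFS_SYSFS="${XFS_SYSFS:-/sys/fs/xfs}"

# Print the metadata-error retry knobs for one mounted XFS filesystem.
# $1 is the short device name as it appears in the log, e.g. dm-3.
show_xfs_retry_config() {
    dev="$1"
    for class in default ENOSPC; do
        dir="$XFS_SYSFS/$dev/error/metadata/$class"
        for knob in max_retries retry_timeout_seconds; do
            # Skip knobs that are missing or unreadable.
            if [ -r "$dir/$knob" ]; then
                printf '%s/%s = %s\n' "$class" "$knob" "$(cat "$dir/$knob")"
            fi
        done
    done
}

# Example (read-only): show_xfs_retry_config dm-3
```

A value of -1 in either knob means "retry forever" for that error class.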

If you expand your thin device at this point, the write will then
succeed and the filesystem will continue to operate normally.
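
If you want to automate the "expand it in time" part, one option is to
watch how full the pool's data device is. The sketch below is an
assumption on my part, not something taken from your script: it parses
a single line in the format `dmsetup status <pool>` prints for a
thin-pool target, where the sixth field is <used data blocks>/<total
data blocks>. The function name is hypothetical.

```shell
#!/bin/sh
# Read one thin-pool status line on stdin and print data usage as a
# whole percentage (truncated). Lines for other targets are ignored.
# Expected field layout (per the device-mapper thin-provisioning doc):
#   start length thin-pool txid meta_used/meta_total data_used/data_total ...
pool_data_usage_pct() {
    awk '$3 == "thin-pool" {
        split($6, d, "/")
        if (d[2] > 0) printf "%d\n", d[1] * 100 / d[2]
    }'
}

# Example (the pool name "pool" is a placeholder):
#   dmsetup status pool | pool_data_usage_pct
```

A monitor could call this periodically and grow the pool's data device
once the percentage crosses whatever threshold you choose.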

If you configure your filesystem (through
/sys/fs/xfs/<dev>/error/...) to fail metadata writes on ENOSPC
errors, then it will shut down the filesystem rather than wait for
the thinp device to be expanded.
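
As a rough sketch of that configuration: writing 0 to the ENOSPC
max_retries knob tells XFS to fail the write instead of retrying
forever. The path below assumes the error/metadata/ENOSPC/ layout
from the kernel's XFS admin guide, and XFS_SYSFS is a hypothetical
override (not a real XFS setting) so the function can be tried against
a scratch directory first. On a real system it needs root and a
mounted filesystem.

```shell
#!/bin/sh
# Root of the XFS sysfs tree. XFS_SYSFS is a hypothetical override for
# testing; on a real system the tree is /sys/fs/xfs.
XFS_SYSFS="${XFS_SYSFS:-/sys/fs/xfs}"

# Make metadata writes that hit ENOSPC fail instead of retrying.
# $1 is the short device name as it appears in the log, e.g. dm-3.
fail_on_enospc() {
    dev="$1"
    dir="$XFS_SYSFS/$dev/error/metadata/ENOSPC"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # 0 = do not retry; -1 would mean retry forever.
    echo 0 > "$dir/max_retries"
}

# Example (as root): fail_on_enospc dm-3
```

The setting is per mount and is not persistent, so it has to be
reapplied after every mount.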

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx