On Tue, Nov 13, 2018 at 02:57:18PM -0500, Todd Gill wrote:
> Hi,
>
> This script creates a 1 TB thin device (device mapper) backed by 1 GB
> of physical space. The script then writes more than 1 GB via
> $BLOCK_SIZE files to XFS. I'm testing to see if recovery can be
> automated.
>
> https://paste.fedoraproject.org/paste/ropelNyOQWCjk3hfK0jltA
>
> When the $BLOCK_SIZE passed to dd is 4k - dd gets an error on the file
> write that exceeds the physical capacity that backs the thin device.
> XFS doesn't indicate any problems.

User data write error.

> If I set the $BLOCK_SIZE to 32k - I see entries in the system log that
> indicate XFS loops retrying the writes.
>
> Is that expected? Is it just more likely to happen with larger block
> sizes?
>
> I’m looking to understand how to recover when a thin device runs out of
> space under XFS.
>
> Example system log entries:
>
> [ +5.048997] XFS (dm-3): metadata I/O error: block 0xf0000
> ("xfs_buf_iodone_callback_error") error 28 numblks 32
> [ +1.376913] XFS: Failing async write: 1164 callbacks suppressed
> [ +0.000004] XFS (dm-3): Failing async write on buffer block 0xf0020.
> Retrying async write.

Filesystem metadata write error. XFS is configured to retry these by
default. Failing this write will shut down the filesystem, as it is a
corruption vector.

If you expand your thin device at this point, the write will then
succeed and the filesystem will continue to operate normally.

If you configure your filesystem (through /sys/fs/xfs/<dev>/error/...)
to fail metadata writes on ENOSPC errors, then it will shut down the
filesystem rather than wait for the thinp device to be expanded.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
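
For illustration, the sysfs configuration mentioned above might look
something like the sketch below. It assumes the per-error-class knobs
described in the kernel's XFS admin documentation
(error/metadata/ENOSPC/max_retries under /sys/fs/xfs/<dev>/); check that
they exist on your kernel before relying on this. The function name and
the XFS_SYSFS_ROOT override are hypothetical and only there so the
function can be tried against a scratch directory without root.

```shell
# Sketch only: tell XFS to fail metadata writes on ENOSPC instead of
# retrying them, which results in a filesystem shutdown when the thin
# pool runs out of space.
# XFS_SYSFS_ROOT is a hypothetical override for dry runs.
xfs_fail_on_enospc() {
    dev="$1"                                  # e.g. dm-3, as in the log
    root="${XFS_SYSFS_ROOT:-/sys/fs/xfs}"
    dir="$root/$dev/error/metadata/ENOSPC"

    if [ ! -d "$dir" ]; then
        echo "no XFS error configuration at $dir" >&2
        return 1
    fi

    # max_retries: -1 means retry forever, 0 means fail immediately.
    echo 0 > "$dir/max_retries"
}
```

On a real system this would be run as root after mounting, for example
"xfs_fail_on_enospc dm-3". The setting is not persistent, so it would
need to be reapplied on each mount.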