On 23-05-2017 14:27 Carlos Maiolino wrote:
> Aha, you are using sync flag, that's why you are getting IO errors
> instead of ENOSPC, I don't remember from the top of my mind why exactly,
> it's been a while since I started to work on this XFS and dm-thin
> integration, but IIRC, the problem is that XFS reserves the data
> required, and don't expect to get an ENOSPC once the device "have
> space", and when the sync occurs, kaboom. I should take a look again on
> it.
Ok, I tried with a more typical non-sync write and it seems to report
ENOSPC:
[root@blackhole ~]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M count=2048
dd: error writing ‘/mnt/storage/disk.img’: No space left on device
2002+0 records in
2001+0 records out
2098917376 bytes (2.1 GB) copied, 7.88216 s, 266 MB/s
With /sys/fs/xfs/dm-6/error/metadata/ENOSPC/max_retries = -1 (default),
I have the following dmesg output:
[root@blackhole ~]# dmesg
[23152.667198] XFS (dm-6): Mounting V5 Filesystem
[23152.762711] XFS (dm-6): Ending clean mount
[23192.704672] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
[23192.988356] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
[23193.046288] Buffer I/O error on dev dm-6, logical block 385299, lost async page write
[23193.046299] Buffer I/O error on dev dm-6, logical block 385300, lost async page write
[23193.046302] Buffer I/O error on dev dm-6, logical block 385301, lost async page write
[23193.046304] Buffer I/O error on dev dm-6, logical block 385302, lost async page write
[23193.046307] Buffer I/O error on dev dm-6, logical block 385303, lost async page write
[23193.046309] Buffer I/O error on dev dm-6, logical block 385304, lost async page write
[23193.046312] Buffer I/O error on dev dm-6, logical block 385305, lost async page write
[23193.046314] Buffer I/O error on dev dm-6, logical block 385306, lost async page write
[23193.046316] Buffer I/O error on dev dm-6, logical block 385307, lost async page write
[23193.046319] Buffer I/O error on dev dm-6, logical block 385308, lost async page write
With /sys/fs/xfs/dm-6/error/metadata/ENOSPC/max_retries = 0, dmesg
output is slightly different:
[root@blackhole default]# dmesg
[23557.594502] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
[23557.649772] buffer_io_error: 257430 callbacks suppressed
[23557.649784] Buffer I/O error on dev dm-6, logical block 381193, lost async page write
[23557.649805] Buffer I/O error on dev dm-6, logical block 381194, lost async page write
[23557.649811] Buffer I/O error on dev dm-6, logical block 381195, lost async page write
[23557.649818] Buffer I/O error on dev dm-6, logical block 381196, lost async page write
[23557.649862] Buffer I/O error on dev dm-6, logical block 381197, lost async page write
[23557.649871] Buffer I/O error on dev dm-6, logical block 381198, lost async page write
[23557.649880] Buffer I/O error on dev dm-6, logical block 381199, lost async page write
[23557.649888] Buffer I/O error on dev dm-6, logical block 381200, lost async page write
[23557.649897] Buffer I/O error on dev dm-6, logical block 381201, lost async page write
[23557.649903] Buffer I/O error on dev dm-6, logical block 381202, lost async page write
Notice the suppressed buffer_io_error entries: are they related to the
bug you linked before?
Anyway, in *no* case did I get a filesystem shutdown on these errors.
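For reference, the max_retries knob used in the two runs above can be
read and changed through sysfs. What follows is only a minimal sketch:
the function names and the XFS_SYSFS_ROOT variable are hypothetical
helpers made up for illustration (not part of any XFS tool), and the
path layout is simply the one quoted in this mail.

```shell
#!/bin/sh
# Sketch: inspect or change the XFS metadata ENOSPC retry limit via sysfs.
# XFS_SYSFS_ROOT and the function names are illustrative assumptions;
# the variable only exists so the helpers can be pointed at a test tree.
XFS_SYSFS_ROOT="${XFS_SYSFS_ROOT:-/sys/fs/xfs}"

# Print the path of the max_retries file for a device name such as dm-6.
enospc_knob() {
    printf '%s/%s/error/metadata/ENOSPC/max_retries\n' "$XFS_SYSFS_ROOT" "$1"
}

# Print the current value (-1 retries forever, 0 fails immediately).
get_enospc_retries() {
    cat "$(enospc_knob "$1")"
}

# Write a new value; on a real system this needs root.
set_enospc_retries() {
    printf '%s\n' "$2" > "$(enospc_knob "$1")"
}
```

With the helpers sourced, "get_enospc_retries dm-6" prints the current
setting and "set_enospc_retries dm-6 0" switches to the second
configuration tested above.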
Trying to be pragmatic, my main concern is to avoid extended filesystem
and/or data corruption in case a thin pool inadvertently becomes
full. For example, with ext4 I can mount the filesystem with
"errors=remount-ro,data=journal" and *any* filesystem error (due to
the thin pool or other problems) will put the filesystem in a read-only
state, avoiding significant damage.
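Whether ext4 actually flipped to read-only after an error can be checked
from the mount table. This is a minimal sketch under stated assumptions:
is_readonly and MOUNTS_FILE are hypothetical names chosen for
illustration, and the mount line in the comment uses a placeholder
device.

```shell
#!/bin/sh
# Sketch: report whether a mount point is currently mounted read-only,
# e.g. after ext4 reacted to an error with errors=remount-ro.
# Example mount (placeholder device, needs root):
#   mount -o errors=remount-ro,data=journal <device> /mnt/storage
# MOUNTS_FILE is an illustrative override; /proc/mounts is the Linux
# mount table (fields: device, mount point, type, options, ...).
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Exit status: 0 = read-only, 1 = not read-only, 2 = mount point not found.
is_readonly() {
    awk -v mp="$1" '
        $2 == mp { opts = $4; found = 1 }   # keep the last matching entry
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "$MOUNTS_FILE"
}
```

For example, "is_readonly /mnt/storage && echo read-only" prints a line
only once the filesystem has been remounted read-only.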
Can I replicate this behavior with XFS and, if so, how? From my
understanding, XFS does not have a "remount read-only" mode. Moreover,
as long as its metadata can be safely stored on disk (i.e. they land on
already allocated space), it seems to happily continue to run,
disregarding data writeout errors. As a note, ext4 without
"data=journal" behaves quite similarly, with a read-only remount
happening on metadata errors only.
Surely I am missing something... right?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8