Re: I/O error on cache device can cause user observable errors

> On Feb 2, 2024, at 06:25, Arnaldo Montagner <armont@xxxxxxxxxx> wrote:
> 
> The bcache documentation says that errors on the cache device are
> handled transparently.
> 
> I'm seeing a case where the cache device is unregistered in response
> to repeated write errors (expected), but that results in a read error
> on the bcache device (unexpected).
> 
> Here's how I'm reproducing the problem:
> 1. Create a device with dm-error to simulate I/O errors. The device is
> 1G in size and it will fail I/Os in a 4M extent starting at offset
> 128M:
>    $ dmsetup create cache_disk << EOF
>      0      262144    linear /dev/sdb 0
>      262144 8192      error
>      270336 1826816   linear /dev/sdb 270336
>    EOF
> 
> 2. Set up bcache in writethrough mode. The backing device is 1000G in length:
>    $ make-bcache --cache /dev/mapper/cache_disk --bdev /dev/sdc --wipe-bcache --bucket 256k
>    $ echo writethrough > /sys/block/bcache0/bcache/cache_mode
>    $ echo 0 > /sys/block/bcache0/bcache/cache/synchronous
> 
>    $ lsblk
>    NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>    ...
>    sdb            8:16   0    10G  0 disk
>    └─cache_disk 253:0    0     1G  0 dm
>      └─bcache0  252:0    0  1000G  0 disk
>    sdc            8:32   0  1000G  0 disk
>    └─bcache0    252:0    0  1000G  0 disk
> 
> 3. Start a random read workload on the bcache device (using fio):
>    $ fio --name=basic --filename=/dev/bcache0 --size=1000G --rw=randread --blocksize=256k --blockalign=256k
> 
> 4. After a while I see that the cache device gets unregistered.
> However, the application output indicates it saw an I/O error on a
> read request:
>     fio: io_u error on file /dev/bcache0: Input/output error: read offset=592264298496, buflen=262144
> 
> I can see in the syslogs that bcache unregistered the cache. The logs
> also show that there was an I/O error on the bcache device:
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.176867] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.186494] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.195743] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.204869] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.234722] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.246102] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.274013] bcache: bch_count_io_errors() dm-0: IO error on writing data to cache.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.289128] bcache: bch_cache_set_error() error on 427201f5-5c86-4890-9866-f9860e518041: dm-0: too many IO errors writing data to cache
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.289128] , disabling caching
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.306212] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it alive.
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.306543] Buffer I/O error on dev bcache0, logical block 144595776, async page read
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.316119] bcache: cached_dev_detach_finish() Caching disabled for sdc
>    Feb  1 19:47:23 armont-bcache-test kernel: [ 3327.316398] bcache: cache_set_free() Cache set 427201f5-5c86-4890-9866-f9860e518041 unregistered
> 
> The steps above reproduce the problem most of the time, but not
> always. In a few of the attempts, the cache was unregistered without
> resulting in observable I/O errors.
> 
> Is this expected?

Yes, this is expected behavior for device failure or hot-plug handling.

BTW, which part of the documentation says that "errors on the cache device are
handled transparently"? Let me see whether it should be updated.

Thanks.

Coly Li
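
For anyone cross-checking the numbers in the quoted report, here is a small
sketch of the arithmetic. It converts the dm table entries (which are in
512-byte sectors) to MiB, and maps the failing fio offset to a block number
for comparison with the "Buffer I/O error ... logical block" line. The helper
names sectors_to_mib and offset_to_block are made up for illustration, and
the 4096-byte block size is an assumption, not something stated in the logs.

```shell
#!/bin/sh
# Device-mapper table offsets and lengths are given in 512-byte sectors.
SECTOR=512

# Hypothetical helper: convert a sector count to MiB.
sectors_to_mib() {
    echo $(( $1 * $SECTOR / 1024 / 1024 ))
}

# Hypothetical helper: convert a byte offset to a block index
# for a given block size in bytes.
offset_to_block() {
    echo $(( $1 / $2 ))
}

# Values taken from the dmsetup table in the report.
echo "error extent starts at: $(sectors_to_mib 262144) MiB"
echo "error extent length:    $(sectors_to_mib 8192) MiB"
echo "total device size:      $(sectors_to_mib $((270336 + 1826816))) MiB"

# Offset taken from the fio error message.
# ASSUMPTION: the kernel message counts 4096-byte blocks.
echo "logical block:          $(offset_to_block 592264298496 4096)"
```

Under that assumption the last line comes out as 144595776, the same block
number as in the kernel message, which suggests the fio error and the
"Buffer I/O error" line refer to the same read. The first three values
(128, 4, and 1024 MiB) agree with the sizes described in step 1.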






