Re: bcacheX is missing after removing a backend and adding it again

Coly Li <colyli@xxxxxxx> · Thu, 25 Feb 2021 22:54:02 +0800

On 2/25/21 11:12 AM, wubenqing@xxxxxxxxxxxxx wrote:
> Hi guys,
> I am testing a scenario where multiple backing devices are attached to one cache device. After I removed one of the backing devices and added it back, I found that its bcacheX device was missing. cache_mode is set to writeback.
> 
> Before:
> /dev/sdd
> └─bcache0
> /dev/sdc
> └─bcache1
> 
> After:
> /dev/sdg
> /dev/sdc
> └─bcache1
> 
> 
> The block device /dev/sdd was renamed to /dev/sdg, and bcache0 is missing from the lsblk output. I found that /sys/block/bcache0/bcache still links to the old device, which no longer exists.
> # ll /sys/block/bcache0/bcache
> lrwxrwxrwx 1 root root 0 Feb 23 17:36 /sys/block/bcache0/bcache -> ../../../pci0000:80/0000:80:01.0/0000:82:00.0/host1/port-1:3/end_device-1:3/target1:0:3/1:0:3:0/block/sdd/bcache
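
A minimal sketch of how the stale link described above could be detected from a script. The helper name is_dangling_symlink is hypothetical, and whether sysfs actually leaves this link dangling after the backing device disappears is an assumption based on the report, not something verified here.

```python
import os


def is_dangling_symlink(path: str) -> bool:
    """Return True if path is a symlink whose target no longer exists."""
    # os.path.islink() inspects the link itself, while os.path.exists()
    # follows it, so a dangling link is "a link" that does "not exist".
    return os.path.islink(path) and not os.path.exists(path)


if __name__ == "__main__":
    # Path taken from the report; on a machine without bcache0 this
    # simply reports that the path is not a dangling link.
    link = "/sys/block/bcache0/bcache"
    state = "dangling" if is_dangling_symlink(link) else "not a dangling link"
    print(f"{link}: {state}")
```

If the link is dangling, os.readlink() on the same path would still return the old target string, as the ll output above does.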
> 
> 
> The super block of /dev/sdg shows that there is still dirty data stored on the cache device.
> # bcache-super-show /dev/sdg
> sb.magic                ok
> sb.first_sector         8 [match]
> sb.csum                 80AE8CFCCC740075 [match]
> sb.version              1 [backing device]
> 
> dev.label               (empty)
> dev.uuid                22cb8e47-67d8-4f54-97b4-a8c86d986aac
> dev.sectors_per_block   1
> dev.sectors_per_bucket  1024
> dev.data.first_sector   16
> dev.data.cache_mode     1 [writeback]
> dev.data.cache_state    2 [dirty]
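
For scripted checks, the dirty state reported above can be read out of the bcache-super-show text. This is only a sketch: parse_super_show and has_dirty_data are hypothetical helpers, and the parser assumes each line is a field name followed by whitespace and a value, as in the output quoted here.

```python
import re


def parse_super_show(text: str) -> dict:
    """Map each 'name value' line of bcache-super-show output to a dict."""
    fields = {}
    for line in text.splitlines():
        # Field names look like "dev.data.cache_state"; the value is the
        # rest of the line after the separating whitespace.
        m = re.match(r"^\s*([A-Za-z_][\w.]*)\s+(.*\S)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def has_dirty_data(fields: dict) -> bool:
    """True if the superblock reports cache_state as dirty."""
    return "[dirty]" in fields.get("dev.data.cache_state", "")


if __name__ == "__main__":
    sample = (
        "dev.sectors_per_block 1\n"
        "dev.data.cache_mode 1 [writeback]\n"
        "dev.data.cache_state 2 [dirty]\n"
    )
    print(has_dirty_data(parse_super_show(sample)))
```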
> 
> 
> When I checked the kernel log, I found that:
> ...
> [81701.447130] bcache: bch_count_backing_io_errors() sdd: IO error on backing device, unrecoverable
> [81701.487543] bcache: bch_count_backing_io_errors() sdd: IO error on backing device, unrecoverable
> [81701.985562] bcache: bch_count_backing_io_errors() sdd: IO error on backing device, unrecoverable
> [81702.590435] bcache: backing_request_endio() Can't flush sdd: returned bi_status 10
> 
> ...
> [81849.890604] bcache: register_bdev() registered backing device sdg
> [81849.890608] bcache: bch_cached_dev_attach() Tried to attach sdg but duplicate UUID already attached
> 
> ...
> 
> "IO error on backing device, unrecoverable" appeared 63 times in total. Since io_error_limit is 64, io_disable of the backing device may have been set to true, but I did not find the log message "too many IO errors on backing device" that bch_cached_dev_error() prints.
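
The 63-versus-64 arithmetic above can be illustrated with a small model. It is a sketch under an assumption, not the kernel code: it assumes each backing-device error increments a counter, that the "unrecoverable" message is logged only while the counter is below the limit, and that the device is disabled once the counter reaches the limit. The function name simulate_backing_io_errors is hypothetical.

```python
def simulate_backing_io_errors(n_errors: int, error_limit: int = 64):
    """Return (messages_logged, io_disable) after n_errors I/O errors.

    Assumed behaviour: below the limit each error is logged; the error
    that reaches the limit disables the device instead of being logged.
    """
    logged = 0
    io_disable = False
    for count in range(1, n_errors + 1):
        if count < error_limit:
            logged += 1          # "IO error on backing device, unrecoverable"
        else:
            io_disable = True    # limit reached: stop counting further errors
            break
    return logged, io_disable


if __name__ == "__main__":
    print(simulate_backing_io_errors(63))
    print(simulate_backing_io_errors(64))
```

Under this model, 63 logged messages are consistent with either 63 errors (device still enabled) or 64 or more errors (device disabled), so the message count alone does not settle whether io_disable was set.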
> 
> bch_writeback_thread shows very high CPU usage, and the SSD (cache) shows very high read traffic but no write traffic.
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> nvme0n1           0.00     0.00 33836.00    0.00  1619.07     0.00    98.00    76.55    2.68    2.68    0.00   0.03 100.00
> 
> 
> The important problem is that there is no way to recover bcache0, even if I re-run "echo /dev/sdg > /sys/fs/bcache/register". The kernel log shows:
> [91091.621773] bcache: register_bcache() error : device already registered
> 
> I suspect that /dev/sdd still remains in c->cached_devs with io_disable set, and since /dev/sdd no longer exists, writeback cannot flush the dirty data. Because the block device is now named /dev/sdg, /dev/sdg cannot be reattached successfully.
> Does bcache support hot-swapping of backing devices? If not, what command should I use to restore bcache0 manually?

Which kernel version are you using?

A reboot might solve the problem. But I feel it could be improved to
avoid the extra reboot.

Let me add it to my todo list, in case no one else posts a patch before I
work on it...

Thanks for the suggestion.

Coly Li