Hi,

I'm using bcache on a Linux 5.4.69 kernel, testing transient cache device failures with backing devices in 'writeback' mode and with several gigabytes of dirty data that has not yet reached the backing device.

In my first test, the cache devices use the default "unregister" value for the "errors" sysfs attribute (for bcache cache devices under /sys/fs/bcache/...). When I induce a cache device failure, the bcache backing devices stop, the cache device is detached from all affected backing devices, and I/O errors are returned on subsequent access attempts to the backing devices. This all works as I would expect given the configuration.

The downside to "unregister" is that when I reboot the system (with the cache block device reinstated and working), the backing devices come up with no cache device attached! This certainly causes file system corruption, since the dirty data is not present on the backing device (the backing device is started without the cache device).

On the second test run, I used "panic" for the "errors" sysfs value, and this works more cleanly, most of the time. When I induce a cache block device failure, the system panics, but the cache device stays associated with the backing devices -- and dirty data can then flush to the backing device. On this second test, when the system booted back up, one cache device failed to start:

...
[  333.116149] bcache: prio_read() bad csum reading priorities
[  333.116151] bcache: prio_read() bad magic reading priorities
[  333.116636] bcache: bch_cache_set_error() bcache: error on 2f255344-bb44-44b9-930d-90f23b384e9c:
[  333.116637] corrupted btree at bucket 473, block 44, 504 keys
[  333.116638] bcache: bch_cache_set_error() , disabling caching
[  333.116638]
[  333.116649] bcache: register_cache() error dm-12: failed to run cache set
[  333.116650] bcache: register_bcache() error : failed to register device
...
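For reference, here is roughly how I'm switching the error mode between test runs; the cache-set UUID below is the one from my logs, so treat it as a placeholder for whatever directory appears under /sys/fs/bcache on a given system:

```shell
# Placeholder cache-set UUID (taken from my logs) -- substitute the
# directory that actually appears under /sys/fs/bcache on your system.
CSET=/sys/fs/bcache/2f255344-bb44-44b9-930d-90f23b384e9c

# Show the current mode; the active choice is bracketed,
# e.g. "[unregister] panic".
cat "$CSET/errors"

# Select "panic" so the cache device stays attached and the
# system panics on cache I/O error instead of unregistering.
echo panic > "$CSET/errors"
```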
The prio_read failure seemed to be a temporary problem: I rebooted the system again, and the bcache cache device then started without issue. I did not check for data loss or corruption in this instance.

A third test run using "panic" mode resulted in everything coming back up normally and seemingly operating just fine (no cache/backing device start errors). I did not check for data loss or corruption in this instance either.

So, just a couple of questions to solidify my expectations for this type of transient cache device failure (the cache block device fails, but can come back later fully intact):

- It sounds like "panic" mode for the "errors" sysfs attribute is best for handling this case, since it does not detach the cache device from the backing devices.
- Is this safe and reliable for transient cache device failures? Obviously it's not preferred, but should I expect any problems if this occurs while using "panic" mode? Is it expected that there will be no metadata corruption on the cache device?

Thanks for your time. Appreciate the great work on bcache!

--Marc