On 11/16/21 6:10 PM, Kai Krakow wrote:
Hello Coly!
I think I can consistently reproduce a failure mode of bcache when
going from 5.10 LTS to 5.15.2 - on one single system (my other systems
do just fine).
In 5.10, bcache is stable, no problems at all. After booting to
5.15.2, btrfs would complain about broken btree generation numbers,
then freeze completely. Going back to 5.10, bcache complains about
being broken and cannot start the cache set.
I was able to reproduce the following behavior after the problem
struck me twice in a row:
1. Boot into SysRescueCD
2. modprobe bcache
3. Manually detach the btrfs disks from bcache, set cache mode to
none, force running
4. Reboot into 5.15.2 (now works)
5. See this error in dmesg:
[ 27.334306] bcache: bch_cache_set_error() error on 04af889c-4ccb-401b-b525-fb9613a81b69: empty set at bucket 1213, block 1, 0 keys, disabling caching
[ 27.334453] bcache: cache_set_free() Cache set 04af889c-4ccb-401b-b525-fb9613a81b69 unregistered
[ 27.334510] bcache: register_cache() error sda3: failed to run cache set
[ 27.334512] bcache: register_bcache() error : failed to register device
6. wipefs the failed bcache cache
7. bcache make -C -w 512 /dev/sda3 -l bcache-cdev0 --force
8. re-attach the btrfs disks in writearound mode
9. btrfs immediately fails, freezing the system (with transaction IDs way off)
10. Reboot loops back to step 5, unable to mount
11. Escape the situation by starting again at step 1 and not making a new bcache
Is this a known error? Why does it only hit this machine?
SSD Model: Samsung SSD 850 EVO 250GB
This is already known. There are 3 things to fix:
1, Revert commit 2fd3e5efe791946be0957c8e1eed9560b541fe46
2, Revert commit f8b679a070c536600c64a78c83b96aa617f8fa71
3, Make the following change in drivers/md/bcache/super.c,
@@ -885,9 +885,9 @@ static void bcache_device_free(struct bcache_device *d)
 		bcache_device_detach(d);
 	if (disk) {
-		blk_cleanup_disk(disk);
 		ida_simple_remove(&bcache_device_idx,
 				  first_minor_to_idx(disk->first_minor));
+		blk_cleanup_disk(disk);
 	}
Fixes 1) and 3) are on their way to the stable kernel IMHO; fix 2) is only my workaround, and I don't see an upstream fix yet.
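
For anyone who wants to try the three fixes locally before they reach stable, the steps could be scripted roughly as below. This is only a sketch under assumptions that are not from this thread: it is run from the top of a kernel git checkout that contains both commits, and change 3 has been saved as a complete patch in a hypothetical file named bcache-free-order.patch (the hunk quoted above is abbreviated, so it would need to be regenerated against your tree, or the blk_cleanup_disk() line moved by hand). The reverts may conflict depending on the tree. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: apply the three fixes listed in this mail to a kernel tree.
# DRY_RUN and PATCH_FILE are illustrative names, not from the thread.
# DRY_RUN=1 (the default) prints each command instead of running it.

DRY_RUN="${DRY_RUN:-1}"
PATCH_FILE="${PATCH_FILE:-bcache-free-order.patch}"  # hypothetical file name

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRY_RUN" = 1 ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

apply_fixes() {
	# Fixes 1 and 2: revert the two commits named in the mail.
	run git revert --no-edit 2fd3e5efe791946be0957c8e1eed9560b541fe46 &&
	run git revert --no-edit f8b679a070c536600c64a78c83b96aa617f8fa71 &&
	# Fix 3: move blk_cleanup_disk() after ida_simple_remove(),
	# supplied here as a patch file prepared beforehand.
	run git apply "$PATCH_FILE"
}

apply_fixes
```

Setting DRY_RUN=0 runs the commands for real; the chain stops at the first command that fails, so a conflicting revert does not go unnoticed.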
Just FYI.
Coly Li