On 2/7/22 4:10 PM, Kai Krakow wrote:
On Mon, 7 Feb 2022 at 08:37, Coly Li <colyli@xxxxxxx> wrote:
For the problem reported by Kai in this thread, the dmesg
[ 27.334306] bcache: bch_cache_set_error() error on
04af889c-4ccb-401b-b525-fb9613a81b69: empty set at bucket 1213, block
1, 0 keys, disabling caching
[ 27.334453] bcache: cache_set_free() Cache set
04af889c-4ccb-401b-b525-fb9613a81b69 unregistered
[ 27.334510] bcache: register_cache() error sda3: failed to run cache set
[ 27.334512] bcache: register_bcache() error : failed to register device
shows that the metadata is corrupted, probably by an incomplete metadata write, which some other people and I have encountered too (with a specific bcache block size on specific devices). Updating to the latest stable kernel may solve the issue, but I haven't verified whether the regression is fixed.
As far as I can tell, the problem hasn't happened again since. I think
I saw the problem in 5.15.2 (the first 5.15.x I tried), and it was
probably fixed by 'bcache: Revert "bcache: use bvec_virt"' in 5.15.3.
I even tried writeback mode again on multiple systems and it is
stable. OTOH, I must say that I only enabled writeback caching after
applying btrfs metadata hinting patches, which can move metadata to native
SSD devices, so bcache no longer handles btrfs metadata writes or
reads. Performance-wise, this seems to be a superior setup, as bcache
seems to struggle with btrfs metadata access patterns. But I doubt it
has anything to do with whether the 5.15.2 problem triggers or not;
I just wanted to mention it for completeness.
Copied. Thank you for the information. Your report has also prompted
me to find hardware to debug another issue that has existed for years.
This is a powerful motivation from the community :-)
Coly Li