I think I have solved it after reading up on https://www.kernel.org/doc/html/latest/admin-guide/bcache.html.

1. I've set the cache mode to none.
2. I've detached the caching device.
3. I've unregistered it.
4. I've run wipefs on it.
5. I've recreated the bcache caching device (also used --cset-uuid to put it straight into the right bcache set).
6. I've registered and reattached the cache to the backing device.
7. Now my backing device shows the status as clean again.
8. I've enabled writearound caching for now (will enable writeback if all goes well).

It seems the cache is working again:

❯ bcache-status
--- bcache ---
Device                      /dev/sda (8:0)
UUID                        c9cd8259-3cee-42ff-a8ec-e11193c09b7e
Block Size                  0.50KiB
Bucket Size                 512.00KiB
Congested?                  False
Read Congestion             2.0ms
Write Congestion            20.0ms
Total Cache Size            173.97GiB
Total Cache Used            1.74GiB (1%)
Total Cache Unused          172.23GiB (99%)
Dirty Data                  0.50KiB (0%)
Evictable Cache             173.97GiB (100%)
Replacement Policy          [lru] fifo random
Cache Mode                  [writethrough] writeback writearound none
Total Hits                  2 (0%)
Total Misses                1506
Total Bypass Hits           0 (0%)
Total Bypass Misses         9138
Total Bypassed              183.70MiB

### Part 2: It's not over!

Soon after I'd done this and all seemed to be well, bcache imploded once again, this time thankfully not taking down my root filesystem, probably because it was not in writeback mode. My OS didn't boot, and I got another checksum error at some bucket and a "disabled caching" message. I suspect it was due to my mistake - I had deleted and recreated the bcache cache without rebooting in between, so maybe something went wrong because of that.

I rebooted into a live system and deleted the cache again (my backing device was clean). This time, though, I wrote all zeros to the partition before recreating the cache. I suspect bcache may have found old data there and got confused? wipefs only erases the superblock signatures, not the rest of the data.

Before doing anything though I've mounted my backing filesystem with `mount -o 8192` and backed it up using a btrfs-clone Python script.
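
For anyone wondering where the 8192 comes from: as far as I can tell it is the byte offset at which the filesystem starts inside the backing device. The bcache-super-show output quoted further down reports dev.data.first_sector 16, and 16 sectors of 512 bytes is 8192 bytes. Here is a minimal sketch of that arithmetic. The helper names `parse_super_show` and `data_offset_bytes` are made up for illustration, the sample text is trimmed from the quoted output, and the printed losetup line only mirrors the example in the kernel bcache guide with its placeholder device names. Nothing is executed against a real device.

```python
SECTOR_SIZE = 512  # assumption: first_sector is counted in 512-byte sectors


def parse_super_show(text):
    """Hypothetical helper: map each field name to its raw value string."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def data_offset_bytes(fields):
    """Byte offset of the filesystem inside the backing device."""
    first_sector = int(fields["dev.data.first_sector"].split()[0])
    return first_sector * SECTOR_SIZE


# Trimmed from the bcache-super-show /dev/sda output quoted in this thread.
sample = """\
sb.magic ok
sb.first_sector 8 [match]
sb.version 1 [backing device]
dev.data.first_sector 16
dev.data.cache_state 3 [inconsistent]
"""

fields = parse_super_show(sample)
offset = data_offset_bytes(fields)
print(offset)  # 16 * 512 = 8192

# Only prints the command; device names are placeholders, not real paths.
print(f"losetup -o {offset} /dev/loop0 /dev/your_bcache_backing_dev")
```

If the backing device was formatted with a non-default data offset, the same calculation should give the matching value, but I have only checked it against the default of 16 sectors.
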
After I'd verified my backup was working I unmounted the backup medium and proceeded to recreate the cache and reattach it. I also found that `running` was `0` for my bcache set, so I turned it on. After a reboot everything was back to normal.

I *hope* this will keep working. Last time Bcache broke and took my filesystem with it without anything significant happening. I'd love to know if it's considered stable or what could be causing spontaneous failures.

- unfa

On Tue, 23 Nov 2021 at 15:48, Tobiasz Karoń <unfa00@xxxxxxxxx> wrote:
>
> Hi!
>
> TL;DR
>
> My cache is inconsistent, and that's probably preventing Bcache from
> using it (all I/O goes to the backing device). How can I clear that?
>
> Details:
>
> I've been using Bcache for the past few months on my root Btrfs
> filesystem with success.
> Then one day out of the blue Bcache failed and took my Btrfs
> filesystem with it (details:
> https://www.youtube.com/watch?v=Hf3zr6CxvmI, looks similar to this:
> https://stackoverflow.com/questions/22820492/how-to-revert-bcache-device-to-regular-device).
> That's not the topic of my message though.
> I've done a clean Arch Linux installation on Bcache + Btrfs once again
> using an SSD partition for cache and an HDD as the backing device.
>
> However, this time it doesn't do anything...
> I was unable to find any information online to solve this.
>
> My Bcache device works fine, the system boots off of it. However all
> I/O goes straight to the backing HDD, and the SSD is unused. Needless
> to say this means the performance is not what I got used to when
> Bcache was working fine.
>
> Here's what a 3rd party bcache-status script says (it'd be great if
> bcache-tools would provide something like this, BTW):
>
> ❯ bcache-status
> --- bcache ---
> Device                      ? (?)
> UUID                        c9cd8259-3cee-42ff-a8ec-e11193c09b7e
> Block Size                  0.50KiB
> Bucket Size                 512.00KiB
> Congested?                  False
> Read Congestion             2.0ms
> Write Congestion            20.0ms
> Total Cache Size            173.97GiB
> Total Cache Used            8.70GiB (5%)
> Total Cache Unused          165.27GiB (95%)
> Dirty Data                  0.50KiB (0%)
> Evictable Cache             173.97GiB (100%)
> Replacement Policy          [lru] fifo random
> Cache Mode                  (Unknown)
> Total Hits                  0
> Total Misses                0
> Total Bypass Hits           0
> Total Bypass Misses         0
> Total Bypassed              0B
>
> The Total Cache Used value has not changed since I did my initial
> Arch Linux installation. It seems that Bcache had "turned off" by that
> point.
>
> Here are the bcache superblocks for the backing device and the cache:
>
> ❯ bcache-super-show /dev/sda
> sb.magic                    ok
> sb.first_sector             8 [match]
> sb.csum                     4E6EACCA74AB0AE5 [match]
> sb.version                  1 [backing device]
>
> dev.label                   unfa-desktop%20root
> dev.uuid                    49202fdf-fbe5-48fd-bdd8-df5414da817c
> dev.sectors_per_block       8
> dev.sectors_per_bucket      1024
> dev.data.first_sector       16
> dev.data.cache_mode         0 [writethrough]
> dev.data.cache_state        3 [inconsistent]
>
> cset.uuid                   9572380e-8e6f-4ce4-8323-80b98a85eeed
>
> ❯ bcache-super-show /dev/sdd3
> sb.magic                    ok
> sb.first_sector             8 [match]
> sb.csum                     259C90FD74B4D4BE [match]
> sb.version                  3 [cache device]
>
> dev.label                   (empty)
> dev.uuid                    95c6449a-03b5-40f2-a8cc-80b1b61c5ef0
> dev.sectors_per_block       1
> dev.sectors_per_bucket      1024
> dev.cache.first_sector      1024
> dev.cache.cache_sectors     364833792
> dev.cache.total_sectors     364834816
> dev.cache.ordered           yes
> dev.cache.discard           no
> dev.cache.pos               0
> dev.cache.replacement       0 [lru]
>
> cset.uuid                   c9cd8259-3cee-42ff-a8ec-e11193c09b7e
>
> BTW - I've now realized I've set a label for the backing device but
> not the cache. Maybe this is the reason? I don't think it should work
> this way but I've cleared the label on my backing device just to be
> sure.
>
> Hmm. The cache is inconsistent. I had this before I reinstalled my OS.
> I have recreated the bcache cache on the SSD and was hoping that would
> solve it.
> I don't know what I should do with this, is this the reason why it's
> not working?
>
> I was wondering if wiping the partition and recreating the cache
> would help, but I don't want to needlessly wear down the SSD if that
> won't help.
>
> Needless to say I would really like to avoid data loss when using
> Bcache - it's awesome, and the developer says it's perfectly stable
> and safe, but I've had a sudden failure and others have had such as well
> (without seeing any hardware issues that could be causing that). Maybe
> I should quit using Bcache altogether? Maybe it's not
> production-ready? I was wondering about maybe using Bcachefs, though
> the need to compile a custom kernel for it is quite a deterrent. I
> tried it briefly, but the bcachefs-tools stopped working at some point
> without a visible reason. I know Btrfs is flawed, though it seems to
> be the best so far.
>
> Thank you for your work,
> - unfa
>
> --
> - Tobiasz 'unfa' Karoń
>
> www.youtube.com/unfa000

--
- Tobiasz 'unfa' Karoń

www.youtube.com/unfa000