Hello!

On Tue, 23 Nov 2021 at 23:34, Tobiasz Karoń <unfa00@xxxxxxxxx> wrote:
>
> Thank you for your detailed reply and sharing your experience and solution.
>
> So it seems Bcache and Btrfs are fundamentally incompatible when it
> comes to caching writes? It has worked fine for 2 months, and then it
> just imploded. I'll stay in writearound mode to be safe.

No, they are not fundamentally incompatible, but losing writeback data
on btrfs is a much more visible catastrophic event than on other
filesystems (which write data in place, whereas btrfs writes
copy-on-write). With other filesystems, bcache destroying itself in
writeback mode would also severely damage your filesystem. On a
classical filesystem you usually end up with garbled files containing
a mix of old and new data, and maybe some fixable metadata errors. BUT:
it is still a catastrophic event, maybe even more so because the data
loss can go unnoticed and end up in your backups, and you only find out
later that you're missing data that has already been rotated out of the
backup.

Don't use writeback if you cannot afford to recover from backup when
writeback fails. That's a property of how caching works, not a property
of btrfs or bcache. It's the same for any writeback cache you might be
using: RAID controllers come with writeback caches and sometimes decide
to throw their contents away, leaving you with destroyed filesystems
(so you usually turn that off unless your workload requires it and you
can afford to throw lost data away). That doesn't make them
fundamentally incompatible with filesystems, right?

Your HDD comes with a write cache, too, which may destroy your
filesystem on power loss. You might want to turn that off, especially
when using btrfs (but also for better write latency behavior; the
kernel has better IO scheduling anyway than the really small write
caches of HDDs): `hdparm -W0 /dev/HDDDEV`.
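
As a rough sketch of the write-cache advice above: the helper below only
turns the drive cache off when `hdparm -W` reports it as on. The function
names are made up for illustration, and the parsing assumes hdparm prints
a line like ` write-caching =  1 (on)`, so check the output on your own
system first. /dev/HDDDEV is a placeholder, as before.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of hdparm.

# Classify the text printed by `hdparm -W DEV`.
# Assumption: the report contains a line like " write-caching =  1 (on)".
wcache_state() {
    case "$1" in
        *write-caching*"(on)"*)  echo on ;;
        *write-caching*"(off)"*) echo off ;;
        *)                       echo unknown ;;
    esac
}

# Disable the drive's volatile write cache if it is currently enabled.
# Needs root; $1 is the whole-disk device, e.g. /dev/HDDDEV.
disable_wcache() {
    dev=$1
    state=$(wcache_state "$(hdparm -W "$dev" 2>/dev/null)")
    if [ "$state" = on ]; then
        hdparm -W0 "$dev"
    else
        echo "$dev: write cache is $state, nothing changed"
    fi
}

# Usage (as root):  disable_wcache /dev/HDDDEV
```

Some drives may forget this setting after a power cycle, so it probably
has to be reapplied at boot.
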
HDD write caches are only useful for operating systems that don't do
proper write ordering/merging (usually DOS, and maybe Windows).
Sometimes HDD firmware is buggy and cannot use async queueing, in which
case write caches may improve performance a lot. But usually, you want
to keep that setting off. That becomes even more important when you use
bcache in writeback mode (because HDD write caching may then break
assumptions bcache makes).

> I've checked and my cache device has a block size of 512 bytes.

Yep, all my bcache systems using 512 bytes are affected by that 5.15.2
kernel bug. Use 4k and you should be okay. The problem seems to come
from page-unaligned writes - and using 4k (the page size of your CPU)
seems to work around that. Kernel 5.15.3 has most of the fix; another
fix is queued for one of the next releases.

Another lesson learned: Don't use a new kernel until it's in its
x.y.{4,5,6} releases. This is not the first time I had catastrophic
events with kernels in their infancy. That's why I usually avoid .0 and
.1 kernels. Seems I should add .2 and .3 kernels to that list, too.
Never do a major kernel upgrade without creating a full backup first.
Kernel components like bcache are much less well-tested than other
components, so they are more likely to break on early kernel releases
for some exotic use cases (exotic because nobody who cares about their
data uses writeback).

> That's
> a strange value, as the backing device is a AF HDD (like all of them
> in the past decade or more), so the block size should be 4Kb.
> I guess this also works until it doesn't.

You won't have catastrophic events with writearound - and that's as
good as writeback on btrfs (and even better because it won't destroy
the filesystem in case of a cache hiccup). Bcache can break for any
reason, due to bugs, like any other kernel component.
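
To make the writearound suggestion concrete, here is a small sketch that
reads the current cache mode of a bcache device from sysfs and switches
it to writearound if needed. The helper names and the optional
sysfs-root argument are assumptions for illustration only; the
`cache_mode` file itself is the one bcache exposes under
/sys/block/bcache*/bcache/, with the active mode shown in brackets.

```shell
#!/bin/sh
# Sketch only: helper names and the optional sysfs-root argument are
# hypothetical conveniences, not part of bcache itself.

# Extract the active mode from the contents of a cache_mode file, where
# the kernel marks the current choice with brackets, e.g.
#   "writethrough [writeback] writearound none"
active_cache_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Switch one bcache device (e.g. bcache0) to writearound unless it
# already is. $2 overrides the sysfs root, which is handy for a dry run
# against a scratch directory.
ensure_writearound() {
    f="${2:-/sys}/block/$1/bcache/cache_mode"
    if [ "$(active_cache_mode "$(cat "$f")")" != writearound ]; then
        echo writearound > "$f"
    fi
}

# Usage (as root):  ensure_writearound bcache0
```

The block size of an existing cache set should be readable from
/sys/fs/bcache/CSETUUID/block_size, going by the kernel's bcache
documentation.
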
When bcache breaks in writeback mode, that usually means catastrophic
results for ANY filesystem attached to it - btrfs is just much more
likely to detect those events. Even if you COULD repair the
filesystem's logical structure, it still means some data wasn't
written - btrfs just has a much better understanding of what should be
on the disk, while other filesystems silently accept the data loss
after recovering from structural errors.

BTW: 4k should be safe, but there's another problem in bcache, unrelated
to this one, which still needs fixing.

> Can I destroy and recreate the cache device on a live system (my root
> filesystem is on this bcache set). I guess I can't.

Yes, you can. Detaching the cache makes the backing devices pass
through; they are still available as /dev/bcache* even with no caching
device.

> This is probably what I've done wrong today - I did
> not unregister the whole cset before attempting to recreate the cache
> device.

Okay, unregistering should be quite essential, but you don't need to
reboot. Also, I recommend using a new cset UUID so it cannot conflict
with any stale data that MAY be stored in the cache.

> I am honestly a little afraid to touch it, after what happened.

Well, the cache backend is stopped or detached - it doesn't matter
anyway. Just don't use writeback for the next couple of kernel releases
(or maybe rather avoid it completely in the future). Writeback really
doesn't gain you a lot on btrfs: due to COW, btrfs is already quite
good at writing (because writes are usually going to be sequential
anyway), and it has become a lot better during the last few kernel
release cycles. I've been using writeback for a long time now, but this
is just another reason why I should have been using writearound instead
(the other one being that sometimes on boot, my SSD detaches from the
bus, making bcache throw away all writeback data and leaving me with a
destroyed filesystem).
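
For the detach-and-recreate procedure discussed earlier in this mail,
the following is a hedged sketch that only PRINTS the steps so they can
be reviewed before running anything. The function name is hypothetical,
and bcache0, CSETUUID, /dev/CPART, LABEL and NEWCSETUUID are
placeholders. The sysfs files are the ones described in the kernel's
bcache documentation; the `bcache make` line is taken from the earlier
mail quoted at the end of this one.

```shell
#!/bin/sh
# Sketch only: prints a plan, runs nothing. Function name is hypothetical.
# Args: bcache device name, old cset UUID, cache partition, label.
recreate_cache_plan() {
    bdev=$1 cset=$2 cpart=$3 label=$4
    cat <<EOF
# 1. stop caching new writes, then detach (dirty data is flushed first)
echo writearound > /sys/block/$bdev/bcache/cache_mode
echo 1 > /sys/block/$bdev/bcache/detach
# 2. wait until this reports "no cache" before going on
cat /sys/block/$bdev/bcache/state
# 3. unregister the old cache set and create a fresh one (new cset UUID)
echo 1 > /sys/fs/bcache/$cset/unregister
bcache make -C -w 4096 -l $label --force $cpart
# 4. register the new cache (udev may already have done this)
echo $cpart > /sys/fs/bcache/register
# 5. attach, using the new UUID listed under /sys/fs/bcache/
echo NEWCSETUUID > /sys/block/$bdev/bcache/attach
EOF
}

# Usage:  recreate_cache_plan bcache0 CSETUUID /dev/CPART LABEL
```

Printing instead of executing is deliberate here: each step depends on
the previous one having finished, so it seems safer to run them by hand
and check the state in between.
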

> I hope Bcachefs will eliminate these problems and provide a stable
> unified solution.

You're swapping one "experimental" FS (btrfs), which has matured a
great deal over at least the last 5 years, for another experimental
filesystem which is not yet battle-tested and performance-tuned.
bcachefs and bcache are two completely distinct products with different
use cases; they only share a similar name because their fundamental
inner structures are based on the same code and idea (and probably
because the author thought it's cool).

I'm not sure if you use device pooling with btrfs (multiple disks), but
on my system it proved useful NOT to use RAID-0 for btrfs data: it's
actually slower in normal desktop use, given the way btrfs internally
distributes data access across devices. I found that single data mode,
even with multiple disks, has better write behavior and better read
latency, and it makes better use of bcache. So maybe it's worth a try
if you fear that writearound mode could degrade your system
responsiveness too much.

> Take care
> - unfa

Good luck
Kai

> On Tue, 23 Nov 2021 at 18:40, Kai Krakow <kai@xxxxxxxxxxx> wrote:
> >
> > Oops:
> >
> > > # echo 1 >/sys/fs/bcache/CSETUUID/unregister
> > > # bcache make -C -w 4096 -l LABEL --force /dev/BPART
> >
> > CPART of course!
> >
> > # bcache make -C -w 4096 -l LABEL --force /dev/CPART
> >
> > Bye
> > Kai
> >
>
>
> --
> - Tobiasz 'unfa' Karoń
>
> www.youtube.com/unfa000