On Wed, 14 Nov 2018 17:08:59 +0100, Vojtech Pavlik <vojtech@xxxxxxxx> wrote:

> On Wed, Nov 14, 2018 at 04:09:51PM +0100, Emmanuel Florac wrote:
>
> > am I right to assume that it's safer to create a RAID-1 (for
> > instance with md) of two SSDs before setting up bcache as a
> > writeback device?
> >
> > Any hint welcome :)
>
> That's a big question.
>
> In my opinion the safest way is to set the target for dirty data to
> zero (and reasonable writeback rate parameters).
>
> In that case, anytime the system is idle, everything gets written to
> the backing device and a loss of the caching device is not a problem.
>

Sure, but caching devices may fail sometimes...

> Now why I'm not saying outright "Yes, do a RAID-1":
>
> Bcache relies on barriers for cache device consistency.
>
> Without a battery-backed RAID controller, it's hard to ensure RAID
> consistency in the event of a power loss, since barriers passed down
> the device queues are not synchronized.
>
> So in the naïve approach of just sending the same stream of requests
> to each drive, one queue can progress further than the other in the
> event of a power loss, resulting in an inconsistent content,
> resulting in subsequent data corruption when reads are interleaved.

Not a problem here: these are high-end SN200 NVMe drives with
capacitor-protected cache, caching a BBU-equipped RAID controller.
Belt and suspenders at the same time :)

>
> A battery backed memory on a RAID controller card can fix that,
> keeping all written data until confirmed by the drives. But then -
> the memory is again a single point of failure. Good thing that
> memories tend to be quite reliable.
>

Yup, I've never had a single BBU fail, and I've set up a whole lot of
them...

> Without a battery-backed RAM using a software RAID (md), Linux simply
> resorts to only sending one barrier operation at a time to the
> underlying drives plus additional housekeeping like a write intent
> bitmap.
>
> This fixes the consistency problem, however comes at a significant
> performance cost. Again, something you don't want with a cache device.
>

So you're basically telling me that md RAID-1 could be much slower
than a Broadcom 94xx RAID controller for NVMe drives? Interesting.

> So, make your own decision based on your usecase. :)

The use case is: as fast as possible, but without endangering the
600 TB of data, because duh, 600 TB is quite a lot :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@xxxxxxxxxxxxxx>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------
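
For anyone reading this thread later, here is a toy illustration of the
mirror inconsistency Vojtech describes: the same write stream goes to both
legs of a RAID-1, but at power loss each leg has persisted a different
prefix of it. This is only a sketch in plain Python and does not model md
or bcache; the names replay and divergent_blocks and the sample write
stream are made up for the example.

```python
def replay(writes, persisted_count):
    """Return the on-disk state of one mirror leg after only the first
    `persisted_count` writes of the stream reached stable storage."""
    disk = {}
    for block, value in writes[:persisted_count]:
        disk[block] = value
    return disk


def divergent_blocks(leg_a, leg_b):
    """List the block numbers whose content differs between two legs."""
    blocks = set(leg_a) | set(leg_b)
    return sorted(b for b in blocks if leg_a.get(b) != leg_b.get(b))


# Hypothetical stream of (block, data) writes sent identically to both legs.
writes = [(0, "A1"), (1, "B1"), (0, "A2"), (2, "C1"), (1, "B2")]

# Power loss: leg 0 had persisted all five writes, leg 1 only the first two.
leg0 = replay(writes, 5)
leg1 = replay(writes, 2)

# A mirror that alternates reads between legs can now return either
# version of the same block, depending on which leg serves the read.
for block in divergent_blocks(leg0, leg1):
    print(block, leg0.get(block), leg1.get(block))
```

If both legs persist the same prefix (which is what serialising barriers or
a battery-backed controller cache is meant to guarantee), divergent_blocks
returns an empty list.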