Re: mirrored bcache

On Wed, 14 Nov 2018 17:08:59 +0100,
Vojtech Pavlik <vojtech@xxxxxxxx> wrote:

> On Wed, Nov 14, 2018 at 04:09:51PM +0100, Emmanuel Florac wrote:
> 
> > am I right to assume that it's safer to create a RAID-1 (for
> > instance with md) of two SSDs before setting up bcache as a
> > writeback device?
> > 
> > Any hint welcome :)  
> 
> That's a big question.
> 
> In my opinion the safest way is to set the target for dirty data to
> zero (and reasonable writeback rate parameters). 
> 
> In that case, anytime the system is idle, everything gets written to
> the backing device and a loss of the caching device is not a problem.
> 

Sure, but caching devices may fail sometimes...
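
For reference, the "dirty data target" above is, as far as I know, the
writeback_percent attribute that bcache exposes in sysfs for each
bcacheN device. A minimal sketch of setting it to zero follows; the
helper name and the sysfs_root parameter are my own (the latter only
makes the function easy to try against a scratch directory), and the
0-100 check is a plain sanity check, not the kernel's actual limit.

```python
from pathlib import Path


def set_writeback_percent(device="bcache0", percent=0, sysfs_root="/sys/block"):
    """Write the dirty-data target for one bcache device and read it back.

    Assumes the usual layout <sysfs_root>/<device>/bcache/writeback_percent.
    Needs root privileges when pointed at the real /sys/block.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    attr = Path(sysfs_root) / device / "bcache" / "writeback_percent"
    attr.write_text(f"{percent}\n")
    # Read the value back so the caller sees what is actually stored.
    return int(attr.read_text().strip())
```

Calling set_writeback_percent("bcache0", 0) as root would ask bcache
to flush all dirty data whenever the device has spare bandwidth; it
does nothing about a cache device dying while data is still dirty.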

> Now why I'm not saying outright "Yes, do a RAID-1":
> 
> Bcache relies on barriers for cache device consistency.
> 
> Without a battery-backed RAID controller, it's hard to ensure RAID
> consistency in the event of a power loss, since barriers passed down
> the device queues are not synchronized.
> 
> So in the naïve approach of just sending the same stream of requests
> to each drive, one queue can progress further than the other in the
> event of a power loss, leaving the mirrors with inconsistent content
> and causing data corruption later, when reads are interleaved.

Not a problem here: these are high-end SN200 NVMe drives, with
capacitor-protected cache, caching a BBU-equipped RAID controller. Belt
and suspenders at the same time :)
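
For anyone following along, the failure mode described above can be
illustrated with a toy model. This is only a sketch of the idea, not
how md or bcache work internally: each mirror is a dict of blocks, the
"power loss" is simulated by cutting each queue at a different point,
and reads alternate between the two mirrors. All names are made up for
the illustration.

```python
def apply_writes(writes, completed):
    """Return a disk image holding only the first `completed` writes."""
    disk = {}
    for block, value in writes[:completed]:
        disk[block] = value
    return disk


def interleaved_read(mirror_a, mirror_b, nblocks):
    """Read even blocks from mirror A and odd blocks from mirror B."""
    result = []
    for block in range(nblocks):
        source = mirror_a if block % 2 == 0 else mirror_b
        # Blocks never written on this mirror still hold the old content.
        result.append(source.get(block, "old"))
    return result


# The same stream of requests is sent to both drives.
writes = [(0, "new"), (1, "new"), (2, "new"), (3, "new")]

# Power is lost: queue A had finished everything, queue B only one write.
mirror_a = apply_writes(writes, completed=4)
mirror_b = apply_writes(writes, completed=1)

view = interleaved_read(mirror_a, mirror_b, nblocks=4)
print(view)  # ['new', 'old', 'new', 'old']
```

The resulting view matches neither the state before the writes nor the
state after them, which is the kind of inconsistency a cache cannot
recover from on its own.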


> 
> A battery backed memory on a RAID controller card can fix that,
> keeping all written data until confirmed by the drives. But then -
> the memory is again a single point of failure. Good thing that
> memories tend to be quite reliable.
> 

Yup, I've never had a single BBU fail. And I've set up a whole lot of
them...

> Without a battery-backed RAM using a software RAID (md), Linux simply
> resorts to only sending one barrier operation at a time to the
> underlying drives plus additional housekeeping like a write intent
> bitmap.
> 
> This fixes the consistency problem, but it comes at a significant
> performance cost. Again, something you don't want with a cache device.
> 

So you're basically telling me that using md RAID 1 could possibly be
much slower than a Broadcom 94xx RAID controller for NVMe drives?
Interesting.

> So, make your own decision based on your usecase. :)

The use case is: as fast as possible, without endangering the 600 TB
of data, because duh, 600 TB is quite a lot :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@xxxxxxxxxxxxxx>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


