Re: LUKS safety on RAID 1 mirror

On Tue, November 25, 2014 17:17, Arno Wagner wrote:
> On Tue, Nov 25, 2014 at 16:27:07 CET, Sven Eschenberg wrote:
>> I think Mark was aiming at some other concerns with his question.
>>
>> As you stated, backups are mandatory and RAID's purpose is extended
>> availability (and speed).
>>
>> Regarding the concerns of the OP:
>> When a device fails and gets marked as failed, there's no difference to
>> single-drive operation. With TLER drives the drive will probably not get
>> marked faulty, and the broken sector can be rewritten with the data of
>> the other leg, if that is implemented appropriately.
>> What is problematic in a RAID is failure and unreported errors during
>> read(). Say a sector including the LUKS header is unstable, gets read,
>> and the retrieved data is faulty; then broken data might get written to
>> the mirror during manipulation operations including a following write.
>> (This can be compensated for by backups, though.)
>
> That should not happen unless there is a severe error in the driver,
> or unless the disk itself did not notice. For a spinning disk, something
> like this will only happen if the memory or controller of the disk
> is going bad in a subtle way (very, very rare). SSDs are still much
> worse in that regard, due to less firmware maturity. Still, even
> here it is not much of a concern. AFAIK, md-raid does not refresh
> sectors from other disks; it simply reads them from there, and
> you should get error reports both from SMART monitoring and from the
> disk driver (in the system log), as you have an "uncorrectable sector",
> and you should get an error from RAID monitoring as well.

If all goes well, yes, you are right. Unfortunately some vendors had
problems with firmware (spinning disks): while the SMART-MZR stayed below
threshold, the drive suddenly returned bursts of zeros and acknowledged the
read as error-free. That really is a disaster when it happens (the funny
thing is that after some time, when the MZR drops again and you reread the
same area, all goes well). As I don't know the internals of the firmware,
I can only suspect that some check did not properly trigger an error
return code.

Then there is of course still the bus and the transport protocol; some
time ago, full ECC or back-to-back error detection wasn't really common,
but nowadays the transport should be protected.

>
> So, unless I am mistaken about md-raid, this is NOT a concern,
> unless you run disks and RAID without monitoring, in which case
> you will always be surprised by disk failures even with RAID.

Monitoring is mandatory in such a setup, indeed. I guess as things keep
maturing we will hopefully see such cases (broken firmware etc.) less often.

>
> As the probability of the LUKS header being hit is small
> (with a 2TB disk, e.g., the probability of a random bad sector
> being in the LUKS header is 0.0001%, small enough to be irrelevant),
> this is not even a real concern without monitoring unless you
> really are oblivious about your disk dying or corrupting data.
>
> This is a disk reliability problem; do not solve it at the LUKS layer,
> but at the raw-disk and RAID layer.

I agree, of course it is a reliability problem. I wonder if frequent reads
of the same area (i.e. the header) have an impact on the longevity of the
media. From a manufacturer's point of view, certainly not ;-). For SSDs
this becomes an interesting question; if I am not mistaken, data retention
durability slowly decreases when cells are rewritten over and over again.

>
>> With two disks the probability of such a specific error increases; on
>> the other hand, a RAID1 implementation *should* balance reads across
>> the legs, which in turn decreases the probability of hitting such a
>> specific read error.
>
> No. Not unless the RAID layer is too smart for its own good.
> AFAIK, RAID implementations that do refresh data have extra
> checksums to be really, really sure they do not silently
> corrupt data. For a while (long ago), you could buy enterprise
> disks with 540 Byte sectors for that purpose.

Those enterprise drives indeed offered extended features to RAID
controllers, as the controller had its own manageable space per sector. As
for md-raid, it does not have scratch space for that. MD-RAID only has
scratch space to record broken sectors (the bad-blocks log), if I remember
correctly, and for a write-intent bitmap.
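If it helps, both features can be inspected with mdadm; the device names
below are just examples, and --examine-badblocks needs a reasonably recent
mdadm (3.3 or newer, if I remember correctly):

```shell
# Show array details, including whether an internal write-intent
# bitmap is active (run as root; /dev/md0 is an example)
mdadm --detail /dev/md0

# Dump the per-device bad-blocks log recorded in the md superblock
mdadm --examine-badblocks /dev/sda1

# Add an internal write-intent bitmap to an existing array
mdadm --grow --bitmap=internal /dev/md0
```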
>
>> The question that remains is: How probable is an unnoticed (or
>> unreported) read error, and how does the RAID implementation handle
>> specific error scenarios? (Unfortunately there are firmware bugs ...)
>>
>> Say the mirrors are inconsistent due to an unnoticed read error; the
>> RAID layer cannot decide which of the two legs has the faulty data.
>
> It basically always can, as the disks know. Unrecognized disk read errors
> are so unlikely as to be irrelevant. Of course bad RAM or CPU (on the
> disk or in your computer) can cause all sorts of havoc, but again, that
> is not something you can solve on the LUKS layer.

Agreed.

>
>> It can, however, reread both legs in the hope that the faulty read is
>> corrected on reread, and rewrite afterwards. I fear such actions are
>> only taken during a forced rebuild though.
>
> Unless you have TLER disks, it is always taken, as the disk
> itself does it.

And hopefully the drive bails out with an error if it fails to remap.
IMHO, TLER can be an advantage if the RAID logic takes it into account.

>
>
> Really, if you are concerned about disks dying (and you should be),
> do the sane thing and have SMART monitoring and regular long SMART
> self-tests (I do them every 14 days) via smartd. If you are concerned
> about the RAID being inconsistent (and you should be), do RAID
> consistency checks (I do them every 7 days) and RAID monitoring via
> mdadm.

I agree. If I may ask, do you run the consistency check via cron or some
other way?
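For the record, a plain cron file is one common way to schedule both
checks. The file path, device names and schedule below are just examples
(Debian's mdadm package already ships a similar checkarray cron job):

```shell
# /etc/cron.d/raid-health -- example schedule, jobs run as root
# Weekly md consistency check (Sunday 03:00); mdadm --monitor will
# report the result afterwards
0 3 * * 0    root  echo check > /sys/block/md0/md/sync_action
# Long SMART self-test on the 1st and 15th of each month
0 4 1,15 * * root  smartctl -t long /dev/sda
0 4 1,15 * * root  smartctl -t long /dev/sdb
```

Alternatively, smartd can schedule the self-tests itself via a `-s`
directive in smartd.conf.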

>
> Do not mess with things on the crypto-layer that do not belong there.
> That can only make things worse.
>
> Gr"usse,
> Arno
>

Regards

-Sven

>
>> Regards
>>
>> -Sven
>>
>> On Tue, November 25, 2014 15:24, Arno Wagner wrote:
>> > On Tue, Nov 25, 2014 at 11:28:47 CET, Fabrice Bongartz wrote:
>> >> Hi Mark,
>> >>
>> >> I currently employ the following setup:
>> >> I have multiple md software RAID 1 arrays and LUKS on top of that.
>> >> For example, /dev/sda1 and /dev/sdb1 are two identical disks which
>> >> are in a RAID1 using md raid as /dev/md0. The LUKS-encrypted device
>> >> is /dev/md0. So far, I have had two discs fail in two different
>> >> arrays and I have had no problem restoring them. The arrays continued
>> >> in degraded mode and I could safely replace the two drives and add
>> >> the new disks to the arrays using the mdadm command.
>> >>
>> >> I am also curious as to what the devs have to say about this.
>> >
>> > RAID and LUKS are in separate layers and do not influence
>> > each other. See FAQ items 2.2 and 2.8; 2.8 also has a picture.
>> >
>> > If you place LUKS atop RAID, you get pretty much
>> > the same situation as with a normal filesystem atop RAID. Of
>> > course, the LUKS header is critical, which is why you should
>> > always have a header backup, just the same as without RAID.
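(For reference, such a header backup is a one-liner with cryptsetup; the
device and file names below are just examples:)

```shell
# Save the LUKS header (and key slots) of the array to a file
cryptsetup luksHeaderBackup /dev/md0 --header-backup-file md0-header.img
# If the header ever gets damaged, restore it with:
# cryptsetup luksHeaderRestore /dev/md0 --header-backup-file md0-header.img
```

Keep the backup offline somewhere safe: anyone holding the backup file
plus a valid passphrase can unlock the volume.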
>> >
>> > If you place LUKS below RAID (not that good an idea), you
>> > will have to unlock the raw devices before the RAID can
>> > be assembled. You should have header backups for as many
>> > devices as are needed to assemble the RAID, but better for
>> > all.
>> >
>> > Really, these are separate issues; LUKS and RAID do not
>> > magically interact behind your back.
>> >
>> > Gr"usse,
>> > Arno
>> >
>> >> BTW: I always make a complete backup on a third external disk, I
>> don't
>> >> want to take any chances.
>> >>
>> >> Cheers,
>> >>
>> >> Fabrice Bongartz
>> >>
>> >>
>> >> From: "Mark Connor" <markc44@xxxxxxx>
>> >> To: "dm-crypt" <dm-crypt@xxxxxxxx>
>> >> Sent: Tuesday, 25 November 2014 11:03:17
>> >> Subject:  LUKS safety on RAID 1 mirror
>> >>
>> >> Hello
>> >>
>> >> I currently have a deployment with LUKS (aes-cbc-256) on various
>> >> 1TB, 500GB, 300GB etc. drives. All the drives use different keys,
>> >> with an XFS filesystem on top of LUKS.
>> >> I'm planning to replace this setup with 2x4TB disks in software
>> >> RAID1 (with mdraid), but I have my concerns.
>> >>
>> >> 1. If a sector goes bad on disk1, that normally shouldn't be
>> >> replicated to disk2, but in the case of LUKS I don't know what
>> >> happens then.
>> >>
>> >> 2. I think it is more practical, when one is dealing with
>> >> encryption, to keep many smaller partitions encrypted with separate
>> >> keys, in case of partial disk failure (other parts of the disk can
>> >> still be accessed). Also, all the partitions have their own separate
>> >> LUKS headers...
>> >>
>> >> Unlike if I don't even create partitions but just put sda (4TB) and
>> >> sdb (4TB) into an md0 array and make LUKS on that one: if anything
>> >> goes wrong with the header, or any part of the disks breaks, I lose
>> >> all my data.
>> >>
>> >> I know that ultimately RAID only protects against drive failures
>> >> (not against files getting corrupted or deleted), so I have to have
>> >> a separate, snapshotted backup next to it. But would implementing
>> >> RAID1 in the case of LUKS be an advantage or a disadvantage?
>> >>
>> >> Thanks
>> >> _______________________________________________
>> >> dm-crypt mailing list
>> >> dm-crypt@xxxxxxxx
>> >> http://www.saout.de/mailman/listinfo/dm-crypt
>> >
>> >
>> >
>> > --
>> > Arno Wagner,     Dr. sc. techn., Dipl. Inform.,    Email: arno@xxxxxxxxxxx
>> > GnuPG: ID: CB5D9718  FP: 12D6 C03B 1B30 33BB 13CF  B774 E35C 5FA1 CB5D 9718
>> > ----
>> > A good decision is based on knowledge and not on numbers. -- Plato
>> >
>> > If it's in the news, don't worry about it.  The very definition of
>> > "news" is "something that hardly ever happens." -- Bruce Schneier
>> >
>>
>>
>
>




