Re: KISS (was disappearing luks header and other mysteries)

Arno Wagner <arno@xxxxxxxxxxx> · Mon, 22 Sep 2014 11:14:33 +0200

On Sun, Sep 21, 2014 at 16:29:28 CEST, Marc Ballarin wrote:
> Am 21.09.2014 um 11:58 schrieb Arno Wagner:
> > On Sat, Sep 20, 2014 at 02:29:43 CEST, Sven Eschenberg wrote:
> >> Well, it is not THAT easy.
> > Actually it is.
> >  
> >> If you want resilience/availability, you'll need RAID. Now what do you put
> >> on top of the RAID when you need to slice it?
> > And there the disaster starts: Don't slice RAID. It is not a good
> > idea.
> >
> >
> >> Put a disklabel/partition on
> >> top of it and stick with a static setup or use LVM which can span multiple
> >> RAIDs (and types) supports snapshotting etc. . Depending on your needs and
> >> usage you will end up with LVM in the end. If you want encryption, you'll
> >> need a crypto layer (or you put it in the FS alongside volume slicing).
> >> Partitions underneath the RAID, not necessary if the RAID implementation
> >> can subslice physical devices and arrange for different levels on the same
> >> disk. Except unfortunately, when you need a bootloader.
> >>
> >> I don't see any alternative which would be KISS enough, except merging the
> >> layers to avoid collisions due to stacking order etc. . Simple usage and
> >> debugging for the user, but the actual single merged layer would be
> >> anything but KISS.
> > You miss one thing: LVM breaks layering and rather badly so. That
> > is a deadly sin. Partitioning should only ever be done on
> > monolithic devices. There is a good reason for that, namely that
> > partition-raid, filesystems and LUKS all respect partitioning by
> > default, and hence it actually takes work to break the container
> > structure.
> 
> Hi,
> 
> I don't see how LVM breaks layering. 

Seriously? LVM allows you to place partitions into partitions.
If you do not see how that breaks layering, I don't know how
to explain it.

> In theory it replaces partitioning,
> but in practice it is still a very good idea to use one single partition
> per visible disk as a (more or less) universally accepted way to say
> "there is something here, stay away!". The same applies to LUKS or plain
> filesystems. No reason to put them on whole disks.
> The megabyte or so that you sacrifice for the partition table (plus
> alignment) is well spent. Partitions do not cause any further overhead,
> as unlike device mapper, they do not add a layer to the storage stack
> (from a user's POV they do, but not from the kernel's).
> 
> Note that there is little reason to use mdraid for data volumes nowadays
> (that includes "/" when using a proper initramfs).

There are a lot of very good reasons. Simplicity, reliability,
stability, clarity, etc.

>  LVM can handle this
> just fine and unlike mdadm has not seen any major metadata changes, or
> even metadata location changes, in the last years. 

I agree that metadata formats 1.0, 1.1 and 1.2 for mdraid are
screwed up and the designers have failed. Format 0.90 is entirely
fine though, if unsuitable for very large installations.

> But I'm not sure, it
> can offer redundancy on boot devices. In theory it should, if the boot
> loader knows how to handle it, but I have never tested it. This is
> basically the "merging of layers" that Sven talked about.
> Btrfs and ZFS push this even further, and while they are complex beasts,
> they actually eliminate a lot of complexity for applications and users.

They bring in "magic". That is fine if the user is clueless, like
the typical Windows user for example. It is a catastrophe once
things break and a major annoyance for non-clueless users. There
are good reasons this functionality is kept in separate layers.
We will see whether these things manage to actually pull it off
or not, but I am somewhat doubtful for ZFS and highly doubtful
for BTRFS. It stinks of the "second system" effect, where designers
who think they have mastered the problem after their first system
throw in everything and the kitchen sink. Complex monsters like
that usually never become stable, precisely because of their
complexity.

> Just look at how simple, generic and cheap it becomes to create a
> consistent backup by using temporary snapshots, or to preserve old
> versions by using long lived snapshots. 

Sorry, but that is one of Linus's messes: "dump" works fine for
that on basically any Unix and it should do so on Linux, but there
are statements by Linus where he admits to breaking the FS layer
and the possibility to damage even a read-only filesystem with
"dump". I used dump on Linux for snapshots for about 5 years
way back without problems though. These people are reinventing
the wheel and what they produce is not really better than what 
already existed. And if you really need a "hard" snapshot, just 
use the dmraid layer for that.

> This can replace application
> specific backup solutions, that cost an insane amount of money and whose
> user interfaces are based on the principles of Discordianism (so that
> training becomes mandatory).

No. It cannot. An application-specific backup solution needs to
understand the application. Either you never needed it in the
first place, or you still need it when you have snapshots. Just
freezing an image in time is not a valid way to back up in
many application-specific scenarios.

> Also: Stay away from tools like gparted or parted. 

Not at all. Unlike the infamous "Partition Magic", (g)parted is
reliable. Of course, if you have an LVM-mess, you may break things
because you do not understand the on-disk structure anymore. 
I have used gparted regularly for years and it never broke one
single thing and never behaved in any surprising fashion.
I don't know where you get this nonsense.

> Resizing and, above
> all, moving volumes is bound to cause problems. For example, looking at
> John Wells' issue from August 18th (especially mail
> CADt3ZtscbX-rmMt++aXme9Oiu3sxiBW_MD_CGJM_b=t+iMaerQ), the most likely
> culprit really wasn't LVM, but parted. It seems to have set up scratch
> space where it should not have.

I very much doubt that. parted does not create anything you do
not tell it to. Much more likely, LVM caused the user to not
understand what he was doing, which is the whole point why
I do not like it.

> Once resizing or volume deletions/additions are necessary, LVM is
> actually the much simpler and more robust solution. Resizing as well as
> deletions and additions in LVM are well defined, robust and even
> undoable (as long as the filesystem was not adjusted/created). At work,
> we use that on 10,000s of systems.

Well, once you have a _tested_ operation sequence, LVM gives
you a sort-of storage abstraction, and when you automate
things, that is very much worthwhile doing. But that is not the
situation you have when working manually on a single system.
These two are not comparable at all. For example, when doing
automation, you make sure to keep your change runbook as
simple as possible and you will test it. You will not experiment
on production systems. In essence, you add a whole reliability
layer manually. That is not the situation you have when working
manually on one system.

> Lastly, it should be noted that complex storage stacks like
> MD-RAID->LVM->LUKS->(older)XFS can have reliability issues due to stack
> exhaustion (you can make it even worse by adding iSCSI, virtio,
> multi-path and many other things to your storage stack). When and if
> problems occur, depends strongly on the architecture, low-level drivers
> involved and the kernel version, but it is likely to happen at some
> point. Kernel 3.15 defused this, by doubling the stack size on x86_64.
> (btw: That, and not bad memory, might actually be the most common cause
> behind FAQ item 4.3).

What is your point? Older XFS was unusable with mdraid, because
a RAID resync running at the same time as an XFS check could
take weeks (estimated). But what do you need XFS and LVM in
that stack for? Make that MD-RAID->LUKS->ext2/3 and you get a
reliable and stable solution.
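
A minimal sketch of how such a three-layer stack could be put
together, assuming a two-disk RAID1 with 0.90 metadata. The RAID
level, the device names (/dev/sda1, /dev/sdb1, /dev/md0) and the
mapping name "securedata" are hypothetical and only for illustration.
The helper just builds and prints the commands; it runs nothing.

```python
from typing import List


def build_stack_commands(
    members: List[str],
    md_device: str = "/dev/md0",        # hypothetical array name
    mapper_name: str = "securedata",    # hypothetical LUKS mapping name
    fs_type: str = "ext3",
) -> List[List[str]]:
    """Return the argv lists for an MD-RAID -> LUKS -> ext2/3 stack.

    Assumption: RAID1 with metadata format 0.90. Nothing is executed.
    """
    if len(members) < 2:
        raise ValueError("RAID1 needs at least two member devices")
    if fs_type not in ("ext2", "ext3"):
        raise ValueError("this sketch only covers ext2 and ext3")

    return [
        # Layer 1: the RAID array, built on one partition per disk.
        ["mdadm", "--create", md_device, "--level=1",
         f"--raid-devices={len(members)}", "--metadata=0.90", *members],
        # Layer 2: LUKS directly on the unsliced array.
        ["cryptsetup", "luksFormat", md_device],
        ["cryptsetup", "luksOpen", md_device, mapper_name],
        # Layer 3: one filesystem on the opened LUKS mapping.
        [f"mkfs.{fs_type}", f"/dev/mapper/{mapper_name}"],
    ]


if __name__ == "__main__":
    for argv in build_stack_commands(["/dev/sda1", "/dev/sdb1"]):
        print(" ".join(argv))
```

Each layer sits on exactly one device from the layer below, so
there is no slicing of the RAID and no LVM anywhere in the stack.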

Arno
-- 
Arno Wagner,     Dr. sc. techn., Dipl. Inform.,    Email: arno@xxxxxxxxxxx
GnuPG: ID: CB5D9718  FP: 12D6 C03B 1B30 33BB 13CF  B774 E35C 5FA1 CB5D 9718
----
A good decision is based on knowledge and not on numbers. -- Plato

If it's in the news, don't worry about it.  The very definition of 
"news" is "something that hardly ever happens." -- Bruce Schneier