All kinds of things on RAID/mdadm

Martijn <mailinglist@xxxxxxxxxxxxxx> · Mon, 07 Nov 2011 13:46:56 +0100

Dear readers, perhaps Neil and/or fellow mdadm developers,

Over the last couple of weeks, I've been spending quite some time with 
mdadm and looking at a good way to use RAID on Linux for one of our 
servers. My colleagues (friends) and I are designing a new software 
platform for our company service, and RAID is an important base layer in 
the system.

To give you some context: our company was started not only to provide 
paid services, but also for learning purposes. We were all students 
when we started our company over nine years ago, and we still like to 
learn things related to all kinds of (IT) subjects. In that same 
direction, because we want to learn, we fancy a certain kind of 
'thoroughness' when creating and documenting something. At least, when 
we're given the time to do that, such as in this case.

Our current platform was set up early in 2005, when a colleague and I 
spent an evening finding out if we *really* couldn't just mirror full 
/dev/hda with /dev/hdb ;-) After reading it wasn't possible many, many 
times, we ended up manually copying partition tables from hda to hdb 
(eek!), mirroring partitions instead of drives (huh?), using some LVM 
(every time you think you know how it works, it's different), installing 
grub on both drives ("will this work on failover?"), and.. it worked.

I slept really badly after that, though: we needed the reliability cheap, 
but it was so different from what I had imagined upfront and knew from 
hardware RAID. The extra complexity was a big deal for me too. Bigger 
than necessary, I think. I wanted to be more sure I could wrap my 
head around a problem if someone called me in the middle of the 
night to fix it.

So this time around, I chose to be more thorough on the important 
aspects, and one of those aspects is recovery: what to do if 
something is wrong. While mdadm is a tool that's pretty clear in its 
usage, supported by a good manual, I've come across some things I 
cannot document to my full satisfaction after reading the manual. 
raid.wiki.kernel.org is down as well, and ironically the contents aren't 
'mirrored' anywhere. Google Cache may have it, but I can't find it: the 
results are littered with unimportant meta pages from the wiki.
I also quickly searched through the mdadm code, but didn't see comments 
that cleared up my questions.

Searching for possible states of an array, I discovered that there are 
all sorts of combinations for states. The basics are clean, degraded and 
dirty. But what does 'clean, no-errors' mean? And 'dirty, no-errors'? 
Searching through the code, I even found a point where a label 'Dirty 
State' could be listed as 'clean'. Is it a good idea to add a list with 
explanations of possible states, basic and exotic, to the manual? Much 
in the same way all monitor events are listed. I can imagine not 
everyone knowing the difference between dirty and degraded for example. 
It's a basic thing that is skipped in most cases.
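
To make the question concrete, here is a minimal Python sketch of what 
a script has to do today: pull the comma-separated flags off the 
"State :" line of `mdadm --detail` output. The function name and the 
sample text are made up for illustration, and the sketch deliberately 
does not interpret the flags, because the full list of possible values 
is exactly what is missing from the documentation.

```python
def parse_array_state(detail_text):
    """Return the flags found on the 'State :' line as a list of strings.

    detail_text is assumed to be the text printed by `mdadm --detail`.
    An empty list is returned when no such line is present.
    """
    for line in detail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "State":
            return [flag.strip() for flag in value.split(",") if flag.strip()]
    return []


# Hypothetical sample, shortened; real output contains many more fields.
SAMPLE = """\
/dev/md0:
     Raid Level : raid1
          State : clean, degraded
 Active Devices : 1
"""

if __name__ == "__main__":
    flags = parse_array_state(SAMPLE)
    print(flags)
    print("degraded" in flags)
```

With a documented list of states, the last step could map each flag to 
an action instead of just testing for one known string.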

Perhaps the same could be done for individual disk states. Of course we 
all know "active sync", and based on what I've seen elsewhere the states 
"removed", "spare" and "faulty spare" exist. But having a list of all 
possible states would help prepare documentation for the things we 
really don't want to happen. Takes off the pressure a bit :-)
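
The same goes for member devices. As a rough sketch, assuming the 
/proc/mdstat convention where a member is followed by a marker such as 
(F) for faulty or (S) for spare, the Python below collects those 
markers per device. The function name and the sample line are 
hypothetical, and other markers may exist that I have not listed, 
which is the point of asking for a complete list.

```python
import re

# A member looks like "sda1[0]" optionally followed by markers like "(F)".
MEMBER = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")


def parse_mdstat_members(line):
    """Map each member device name on one mdstat array line to its markers."""
    members = {}
    for name, _slot, marks in MEMBER.findall(line):
        members[name] = re.findall(r"\(([A-Z])\)", marks)
    return members


# Hypothetical sample line, not taken from a real system.
SAMPLE_LINE = "md0 : active raid1 sdc1[2](S) sdb1[1](F) sda1[0]"

if __name__ == "__main__":
    for device, marks in parse_mdstat_members(SAMPLE_LINE).items():
        print(device, marks or "no marker")
```

A device without any marker is reported with an empty list; whether 
that always means "active sync" is one of the things I would like to 
see documented.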

I'm not voting for mdadm to become a tool that even babies can use to 
create their arrays, but with this info others may be able to act with 
confidence based on their own knowledge, instead of searching for articles 
on the web that happen to list the state of the array they're searching 
for. A lot of those articles do not teach anything. They just make you 
brainlessly copy and paste commands and fill in the block device 
files. Some of them are just plain wrong and may result in data being lost.
I would also vote for articles that favor partitionable devices over 
using partitions for RAID, but that's outside the scope of this post ;-)

Where do you think that important things, such as 'how to organize 
failover' and questions like 'do I benefit from putting swap on a RAID 
block device', should be documented? Is it the currently unreachable 
raid.wiki.kernel.org? Would it be better to provide the info that leads 
to the answers in the mdadm manual so that it is always available?

Are there any sources you would recommend reading if someone is 
interested in how mdadm/software-RAID 'works'? I'm not sure if RAID has 
an actual spec somewhere on which mdadm is based.

Looking forward to your replies and maybe a conversation leading to 
improvement where necessary :-)

Kind regards,
Martijn