This is intended for the Linux RAID howto. Please give comments. It is not fully ready. /keld

Howto prepare for a failing disk

The following describes how to prepare a system to survive if one disk fails. This can be important for a server that is intended to run at all times. The description is mostly aimed at small servers, but it can also be used for workstations, to protect them against losing data and to keep them running even if a disk fails. Some recommendations on larger server setups are given at the end of the howto.

This requires some extra hardware, especially disks, and the description also touches on how to make the most out of the disks, be it in terms of available disk space or input/output speed.

1. Creating the partitions

We recommend creating partitions for /boot, root, swap and the other file systems. This can be done by fdisk, parted or maybe a graphical interface like the Mandriva/PCLinuxOS harddrake2. It is recommended to use drives of equal size and performance characteristics.

If we are using the 2 drives sda and sdb, then sfdisk may be used to make all the partitions into raid partitions:

  sfdisk -c /dev/sda 1 fd
  sfdisk -c /dev/sda 2 fd
  sfdisk -c /dev/sda 3 fd
  sfdisk -c /dev/sda 5 fd
  sfdisk -c /dev/sdb 1 fd
  sfdisk -c /dev/sdb 2 fd
  sfdisk -c /dev/sdb 3 fd
  sfdisk -c /dev/sdb 5 fd

Using:

  fdisk -l /dev/sda /dev/sdb

the partition layout could then look like this:

  Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
  255 heads, 63 sectors/track, 121601 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start         End      Blocks   Id  System
  /dev/sda1               1          37      297171   fd  Linux raid autodetect
  /dev/sda2              38        1132     8795587+  fd  Linux raid autodetect
  /dev/sda3            1133        1619     3911827+  fd  Linux raid autodetect
  /dev/sda4            1620      121601   963755415    5  Extended
  /dev/sda5            1620      121601   963755383+  fd  Linux raid autodetect

  Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
  255 heads, 63 sectors/track, 121601 cylinders
  Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start         End      Blocks   Id  System
  /dev/sdb1               1          37      297171   fd  Linux raid autodetect
  /dev/sdb2              38        1132     8795587+  fd  Linux raid autodetect
  /dev/sdb3            1133        1619     3911827+  fd  Linux raid autodetect
  /dev/sdb4            1620      121601   963755415    5  Extended
  /dev/sdb5            1620      121601   963755383+  fd  Linux raid autodetect

2. Prepare for boot

The system should be set up to boot from multiple devices, so that if one disk fails, the system can boot from another disk.

On Intel hardware there are two common boot loaders, grub and lilo. Both grub and lilo can only boot off a raid1; they cannot boot off any other software raid device type. The reason they can boot off the raid1 is that they see the raid1 as a normal disk; they then just use one of the disks when booting.

The boot stage only involves loading the kernel with an initrd image, so not much data is needed for this. The kernel, the initrd and other boot files can be put in a small /boot partition. We recommend something like 200 MB on an ext3 raid1.

Make the raid1 and the ext3 file system:

  mdadm --create /dev/md0 --chunk=256 -R -l 1 -n 2 /dev/sda1 /dev/sdb1
  mkfs -t ext3 /dev/md0

Make each of the disks bootable by lilo:

  lilo -b /dev/sda -C /etc/lilo.conf1
  lilo -b /dev/sdb -C /etc/lilo.conf2

Make each of the disks bootable by grub (to be described; a tentative sketch follows below).
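Until that description is written, here is a minimal sketch of how it could be done with GRUB legacy (grub 0.9x). It assumes, as in the layout above, that /boot is the first partition on each disk; adjust the root line otherwise. The trick is to map each disk to (hd0) in turn, so that the boot sector written to sdb lets it boot as if it were the first disk:

  grub
  grub> device (hd0) /dev/sda
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> device (hd0) /dev/sdb
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> quit

With this, the BIOS can be pointed at either disk, and the system should boot off the surviving one.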
3. The root file system

The root file system can be on another raid than the /boot partition. We recommend a raid10,f2, as the root file system will mostly be read, and the raid10,f2 raid type is the fastest for reads, while also being sufficiently fast for writes. Other relevant raid types would be raid10,o2 or raid1.

It is recommended to use udev for the /dev file system, as this runs in RAM, and you thus avoid a number of reads and writes to disk. It is also recommended that all file systems be mounted with the noatime option; this avoids writing to the file system inodes every time a file is read or written.

Make the raid10,f2 and the ext3 file system:

  mdadm --create /dev/md1 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda2 /dev/sdb2
  mkfs -t ext3 /dev/md1

4. The swap file system

If a disk that processes are swapped to fails, then all these processes fail. These may be vital processes for the system, or vital jobs on the system. You can prevent the failing of the processes by having the swap partitions on a raid. The swap area needed is normally relatively small compared to the overall disk space available, so we recommend the faster raid types over the more space-economical ones. The raid10,f2 type seems to be the fastest here; other relevant raid types could be raid10,o2 or raid1.

Given that you have created the raid array, you can just make the swap area directly on it:

  mdadm --create /dev/md2 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda3 /dev/sdb3
  mkswap /dev/md2

Maybe something on /var and /tmp could go here.

5. The rest of the file systems

Other file systems can also be protected against one failing disk. Which technique to recommend depends on your purpose for the disk space. You may mix the different raid types if you have different types of use on the same server, e.g. a database and the serving of large files from the same server. (This is one of the advantages of software raid over hardware raid: with software raid you may have different types of raid on one disk, while a hardware raid can only use one type for the whole disk.)

If disk capacity is the main priority, and you have more than 2 drives, then raid5 is recommended. Raid5 only uses 1 drive for securing the data, while raid1 and raid10 use at least half the capacity. For example, with 4 drives raid5 provides 75 % of the total disk space as usable, while raid1 and raid10 at most (dependent on the number of copies) make 50 % of the disk space usable. This becomes even better for raid5 with more disks; with 10 disks you only use 10 % for security.

If speed is your main priority, then raid10,f2, raid10,o2 or raid1 will give you the most speed during normal operation. This even works if you only have 2 drives.

If speed with a failed disk is a concern, then raid10,o2 could be the choice, as raid10,f2 is somewhat slower in operation when a disk has failed.

Examples of the three alternatives:

  mdadm --create /dev/md3 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda5 /dev/sdb5
  mdadm --create /dev/md3 --chunk=256 -R -l 10 -n 2 -p o2 /dev/sd[ab]5
  mdadm --create /dev/md3 --chunk=256 -R -l 5 -n 4 /dev/sd[abcd]5

6. /etc/mdadm.conf

Something here on /etc/mdadm.conf. What would be safe, allowing a system to boot even if a disk has crashed? A tentative sketch follows below.
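Until this section is written, here is a tentative sketch of a conservative /etc/mdadm.conf. The ARRAY lines must be generated from your own arrays, e.g. with mdadm --detail --scan; the UUIDs below are placeholders:

  # Scan all partitions for raid components rather than naming
  # the disks explicitly; this still works when a crashed disk
  # has made the device names shift.
  DEVICE partitions

  # Where mdadm --monitor sends mail when a disk fails.
  MAILADDR root

  # One line per array, identified by UUID rather than by member
  # disks, so that a degraded array is still assembled and the
  # system can boot with a crashed disk. Generate these lines
  # with: mdadm --detail --scan
  ARRAY /dev/md0 level=raid1  num-devices=2 UUID=...
  ARRAY /dev/md1 level=raid10 num-devices=2 UUID=...
  ARRAY /dev/md2 level=raid10 num-devices=2 UUID=...
  ARRAY /dev/md3 level=raid10 num-devices=2 UUID=...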
7. Recommendations for the setup of larger servers

Given a larger server setup with more disks, it is possible to survive more than one disk crash. The raid6 array type can be used to survive 2 disk crashes, at the expense of the space of 2 disks. The /boot, root and swap partitions can be set up with more disks, e.g. a /boot partition made up of a raid1 of 3 disks, and root and swap partitions made up of raid10,f3 arrays. Given that raid6 cannot survive more than the crashes of 2 disks, the system disks need not be prepared for more than 2 crashes either, and you can use the rest of the disk I/O capacity to speed up the system.
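As a tentative illustration of such a setup (the six-disk layout and device names are assumptions, with all disks partitioned as in section 1), the commands could look like this:

  # /boot as a raid1 of 3 disks: any of the three can boot the system
  mdadm --create /dev/md0 --chunk=256 -R -l 1 -n 3 /dev/sd[abc]1
  # root and swap as raid10,f3: three copies, surviving 2 disk crashes
  mdadm --create /dev/md1 --chunk=256 -R -l 10 -n 3 -p f3 /dev/sd[abc]2
  mdadm --create /dev/md2 --chunk=256 -R -l 10 -n 3 -p f3 /dev/sd[abc]3
  # the bulk of the space as raid6 over all 6 disks, also surviving
  # 2 disk crashes, at the cost of 2 disks' worth of space
  mdadm --create /dev/md3 --chunk=256 -R -l 6 -n 6 /dev/sd[abcdef]5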