Re: possibly silly question (raid failover)

On 01/11/2011 01:38, Miles Fidelman wrote:
Hi Folks,

I've been exploring various ways to build a "poor man's high
availability cluster." Currently I'm running two nodes, using raid on
each box, running DRBD across the boxes, and running Xen virtual
machines on top of that.

I now have two brand new servers - for a total of four nodes - each with
four large drives, and four gigE ports.

Between the configuration of the systems and rack space limitations,
I'm trying to use each server for both storage and processing - and
have been looking at various options for building a cluster file system
across all 16 drives, one that supports VM migration/failover across
all four nodes and that's resistant to single-drive failures, to losing
an entire server (and its 4 drives), and maybe even to losing two
servers (8 drives).

The approach that looks most interesting is Sheepdog - but it's both
tied to KVM rather than Xen, and a bit immature.

But it led me to wonder if something like this might make sense:
- export each drive using AoE
- run md RAID10 across all 16 drives on one node
- export the resulting md device using AoE
- if the node running the md device fails, use pacemaker/crm to
auto-start an md device on another node, re-assemble and republish the
array
- resulting in a 16-drive RAID10 array that's accessible from all nodes

Or is this just silly and/or wrongheaded?

Miles Fidelman
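
For what it's worth, the stack you describe might look roughly like
the following. This is only a sketch, not a tested recipe: the NIC
name (eth1), the shelf/slot numbering, and the drive names (sda-sdd
on each server) are all assumptions.

  # On each storage node: export the four raw drives with vblade,
  # using a distinct shelf number per server (1-4 here)
  vbladed 1 0 eth1 /dev/sda
  vbladed 1 1 eth1 /dev/sdb
  vbladed 1 2 eth1 /dev/sdc
  vbladed 1 3 eth1 /dev/sdd

  # On the node that runs the array: load the AoE initiator and
  # build RAID10 across all 16 exported drives.  Device order
  # matters: md's default near=2 layout mirrors adjacent devices
  # in the list, so interleave the servers so that every mirror
  # pair spans two machines.
  modprobe aoe
  mdadm --create /dev/md0 --level=10 --raid-devices=16 \
      /dev/etherd/e{1..4}.0 /dev/etherd/e{1..4}.1 \
      /dev/etherd/e{1..4}.2 /dev/etherd/e{1..4}.3

  # Re-export the assembled array so the other nodes can reach it
  vbladed 9 0 eth1 /dev/md0

  # On failover, the standby node re-assembles (never re-creates):
  mdadm --assemble /dev/md0 /dev/etherd/e{1..4}.{0..3}

The part pacemaker really has to get right is ensuring that only one
node has the array assembled at any time - two nodes writing to the
same member disks will corrupt it - so you want working
fencing/STONITH, not just resource scripts.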


One thing to watch out for when building high-availability systems on RAID1 (or RAID10) is that RAID1 tolerates only a single failure per mirror in the worst case. If your disk image is spread across different machines as two-copy RAID1 and a server goes down, what remains is vulnerable to a single disk failure (or a single unrecoverable read error).
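
To make that concrete, take a hypothetical layout of the 16 drives as
eight two-copy pairs, each pair spanning two servers:

  server A: a1 a2 a3 a4    mirror pairs: (a1,b1) (a2,b2) (a3,b3) (a4,b4)
  server B: b1 b2 b3 b4                  (c1,d1) (c2,d2) (c3,d3) (c4,d4)
  server C: c1 c2 c3 c4
  server D: d1 d2 d3 d4

If server B dies, the four pairs (a1,b1) through (a4,b4) all drop to a
single copy, and one failed drive - or one unreadable sector - among
a1..a4 means lost data.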

It's a different matter if you are building a 4-way mirror from the four servers, of course.
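
Roughly, assuming each server exports one aggregate device (say, a
local stripe of its four drives) as AoE shelf N, slot 0 - names
illustrative:

  # 4-way mirror across the four servers' exported devices
  mdadm --create /dev/md1 --level=1 --raid-devices=4 \
      /dev/etherd/e1.0 /dev/etherd/e2.0 /dev/etherd/e3.0 /dev/etherd/e4.0

That survives the loss of any three servers, but usable capacity drops
to a quarter of the raw space, and every write goes to all four legs
(three of them over the network, if the array runs on one of the
servers).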

Alternatively, each server could have its four disks set up as a local 3+1 RAID5. Then you combine the four exported arrays using RAID10 (or possibly just RAID1 - depending on your usage patterns, that may be faster). That gives you an extra safety margin on disk problems.
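
A sketch of that layering, with the same caveats about assumed names:

  # On each server: local 3+1 RAID5 over its four drives
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[abcd]

  # Export the local array over AoE (distinct shelf per server)
  vbladed 1 0 eth1 /dev/md0

  # On one node: RAID10 (or plain RAID1) across the four RAID5 legs
  mdadm --create /dev/md1 --level=10 --raid-devices=4 \
      /dev/etherd/e{1..4}.0

Each leg then tolerates one local disk failure on its own, and the
mirror layer on top tolerates losing a whole server.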

But the key issue is to consider what might fail and what the consequences of that failure are - including how exposed the surviving system then is to additional failures.


