On Tue, Dec 22 2015, Adam Goryachev wrote:

> On 22/12/15 09:03, NeilBrown wrote:
>> On Tue, Dec 22 2015, Tejas Rao wrote:
>>
>>> On 12/21/2015 15:47, NeilBrown wrote:
>>>> On Tue, Dec 22 2015, Tejas Rao wrote:
>>>>
>>>>> What if the application is doing the locking and making sure that only 1
>>>>> node writes to a md device at a time? Will this work? How are rebuilds
>>>>> handled? This would be helpful with distributed filesystems like
>>>>> GPFS/lustre etc.
>>>>>
>>>> You would also need to make sure that the filesystem only wrote from a
>>>> single node at a time (or access the block device directly). I doubt
>>>> GPFS/lustre make any promise like that, but I'm happy to be educated.
>>>>
>>>> rebuilds are handled by using a cluster-wide lock to block all writes to
>>>> a range of addresses while those stripes are repaired.
>>>>
>>>> NeilBrown
>
> My understanding of MD level cross host RAID was that it would not
> magically create cluster aware filesystems out of non-cluster aware
> filesystems. ie, you wouldn't be able to use the same multi-host RAID
> device on multiple hosts concurrently with ext3.

This is correct.  The expectation is that clustered md/raid1 would be
used with a cluster-aware filesystem such as ocfs2 or gpfs.  Certainly
not with ext3 or similar.

>
> IMHO, if it was able to behave similar to DRBD, then that would be
> perfect (ie, enforce only a single node can write at a time (unless you
> specifically set it for multi-node write)). The benefit should be that
> you can lose a node without losing your data. After you lose that node,
> you can then "do something" to use the remaining node to access the data
> (eg, mount it, export with iscsi/nfs, etc).

There is a lot of similarity between DRBD and clustered md/raid1.
I don't know the current state of DRBD, but it initially assumed each
storage device was local to a single node and so sent data over the
network (i.e. over IP) to "remote" devices.  clustered md/raid1 assumes
that all storage is equally accessible to all nodes (over a 'storage
area network', which may still be IP).

So yes: if you lose a node you should not lose functionality.

>
> Currently, this is what I use DRBD for, previously, I've used NBD + MD
> RAID1 to do the same thing. One question though is what advantage
> multi-host MD RAID might have over the existing in-kernel DRBD ? Are
> there plans which show why this is going to be better, have better
> performance, features, etc?

I'm not the driving force behind clustered md/raid1 so I am not
completely familiar with the motivation, but I believe DRBD doesn't, or
didn't, make the best possible use of the storage network when every
storage device is connected to every compute node.  It is expected that
clustered md/raid1 will.

I *think* DRBD is primarily for a pair of nodes (though there is some
multi-node support).  clustered md/raid1 is designed to work with
multiple nodes - however big your cluster is.

(DRBD 9.0 appears to support multi-node configurations.  I haven't
researched the details.)

NeilBrown
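
For anyone trying to picture the resync locking described above, here is
a minimal, hypothetical sketch in Python.  It is purely illustrative:
none of the names come from the actual md-cluster code, and a local
condition variable stands in for the real cluster-wide DLM lock.  The
idea is that the node performing a resync "suspends" a range of sectors
cluster-wide, and writers on every node wait if their write overlaps
that range.

    import threading

    class ResyncRangeLock:
        """Toy stand-in for a cluster-wide 'suspend this sector range' lock."""

        def __init__(self):
            self._cond = threading.Condition()
            self._suspended = None          # (lo, hi) range being repaired, or None

        def suspend_range(self, lo, hi):
            """Resync node: block writes to sectors [lo, hi) cluster-wide."""
            with self._cond:
                self._suspended = (lo, hi)

        def resume(self):
            """Resync node: the range is repaired, let writers proceed."""
            with self._cond:
                self._suspended = None
                self._cond.notify_all()

        def wait_for_write(self, sector, nsectors):
            """Writer on any node: wait until this write no longer overlaps
            the suspended range."""
            with self._cond:
                while self._suspended is not None:
                    lo, hi = self._suspended
                    if sector + nsectors <= lo or sector >= hi:
                        break           # no overlap, the write may proceed
                    self._cond.wait()

In the real implementation the suspend/resume messages would of course
travel over the DLM to every node rather than through a shared in-memory
object, but the write path logic is the same: check the suspended range,
and stall only writes that overlap it.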