Hi List!

Raghavendra G, thanks for your reply.

On Mon, 2010-02-15 at 11:11 +0400, Raghavendra G wrote:
> > Could I duplicate files to multiple data bricks in the cluster to
> > provide a raid5-like setup? I very much want to be able to shut down
> > a single machine in the cluster and still have a fully functional
> > filesystem. I'm very happy to write the application that does the
> > copying over the data bricks myself.
>
> We recommend using the distribute translator instead of unify. But with
> distribute (even with unify) data is not striped. Both translators
> (unify and distribute) are used to aggregate multiple storage nodes
> into a single filesystem. If you want to increase read performance
> using striping, you can use the stripe translator.

If I understand this correctly: using both stripe and distribute, I can
create a redundant distributed filesystem, basically a networked RAID10
(or 1+0, 0+1; those are details). I'm looking for a networked
RAID5-like system, though.

> > Another advantage could be that I can decide on a per-file basis how
> > many copies of a file exist in the filesystem. (Two would be a
> > minimum for me.) The real-world scenario: this would be the data
> > filesystem for a webserver cluster setup. You can imagine that images
> > used on a homepage are requested more frequently than others.
>
> replicate (formerly known as afr) does not support maintaining a
> different number of replicas for different files.

I know I'm doing something that unify was not intended for. I did some
simple tests: my two data bricks unified by two clients, with the
subvolumes specified in a different order (client1 has 'subvolumes
data1 data2', client2 has 'subvolumes data2 data1').

Reading works. I confirmed that unify reads from the first data brick
available. It remembers which brick a file is on: once a file is found
to be on data1, it won't change to data2.

Removing works. Files are not only removed from the namespace brick,
but also from every client. No stale data is left behind.

Renaming works.
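For reference, the relevant part of my two client volfiles looks roughly
like this (a sketch from memory; the protocol/client volumes data1,
data2 and the namespace volume ns are assumed to be defined above this,
and the scheduler choice is just illustrative):

    # client1:
    volume unify
      type cluster/unify
      option namespace ns
      option scheduler rr
      subvolumes data1 data2
    end-volume

    # client2: identical, except for the subvolume order
    volume unify
      type cluster/unify
      option namespace ns
      option scheduler rr
      subvolumes data2 data1
    end-volume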
Similar to remove, files are renamed on all bricks.

Modifying doesn't work. My simple tests showed one copy being modified
while the others got truncated. I'll investigate later. If I can get my
application not to modify data in place, but instead do a create-tmp,
remove-old, rename-tmp-to-old cycle, I might be there.

It seems that, although it's not meant to work this way, I have found
my networked RAID5-like system, as long as I'm willing to create the
copies of files on other bricks myself. I very much understand that I
won't get any guarantees.

> > What problems can I expect with this setup?
> > Have others tried a similar setup?
> > Am I missing a GlusterFS feature that would implement what I want,
> > in a much easier way?

I think I've got the answer to the last question: GlusterFS provides a
raid10-like setup, but nothing like a raid5-like one. I would still
like to know if I'm missing anything here; I haven't thought about
performance issues, for example.

Also: I'm just starting to use unify, and I'm already using an
obsolete/legacy translator. For me, switching to cluster/distribute is
not an option. Does that mean I'll be locked in to GlusterFS 2.0.9?

Greetings,
Casper
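P.S. The create-tmp, remove-old, rename-tmp-to-old cycle I mentioned,
sketched as a shell helper (the function name and interface are my own
invention, not anything GlusterFS provides):

```shell
# replace_file: update a file without modifying it in place.
# Instead: create a temporary copy, remove the old file, and
# rename the copy into place.
# Usage: replace_file <target>    (new contents read from stdin)
replace_file() {
    target="$1"
    tmp="$target.tmp.$$"
    cat > "$tmp" &&        # 1. create tmp with the new contents
    rm -f "$target" &&     # 2. remove old (removal worked on all bricks)
    mv "$tmp" "$target"    # 3. rename tmp to old (rename worked too)
}
```

Removal and rename are exactly the operations my tests showed to work,
which is why I'd prefer this cycle over in-place writes.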