Re: GlusterFS and failure domains?

On 6 July 2014 19:17, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
On 07/01/2014 05:13 PM, Jonathan Barber wrote:
Hello all,

I'm investigating GlusterFS+Swift for use in a "large" (starting at
~150TB) scale out file system for storing and serving photographic images.

Currently I'm thinking of using servers with JBODs and it's clear how to
use Gluster's replication sets to give resiliency at the server level.
However, I'd like to have multiple bricks per server (with a brick per
drive controller) and managing the replication sets starts to look more
complicated from a management point of view. Also, when it comes to
expanding the solution in the future, I reckon that I will be adding
bricks of different sizes with different numbers of bricks per server -
further complicating management.

So, I was wondering if there is support for (or plans for) failure
domains (like Oracle's ASM failure groups) which would allow you to
describe groups of bricks within which replicas can't be co-located?
(e.g. bricks from the same server are placed in the same failure domain,
meaning that no-two replicas are allowed on these groups of bricks).


A warning is already displayed in the CLI when bricks from the same server are used in the same replica set:

[root@deepthought lo]# gluster volume create myvol replica 2 deepthought:/d/brick1 deepthought:/d/brick2
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume?  (y/n) n
Volume create failed

Yes, I'd seen that warning. It isn't reported when adding bricks to an existing volume though, e.g.:

# gluster volume add-brick myvol $HOSTNAME:/d/brick{1,2}
volume add-brick: success
#

(with Gluster 3.5.1 from the gluster-epel repo)
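
For what it's worth, the bricks passed to add-brick are grouped into replica sets in the order they are listed (in pairs for replica 2), so for now the manual safeguard seems to be interleaving servers by hand rather than passing two bricks from the same host as above, e.g. (server1/server2 and the paths are just placeholders):

# gluster volume add-brick myvol server1:/d/brick3 server2:/d/brick3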
 

What other policies would you be interested in associating with failure domains?

I was also thinking about failure domains that span hosts (perhaps because some of the machines in a volume share a single point of failure such as a top-of-rack switch, the same UPS, or the same room). It would also then be possible to have a brick per drive and do without RAID in the servers (if we have cross-server replication, I don't think the additional redundancy from RAID is necessary).
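
The closest I can get to that today seems to be laying things out by hand at create time, relying on consecutive bricks forming a replica set; a rough sketch (rack1srv1/rack2srv1 are just placeholder hostnames for machines behind different switches/UPSes/rooms, one brick per drive):

# gluster volume create myvol replica 2 \
    rack1srv1:/bricks/disk1 rack2srv1:/bricks/disk1 \
    rack1srv1:/bricks/disk2 rack2srv1:/bricks/disk2

That keeps each replica pair across the two failure domains, but it has to be maintained manually every time the volume grows.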

It'd also be nice to be able to just say "I want 2 replicas, but I don't care which bricks they are on as long as they aren't in the same failure domain". This could let you have odd numbers of servers without having to manually slice the storage and place the replicas.

Obviously, I have no idea about the internals of Gluster so I don't know how complicated this is to achieve.

Cheers

Regards,
Vijay






--
Jonathan Barber <jonathan.barber@xxxxxxxxx>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
