Re: a union to two stripes to fourteen mirrors...

I think it's more of a RAID 10 or 0+1. Since AFR is being used, all data is stored twice (at a minimum, for all redundant files). AFRing only 10 of 50 bricks would mean that 25% of the data bricks (10 of 40) were redundant, and the data they contained would be available on another brick, whether it was a stripe of a file or a full file. The other 75% of the bricks would be single points of failure.
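
Spelling that arithmetic out (assuming the 10 AFR'd bricks each hold a copy of one of the other 40):

  total bricks               = 50
  bricks holding copies      = 10
  bricks holding unique data = 40
  data bricks with a mirror  = 10  ->  10/40 = 25% redundant
  data bricks without one    = 30  ->  30/40 = 75% single points of failure

AFRing everything instead (the RAID 10 case) simply halves the raw capacity: 50 bricks give 25 bricks of usable, fully mirrored space.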

RAID 5 actually uses parity, and GlusterFS doesn't have any method for doing parity currently that I know of (although that would be a nice translator, or a nice addition to the stripe translator).

Onyx wrote:
Wow, very interesting concept!
Never thought of it...
Kind of like a raid5 over a network, right?

Just thinking out loud now, not sure if this is correct, but...
- In your setup, any single brick can fail like with raid5
- If you afr 2 times (3 copies), any 2 bricks can fail like with raid6
- If you afr n times, any n bricks can fail.

So you can set up a cluster with 50 bricks, afr 10 times, have a redundancy of 10 bricks, and usable storage space of 40 bricks....
A complex but very interesting concept!
....
....AND... We could set up some detection system and other small intelligence in the cluster to start a spare brick with the configuration of the failed brick. BAM, hot-spare brick alive and starting to auto-heal!

Man, GlusterFS is flexible!

Can someone confirm if my thinking is not way-off here?


This makes me think of another young cluster filesystem....


Jerker Nyberg wrote:

Hi,

I'm trying out different configurations of GlusterFS. I have 7 nodes, each with two 320 GB disks, where 300 GB on each disk is for the distributed file system.

Each node is called N. Every file system is, on the server side, mirrored to the other disk on the next node, wrapped around so that the last node mirrors its disk to the first. The notation below is invented; the real config is included at the end of this mail.

Pseudodefinitions:

fs(1) = a file system on the first disk
fs(2) = a file system on the second disk
n(I, fs(J)) = the fs J on node I
afr(N .. M) = mirror the volumes
stripe(N .. M) = stripe the volumes

Server:

Forw(N) = afr(n(N, fs(1)), n(N+1, fs(2)))
Back(N) = afr(n(N, fs(2)), n(N-1, fs(1)))

Client:

FStr(N .. M) = stripe(n(N, Forw(N)) .. n(N+1, Forw(N+1)) .. n(M, Forw(M)))
BStr(N .. M) = stripe(n(N, Back(N)) .. n(N+1, Back(N+1)) .. n(M, Back(M)))
mount /glusterfs = union(FStr(1 .. 7), BStr(1 .. 7))
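
Written out for a few nodes, the wrap-around looks like this:

  Forw(1) = afr(n(1, fs(1)), n(2, fs(2)))
  Forw(2) = afr(n(2, fs(1)), n(3, fs(2)))
  ...
  Forw(7) = afr(n(7, fs(1)), n(1, fs(2)))   <- wraps to the first node

  Back(1) = afr(n(1, fs(2)), n(7, fs(1)))   <- wraps to the last node
  Back(2) = afr(n(2, fs(2)), n(1, fs(1)))
  ...
  Back(7) = afr(n(7, fs(2)), n(6, fs(1)))

So each pair of disks is defined twice, once as Forw(N) on one node and once as Back(N+1) on the next, which gives the fourteen mirrors in the subject.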



The goal was to get good performance but also redundancy. But this setup will not give me that, will it? The stripes will not work when a part of them is gone, and the union will not magically find the other part of a file on the other stripe? And where to put the union namespace for good performance?
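
(To make the namespace question concrete: by "union" I mean cluster/unify, which as far as I understand needs its own namespace volume, roughly like the sketch below. Volume names here are only for illustration, not from my real config.)

volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.2          # which node should host the namespace?
  option remote-subvolume brick-ns     # a small posix export just for the namespace
end-volume

volume union
  type cluster/unify
  option scheduler rr                  # or alu, random, nufa
  option namespace ns
  subvolumes fstr bstr                 # stand-ins for the FStr and BStr stripes above
end-volume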

But my major question is this: I tried a single stripe (not using union on the client, just striping over the server volumes, which in turn mirror). When rsync'ing data onto it from a single server things worked fine, but when I put some load on it from the other nodes (dd'ing some large files in and out) the glusterfsd's on the first server died... Do you want me to look into this more and try to reproduce and narrow down the problem, or is this kind of setup in general not a good idea?

Regards
Jerker Nyberg.

### client config

# remote slices
volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.2
  option remote-subvolume brick
end-volume
volume brick3
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.3
  option remote-subvolume brick
end-volume
volume brick4
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.4
  option remote-subvolume brick
end-volume
volume brick5
  type protocol/client
  option transport-type tcp/client     # for TCP/IP transport
  option remote-host 10.0.0.5
  option remote-subvolume brick
end-volume
volume brick6
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.6
  option remote-subvolume brick
end-volume
volume brick7
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.7
  option remote-subvolume brick
end-volume
volume brick8
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.8
  option remote-subvolume brick
end-volume
volume stripe
  type cluster/stripe
  subvolumes brick2 brick3 brick4 brick5 brick6 brick7 brick8
  option block-size *:32KB
end-volume
### Add iothreads
volume iothreads
   type performance/io-threads
   option thread-count 32  # default is 1
   option cache-size 64MB
   subvolumes stripe
end-volume
### Add readahead feature
volume readahead
  type performance/read-ahead
  option page-size 256kB     # size of each read-ahead page
# option page-count 20 # cache per file = (page-count x page-size)
  option page-count 10       # cache per file  = (page-count x page-size)
  subvolumes iothreads
end-volume
### Add IO-Cache feature
volume iocache
  type performance/io-cache
  option page-size 256KB
#  option page-size 100MB
  option page-count 10
  subvolumes readahead
end-volume
### Add writeback feature
volume writeback
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind off
  subvolumes iocache
end-volume

### server config for 10.0.0.2

# posix
volume ba
  type storage/posix
  option directory /hda/glusterfs-a
end-volume
volume bc
  type storage/posix
  option directory /hdc/glusterfs-c
end-volume
# remote mirror
volume mc
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.3 # the next node
  option remote-subvolume bc
end-volume
# join
volume afr
  type cluster/afr
  subvolumes ba mc
end-volume
# lock
volume pl
  type features/posix-locks
  subvolumes afr
end-volume
# threads
volume brick
   type performance/io-threads
   option thread-count 16  # default is 1
   option cache-size 128MB
   subvolumes pl
end-volume
# export
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick bc
  option auth.ip.brick.allow *
  option auth.ip.bc.allow *
end-volume
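
### (sketch, not pasted from a real machine) the config on the next node,
### 10.0.0.3, should be identical except that its remote mirror points one
### hop further around the ring; on the last node, 10.0.0.8, it wraps back
### to 10.0.0.2.

# posix
volume ba
  type storage/posix
  option directory /hda/glusterfs-a
end-volume
volume bc
  type storage/posix
  option directory /hdc/glusterfs-c
end-volume
# remote mirror
volume mc
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.4 # the next node in the ring
  option remote-subvolume bc
end-volume
# join
volume afr
  type cluster/afr
  subvolumes ba mc
end-volume
# locks, io-threads and the protocol/server export follow exactly as on 10.0.0.2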



# glusterfs --version
glusterfs 1.3.8 built on Nov 16 2007
Copyright (c) 2006, 2007 Z RESEARCH Inc. <http://www.zresearch.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.




_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel





--

-Kevan Benson
-A-1 Networks



