After copying a few thousand files, then deleting and copying again, I get a lot of errors:

File descriptor in bad state
No such file or directory

and a lot of these in the glusterd.log:

[Jun 26 05:45:13] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: connection to server disconnected
[Jun 26 05:45:13] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)

I have set it up like this: 1.3-pre4, 5 servers + 5 clients (running on the same boxes as the servers). What could cause the disconnections?

server:

volume gfs
  type storage/posix
  option directory /mnt/gluster/gfs1
end-volume

volume gfs-afr
  type storage/posix
  option directory /mnt/gluster/afr-gfs1
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port 6996
  subvolumes gfs gfs-afr
  option auth.ip.gfs.allow *
  option auth.ip.gfs-afr.allow *
end-volume

client:

[root@hd-t1157cl etc]# cat cluster-client.vol

volume a1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.10
  option remote-port 6996
  option remote-subvolume gfs
end-volume

volume a2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.10
  option remote-port 6996
  option remote-subvolume gfs-afr
end-volume

volume b1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.11
  option remote-port 6996
  option remote-subvolume gfs
end-volume

volume b2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.11
  option remote-port 6996
  option remote-subvolume gfs-afr
end-volume

volume c1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.12
  option remote-port 6996
  option remote-subvolume gfs
end-volume

volume c2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.12
  option remote-port 6996
  option remote-subvolume gfs-afr
end-volume

volume d1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.13
  option remote-port 6996
  option remote-subvolume gfs
end-volume

volume d2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.13
  option remote-port 6996
  option remote-subvolume gfs-afr
end-volume

volume e1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.14
  option remote-port 6996
  option remote-subvolume gfs
end-volume

volume e2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.47.0.14
  option remote-port 6996
  option remote-subvolume gfs-afr
end-volume

volume afr1
  type cluster/afr
  subvolumes a1 e2
  option replicate *:2
end-volume

volume afr2
  type cluster/afr
  subvolumes b1 d2
  option replicate *:2
end-volume

volume afr3
  type cluster/afr
  subvolumes c1 a2
  option replicate *:2
end-volume

volume afr4
  type cluster/afr
  subvolumes d1 b2
  option replicate *:2
end-volume

volume afr5
  type cluster/afr
  subvolumes e1 c2
  option replicate *:2
end-volume

volume gfstest
  type cluster/unify
  subvolumes afr1 afr2 afr3 afr4 afr5
  option scheduler rr
  option rr.limits.min-free-disk 5GB
end-volume
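Side note: the client spec is completely regular, so it can be generated from the host list instead of being edited by hand. The sketch below (plain Python, just an illustration; the hosts and the afr pairing are copied from the spec above, everything else is a placeholder and it is not a glusterfs tool) shows one way that could look:

#!/usr/bin/env python
# Sketch only: regenerate a cluster-client.vol with the same layout as above.
# The host list and the AFR pairing are copied from the posted spec; anything
# else is illustrative.

hosts = [("a", "10.47.0.10"), ("b", "10.47.0.11"), ("c", "10.47.0.12"),
         ("d", "10.47.0.13"), ("e", "10.47.0.14")]

# Pairing as posted: each server's "gfs" export is mirrored on another
# server's "gfs-afr" export.
afr_pairs = [("a1", "e2"), ("b1", "d2"), ("c1", "a2"),
             ("d1", "b2"), ("e1", "c2")]

out = []

# One protocol/client volume per export: <letter>1 -> gfs, <letter>2 -> gfs-afr.
for letter, host in hosts:
    for suffix, remote in ((1, "gfs"), (2, "gfs-afr")):
        out.append("volume %s%d\n"
                   "  type protocol/client\n"
                   "  option transport-type tcp/client\n"
                   "  option remote-host %s\n"
                   "  option remote-port 6996\n"
                   "  option remote-subvolume %s\n"
                   "end-volume\n" % (letter, suffix, host, remote))

# One cluster/afr volume per pair.
for n, (first, second) in enumerate(afr_pairs, 1):
    out.append("volume afr%d\n"
               "  type cluster/afr\n"
               "  subvolumes %s %s\n"
               "  option replicate *:2\n"
               "end-volume\n" % (n, first, second))

# cluster/unify over all the afr volumes, round-robin scheduler.
afr_names = " ".join("afr%d" % n for n in range(1, len(afr_pairs) + 1))
out.append("volume gfstest\n"
           "  type cluster/unify\n"
           "  subvolumes %s\n"
           "  option scheduler rr\n"
           "  option rr.limits.min-free-disk 5GB\n"
           "end-volume\n" % afr_names)

print("\n".join(out))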
On 6/26/07, Sebastien LELIEVRE <slelievre@xxxxxxxxxxxxxxxx> wrote:

Hi again !

Shai DB wrote:
> another question
> I noticed that 1.2 doesn't have AFR in its source
> how can I use/install it anyway ?
> I saw 1.3-pre has it..
> is 1.3-pre OK for production ?
> thanks

I had forgotten this point ! :)

Yes, the 1.3-pre4 archive is stable enough for production, but you can also
use the tla repository with the 2.4 branch, which is stable enough (to me) to
be used in production.

Just note that the 1.3 stable release will be based on the 2.5 main branch
and will include the self-heal feature (and many more !)

Cheers,

Sebastien LELIEVRE
slelievre@xxxxxxxxxxxxxxxx    Services to ISP
TBS-internet                  http://www.TBS-internet.com

> I need it for replication (to have 2 copies of data in case of crash)
>
> On 6/26/07, Sebastien LELIEVRE <slelievre@xxxxxxxxxxxxxxxx> wrote:
>
>    Hi,
>
>    I just wanted to stress this :
>
>    Shai wrote:
>    > Hello, we are testing glusterfs 1.2 and I have a few questions -
>
>    1.2 doesn't bring "self-heal" with it, so keep in mind that if a drive
>    crashes, you would have to sync your new drive "manually" with the
>    others.
>
> so just copy all the data to the replaced disk from its afr 'pair' ?
>
>    BUT, 1.3 is going to correct this, and this is good :)
>
>    That's all I had to add
>
>    Cheers,
>
>    Sebastien LELIEVRE
>    slelievre@xxxxxxxxxxxxxxxx    Services to ISP
>    TBS-internet                  http://www.TBS-internet.com
>
>    Krishna Srinivas wrote:
>    > As of now you need to restart glusterfs if there is any change
>    > in the config spec file. However, in future versions you won't need
>    > to remount (this is in our road map).
>    >
>    > On 6/25/07, Shai DB <dbshai@xxxxxxxxx> wrote:
>    >> thanks for the answer
>    >> this seems easy and neat to set up
>    >>
>    >> another question is, if I add 2 more nodes to the gang,
>    >> how can I set up all the clients with the new configuration
>    >> without needing to 'remount' the glusterfs ?
>    >>
>    >> Thanks
>    >>
>    >> On 6/25/07, Krishna Srinivas <krishna@xxxxxxxxxxxxx> wrote:
>    >> >
>    >> > On 6/25/07, Shai DB <dbshai@xxxxxxxxx> wrote:
>    >> > > Hello, we are testing glusterfs 1.2 and I have a few questions -
>    >> > >
>    >> > > 1. we are going to store millions of small jpg files that will
>    >> > > be read by a webserver - is glusterfs a good solution for this ?
>    >> >
>    >> > Yes, definitely.
>    >> >
>    >> > > 2. we are going to run both server+clients on each node,
>    >> > > together with apache
>    >> > >
>    >> > > 3. replicate *:2
>    >> > >
>    >> > > the way I think of doing the replication is defining 2 volumes
>    >> > > on each server and using AFR:
>    >> > >
>    >> > > server1: a1, a2
>    >> > > server2: b1, b2
>    >> > > server3: c1, c2
>    >> > > server4: d1, d2
>    >> > > server5: e1, e2
>    >> > >
>    >> > > afr1: a1+b2
>    >> > > afr2: b1+c2
>    >> > > afr3: c1+d2
>    >> > > afr4: d1+e2
>    >> > > afr5: e1+a2
>    >> > >
>    >> > > and then unify = afr1+afr2+afr3+afr4+afr5 with the replicate
>    >> > > option
>    >> > >
>    >> > > is this the correct way ?
>    >> > > and what to do in the future when we add more nodes ? when
>    >> > > changing the afr (adding and changing the couples), will
>    >> > > glusterfs redistribute the files the new way ?
>    >> >
>    >> > Yes, this is the right way.
>    >> > If you add one more server f, one solution is to move the
>    >> > contents of a2 to f2, clean up a2, and have it as follows:
>    >> >
>    >> > afr5: e1 + f2
>    >> > afr6: f1 + a2
>    >> >
>    >> > Can't think of an easier solution.
>    >> >
>    >> > But if we assume that you will always add 2 servers when you
>    >> > want to add, we can have the setup in the following way:
>    >> > afr1: a1 + b2
>    >> > afr2: b1 + a2
>    >> > afr3: c1 + d2
>    >> > afr4: d1 + c2
>    >> > afr5: e1 + f2
>    >> > afr6: f1 + e2
>    >> >
>    >> > Now when you add a pair of servers to this (g, h):
>    >> > afr7: g1 + h2
>    >> > afr8: h1 + g2
>    >> >
>    >> > Which is very easy. But you will have to add 2 servers every
>    >> > time. The advantage is that it is easier to visualize the setup
>    >> > and add new nodes.
>    >> >
>    >> > Thinking further, if we assume that you will replicate all the
>    >> > files twice (option replicate *:2), you can have the following
>    >> > setup:
>    >> > afr1: a + b
>    >> > afr2: c + d
>    >> > afr3: e + f
>    >> >
>    >> > This is a very easy setup. It is simple to add a fresh pair
>    >> > (afr4: g + h).
>    >> >
>    >> > You can have whatever setup you want depending on your
>    >> > convenience and requirements.
>    >> >
>    >> > > 4. when a hard drive goes down and is replaced, does the
>    >> > > cluster also redistribute the files ?
>    >> >
>    >> > When a hard drive is replaced, missing files will be replicated
>    >> > from the AFR's other child.
>    >> >
>    >> > Regards
>    >> > Krishna
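Just to make sure I follow the "always add 2 servers" layout Krishna describes above, here is a quick sketch (Python, purely for illustration; the server names are placeholders and this is not part of glusterfs) that prints the afr pairs for servers added two at a time:

# Sketch of the "add two servers at a time" AFR layout described above:
# each new pair (x, y) contributes the volumes x1+y2 and y1+x2.

def afr_pairs(servers):
    """servers: server names in the order they were added, two at a time."""
    if len(servers) % 2 != 0:
        raise ValueError("this scheme assumes servers are added in pairs")
    pairs = []
    for i in range(0, len(servers), 2):
        x, y = servers[i], servers[i + 1]
        pairs.append((x + "1", y + "2"))   # e.g. afr1: a1 + b2
        pairs.append((y + "1", x + "2"))   # e.g. afr2: b1 + a2
    return pairs

for n, (first, second) in enumerate(afr_pairs(list("abcdefgh")), 1):
    print("afr%d: %s + %s" % (n, first, second))

# Prints afr1: a1 + b2 through afr8: h1 + g2.  Adding another pair (i, j)
# later only appends afr9: i1 + j2 and afr10: j1 + i2; the existing pairs
# and the files already stored on them are untouched.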