Thanks Krishna, and don't worry about the delayed response; I imagine maintaining this list is hard work!

The main problem is the first one you note. I have run some tests on GlusterFS to check what happens when a client goes down, and I can see that sometimes, if a client hangs while performing an operation (read/write), other clients stop working correctly. I have reproduced this in several scenarios, and I see the problem every time.

My last test illustrates the problem. I have 4 machines: two servers and two clients. One server exports one brick for storage (posix storage); the other server exports a brick for the namespace and a brick for storage. The unify translator is placed on the client side.

The test is: from one client I cp a file (from local to GlusterFS and vice versa); while the cp is still in progress, I power down that client. Then, from the other client, I try an "ls" command (I also tried sha1sum on a file in the Gluster mount, cp, cat, ...). The second client stays blocked for a long time. Sometimes the command eventually finishes (for example, "ls" after 2-3 minutes) and other times it returns an error message. Note: sometimes the client is not blocked at all and the Gluster works fine; it is difficult to predict when the client will block and when it won't.

As I mentioned previously, I tested this in several scenarios: with and without AFR (I think the problem comes from the unify translator), with the unify translator on the client side and on the server side, with one server and two clients, 2 servers and 2 clients, and 3 servers and two clients.

The issue about the timeout option is related to this problem. I ran the same tests with the timeout option to see its impact. I can see that if I define a timeout, when a client tries an ls command (or cp, sha1sum, ...) the recovery time is shorter than if I do not define a timeout. I don't know the exact relation, but it seems that with a timeout set, once the timeout expires the client retries the command and the second attempt finishes successfully; I am not sure about this, though.
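In case it helps, this is how I understand the timeout would be set in the client spec (assuming "transport-timeout" is the right option name under protocol/client in the 1.3.x series; the value 20 is just the one from my test):

    volume brick1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.45
      option remote-subvolume brick
      option transport-timeout 20   # seconds; assumed option name for 1.3.x
    end-volume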
The config files of this last test:

Server1:

    volume brick
      type storage/posix
      option directory /home/pruebaD
    end-volume

    volume brick-ns
      type storage/posix
      option directory /home/namespace
    end-volume

    volume server
      type protocol/server
      subvolumes brick brick-ns
      option transport-type tcp/server
      option auth.ip.brick.allow *
      option auth.ip.brick-ns.allow *
      option listen-port 6996  # Default is 6996
      option client-volume-filename etc/glusterfs/pruebaDistribuida/glusterfs-client.vol
    end-volume

Server2:

    volume brick
      type storage/posix
      option directory /home/pruebaD
    end-volume

    volume server
      type protocol/server
      subvolumes brick
      option transport-type tcp/server
      option auth.ip.brick.allow *
    end-volume

Clients:

    volume brick1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.45
      option remote-subvolume brick
    end-volume

    volume brick2
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.40
      option remote-subvolume brick
    end-volume

    volume ns1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.45
      option remote-subvolume brick-ns
    end-volume

    volume unify
      type cluster/unify
      subvolumes brick1 brick2
      option namespace ns1
      option scheduler rr
    end-volume

The version of GlusterFS is 1.3.8pre5, with fuse 2.7.2glfs9. The OS is Gentoo, kernel 2.6.23-r6.

Thanks for the reply,

-----Original Message-----
From: krishna.srinivas@xxxxxxxxx [mailto:krishna.srinivas@xxxxxxxxx] On behalf of Krishna Srinivas
Sent: Monday, 21 April 2008 13:09
To: Antonio González
CC: gluster-devel@xxxxxxxxxx
Subject: Re: Problem with clients that go down..

Hi Antonio,

Excuse us; somehow your issue was not responded to. If I understand correctly, you are facing two problems: 1) pulling the cable on one client makes other clients hang, and 2) the timeout value you specify in the spec file is not reflected in the actual timeout you see when you access glusterfs. Is that correct? I have lost track of your setup details.
Searching the mail archives did not give me the exact picture. Can you give the setup details with config files? And also the tests? The problem you are facing should certainly be fixed.

Regards,
Krishna

On Mon, Apr 21, 2008 at 3:58 PM, Antonio González <antonio.gonzalez@xxxxxxxxxx> wrote:
> Hello all,
>
> I have run a lot of tests on GlusterFS to verify its viability. I wrote
> to this list one or two weeks ago asking about an issue where clients that
> go down cause problems for other clients, which can then no longer access
> the Gluster file system.
>
> Are the developers of GlusterFS aware of this issue? I think it is a
> serious problem, and I need an answer in order to decide whether or not to
> recommend GlusterFS for a project.
>
> I have reproduced this issue in several scenarios (AFR/unify on the server
> side, on the client side, without AFR…), and I think the problem is in the
> unify translator. I made a test with one server and two clients. Without
> the unify translator it works fine: a client that goes down while reading
> or copying a file does not affect other clients. With the unify translator,
> a client that goes down while reading/writing a file causes the problem
> (other clients that try an "ls" command are blocked).
>
> I made a test with two servers (without AFR, unify on the client side),
> with files located on each server. I blocked one server and accessed a
> file on the other server (cp command). I can see that access to the
> unblocked server depends on the timeout option. If I don't set a timeout,
> the client takes 2 or 3 minutes and does not finish the command. If I set
> a timeout of 20 sec, the client takes 32 sec and finishes the command. For
> a timeout of 40 sec, the client takes approximately 60 sec.
>
> I would like to know at least whether this problem is recognized by the
> developers of Gluster. Do they know what the problem is? Are they working
> to solve it?
>
> Thanks,
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel