Thanks Krishna, and don't worry about the delayed response; I imagine maintaining this list is hard work!

The main problem is the first one you note. I have run some tests on GlusterFS to check what happens when a client goes down, and I can see that sometimes, if a client hangs while performing an operation (read/write), other clients stop working correctly. I have reproduced this in several scenarios, and I see the problem every time.

My last test illustrates the problem. I have 4 machines: two servers and two clients. One server exports one brick for storage (posix storage); the other server exports a brick for the namespace and a brick for storage. The unify translator is placed on the client side.

The test is: from one client I cp a file (from local to GlusterFS and vice versa); while the cp is still in progress, I power down that client. Then, from the other client, I try an "ls" command (I also tried sha1sum on a file in the Gluster mount, cp, cat, ...). The second client stays blocked for a long time. Sometimes the command eventually finishes (for example, "ls" after 2-3 minutes) and other times it returns an error message. Note: sometimes the client is not blocked at all and the Gluster works fine; it is difficult to predict when the client will block and when it won't.

As I mentioned previously, I tested this in several scenarios: with and without AFR (I think the problem comes from the unify translator), with the unify translator on the client side and on the server side, with one server and two clients, 2 servers and 2 clients, and 3 servers and two clients.

The issue about the timeout option is related to this problem. I ran the same tests with the timeout option to see its impact. I can see that if I define a timeout, when a client tries an ls command (or cp, sha1sum, ...) the recovery time is shorter than if I do not define a timeout. I don't know the exact relation, but it seems that with a timeout set, once the timeout expires the client retries the command and the second attempt finishes successfully; I am not sure about this, though.
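In case it helps, this is how I understand the timeout would be set in the client spec (assuming "transport-timeout" is the right option name under protocol/client in the 1.3.x series; the value 20 is just the one from my test):

    volume brick1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.45
      option remote-subvolume brick
      option transport-timeout 20   # seconds; assumed option name for 1.3.x
    end-volume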
The config files of this last test:

Server1:

    volume brick
      type storage/posix
      option directory /home/pruebaD
    end-volume

    volume brick-ns
      type storage/posix
      option directory /home/namespace
    end-volume

    volume server
      type protocol/server
      subvolumes brick brick-ns
      option transport-type tcp/server
      option auth.ip.brick.allow *
      option auth.ip.brick-ns.allow *
      option listen-port 6996  # Default is 6996
      option client-volume-filename etc/glusterfs/pruebaDistribuida/glusterfs-client.vol
    end-volume

Server2:

    volume brick
      type storage/posix
      option directory /home/pruebaD
    end-volume

    volume server
      type protocol/server
      subvolumes brick
      option transport-type tcp/server
      option auth.ip.brick.allow *
    end-volume

Clients:

    volume brick1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.45
      option remote-subvolume brick
    end-volume

    volume brick2
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.40
      option remote-subvolume brick
    end-volume

    volume ns1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.1.0.45
      option remote-subvolume brick-ns
    end-volume

    volume unify
      type cluster/unify
      subvolumes brick1 brick2
      option namespace ns1
      option scheduler rr
    end-volume

The version of GlusterFS is 1.3.8pre5, with fuse 2.7.2glfs9. The OS is Gentoo, kernel 2.6.23-r6.

Thanks for the reply,

-----Original Message-----
From: krishna.srinivas@xxxxxxxxx [mailto:krishna.srinivas@xxxxxxxxx] On behalf of Krishna Srinivas
Sent: Monday, 21 April 2008 13:09
To: Antonio González
CC: gluster-devel@xxxxxxxxxx
Subject: Re: Problem with clients that go down..

Hi Antonio,

Excuse us; somehow your issue was not responded to. If I understand correctly, you are facing two problems: 1) pulling the cable on one client makes other clients hang, and 2) the timeout value you specify in the spec file is not reflected in the actual timeout you see when you access glusterfs. Is that correct? I have lost track of your setup details.
Searching the mail archives did not give me the exact picture. Can you give the setup details with config files? And also the tests? The problem you are facing should certainly be fixed.

Regards,
Krishna

On Mon, Apr 21, 2008 at 3:58 PM, Antonio González <antonio.gonzalez@xxxxxxxxxx> wrote:
> Hello all,
>
> I have run a lot of tests on GlusterFS to verify its viability. I wrote
> to this list one or two weeks ago asking about an issue where clients that
> go down cause problems for other clients, which can then no longer access
> the Gluster file system.
>
> Are the developers of GlusterFS aware of this issue? I think it is a
> serious problem, and I need an answer in order to decide whether or not to
> recommend GlusterFS for a project.
>
> I have reproduced this issue in several scenarios (AFR/unify on the server
> side, on the client side, without AFR…), and I think the problem is in the
> unify translator. I made a test with one server and two clients. Without
> the unify translator it works fine: a client that goes down while reading
> or copying a file does not affect other clients. With the unify translator,
> a client that goes down while reading/writing a file causes the problem
> (other clients that try an "ls" command are blocked).
>
> I made a test with two servers (without AFR, unify on the client side),
> with files located on each server. I blocked one server and accessed a
> file on the other server (cp command). I can see that access to the
> unblocked server depends on the timeout option. If I don't set a timeout,
> the client takes 2 or 3 minutes and does not finish the command. If I set
> a timeout of 20 sec, the client takes 32 sec and finishes the command. For
> a timeout of 40 sec, the client takes approximately 60 sec.
>
> I would like to know at least whether this problem is recognized by the
> developers of Gluster. Do they know what the problem is? Are they working
> to solve it?
>
> Thanks,
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel