Your description says that you are powering the client down. I will try to
reproduce this bug and get back to you.

Krishna

On Mon, Apr 21, 2008 at 6:22 PM, Krishna Srinivas <krishna@xxxxxxxxxxxxx> wrote:
> One doubt: are you sure you are not stopping the server on which
> the namespace is there?
>
> On Mon, Apr 21, 2008 at 6:00 PM, Antonio González
> <antonio.gonzalez@xxxxxxxxxx> wrote:
> > Thanks Krishna, and don't worry about not responding; I imagine it is
> > hard work to maintain this list!
> >
> > The main problem is the first one you note. I have run some tests on
> > GlusterFS to check its viability when a client goes down, and I can see
> > that sometimes, if a client hangs while performing an operation
> > (read/write), other clients stop working correctly.
> >
> > I have reproduced this issue in several scenarios, and I see the problem
> > every time. My last test illustrates it. I have four machines: two
> > servers and two clients.
> >
> > One server exports one brick for storage (posix storage); the other
> > server exports a brick for the namespace and a brick for storage. The
> > unify translator is placed on the client side.
> >
> > The test is: from one client I cp a file (from local to GlusterFS and
> > vice versa). While the client is completing the cp I power that client
> > down; then from the other client I try an "ls" command (I also tried
> > sha1sum over a file in the GlusterFS mount, cp, cat, ...). That client
> > remains blocked for a long time. Sometimes the command eventually
> > finishes (for example, "ls" after 2-3 minutes) and other times it
> > returns an error message.
> >
> > Note: sometimes the client is not blocked and GlusterFS works fine. It
> > is difficult to predict when the client will be blocked and when it
> > will not.
> >
> > As I mentioned previously, I tested this issue in several scenarios:
> > with and without AFR (I think the problem is caused by the unify
> > translator), with the unify translator on the client side and on the
> > server side, with one server and two clients, two servers and two
> > clients, and three servers and two clients.
> >
> > The question about the timeout option is related to this problem. I
> > tested with the timeout option to see its impact on the same tests. I
> > can see that if I define a timeout, when a client tries an ls command
> > (or cp, sha1sum, ...), the recovery time is shorter than if I do not
> > define a timeout. I don't know the exact relation, but it seems that
> > with a timeout, once the timeout expires the client retries the command
> > and this time the command finishes successfully, although I am not sure
> > about this.
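> >
> > For reference, the timeout in these tests was set in the client spec,
> > roughly as in the sketch below. This is only an illustration from
> > memory: I am assuming the option is called transport-timeout and that
> > it goes on the protocol/client volumes in this 1.3.x version.
> >
> > volume brick1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.1.0.45
> >   option remote-subvolume brick
> >   option transport-timeout 20 # seconds; assumed option name, adjust if it differs in 1.3.8pre5
> > end-volume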
> >
> > The config files for this last test:
> >
> > Server1
> >
> > volume brick
> >   type storage/posix
> >   option directory /home/pruebaD
> > end-volume
> >
> > volume brick-ns
> >   type storage/posix
> >   option directory /home/namespace
> > end-volume
> >
> > volume server
> >   type protocol/server
> >   subvolumes brick brick-ns
> >   option transport-type tcp/server
> >   option auth.ip.brick.allow *
> >   option auth.ip.brick-ns.allow *
> >   option listen-port 6996 # Default is 6996
> >   option client-volume-filename etc/glusterfs/pruebaDistribuida/glusterfs-client.vol
> > end-volume
> >
> > Server2
> >
> > volume brick
> >   type storage/posix
> >   option directory /home/pruebaD
> > end-volume
> >
> > volume server
> >   type protocol/server
> >   subvolumes brick
> >   option transport-type tcp/server
> >   option auth.ip.brick.allow *
> > end-volume
> >
> > Clients
> >
> > volume brick1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.1.0.45
> >   option remote-subvolume brick
> > end-volume
> >
> > volume brick2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.1.0.40
> >   option remote-subvolume brick
> > end-volume
> >
> > volume ns1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.1.0.45
> >   option remote-subvolume brick-ns
> > end-volume
> >
> > volume unify
> >   type cluster/unify
> >   subvolumes brick1 brick2
> >   option namespace ns1
> >   option scheduler rr
> > end-volume
> >
> > The GlusterFS version is 1.3.8pre5, with fuse 2.7.2glfs9. The OS is
> > Gentoo, kernel 2.6.23-r6.
> >
> > Thanks for the reply,
> >
> >
> > -----Original Message-----
> > From: krishna.srinivas@xxxxxxxxx [mailto:krishna.srinivas@xxxxxxxxx] On behalf of Krishna Srinivas
> > Sent: Monday, April 21, 2008 13:09
> > To: Antonio González
> > CC: gluster-devel@xxxxxxxxxx
> > Subject: Re: Problem with clients that goes down..
> >
> > Hi Antonio,
> >
> > Excuse us, somehow your issue was not responded to.
> >
> > If I understand correctly, you are facing two problems:
> > 1) unplugging the cable on one client will make other clients hang
> > 2) the timeout value you specify in the spec file is not reflected
> >    in the actual timeout you see when you access glusterfs.
> >
> > Is that correct? I have lost track of your setup details. Searching the
> > mail archives did not give me the exact picture. Can you give the setup
> > details with config files? And also the tests?
> >
> > Surely the problem you are facing should be fixed.
> >
> > Regards
> > Krishna
> >
> > On Mon, Apr 21, 2008 at 3:58 PM, Antonio González
> > <antonio.gonzalez@xxxxxxxxxx> wrote:
> > > Hello all,
> > >
> > > I have run many tests on GlusterFS to verify its viability. I wrote to
> > > this list one or two weeks ago asking about an issue where a client
> > > that goes down causes problems for other clients, which then cannot
> > > access the Gluster file system.
> > >
> > > Are the developers of GlusterFS aware of this issue? I think it is a
> > > serious problem, and I need an answer in order to recommend for or
> > > against the use of GlusterFS in a project.
> > >
> > > I have reproduced this issue in several scenarios (AFR/unify on the
> > > server side, on the client side, without AFR…), and I think that the
> > > problem is the unify translator. I made a test with one server and two
> > > clients. Without the unify translator it works fine: a client that goes
> > > down while reading or copying a file does not affect the other clients.
> > > With the unify translator, a client that goes down while reading/writing
> > > a file causes the problem (other clients that try an "ls" command are
> > > blocked).
> > >
> > > I made a test with two servers (without AFR, unify on the client side),
> > > with files located on each server. I block one server and access a file
> > > on the other server (cp command). I can see that access to the server
> > > that is not blocked depends on the timeout option. If I don't set a
> > > timeout, the client takes 2 or 3 minutes and does not finish the
> > > command. If I set a timeout of 20 s, the client takes 32 s and finishes
> > > the command. For a timeout of 40 s, the client takes approximately 60 s.
> > >
> > > I would like to know at least whether this problem is recognized by the
> > > developers of Gluster. Do they know what the problem is? Are they
> > > working to solve it?
> > >
> > > Thanks,
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
>