Hello Daniel,

Can you re-try to crash the setup after adding « option ping-timeout 120 » to the protocol/client volume for every server in your client config? I have sketched the relevant stanza at the bottom of this mail. I remember similar crashes with bonnie tests on the 2.x versions a few months ago. Please tell us about your experiences. Thanks.

Regards,
Stephan

On Thu, 04 Feb 2010 16:43:29 +0100 Daniel Maher <dma+gluster@xxxxxxxxx> wrote:
> 
> Hello,
> 
> I managed to crash Gluster 3.0.0 severely during a simple file creation
> test. Not only did the crash result in the standard « transport
> endpoint not connected » problem, but the servers in question had to be
> hard-reset in order to make them operational again.
> 
> So, here goes...
> 
> Four nodes : two servers, two clients, with client-side replication.
> The clients are Fedora 8, the servers Fedora 9 ; stock FUSE was used
> throughout. The configurations were generated with the volgen tool
> using the following command line :
> 
> # glusterfs-volgen --name replicated --raid 1 s01:/opt/gluster s02:/opt/gluster
> 
> Servers :
> # service glusterfsd start
> 
> Clients :
> # mount -t glusterfs /etc/glusterfs/replicated-tcp.vol /opt/gluster/
> 
> The following Python script was used to run the file creation test :
> http://nfsv4.bullopensource.org/tools/tests_tools/test_files.py
> 
> The Python script was edited only to point the target directory at the
> Gluster mount. Each client was told to use a different sub-directory
> within the Gluster mount point.
> 
> The script was run in a loop by the following bash wrapper :
> 
> #!/bin/bash
> LOOP=0
> while [ $LOOP -lt 1000 ]
> do
>     time ./test_files.py | tee -a go_test_files.log
>     cat ./test_files_orw | tee -a go_test_files.log
>     let LOOP=$LOOP+1
> done
> 
> « test_files_orw » is the file that test_files.py writes its results to.
> It is over-written on each run, hence the « cat » through « tee -a » to
> preserve each run's results in a cumulative log.
> 
> The script made it through 20 or so iterations before Gluster crashed.
> The servers still responded to ping requests, but no new SSH connections
> could be made, and existing SSH sessions were frozen. On the local
> console, keyboard input was still possible, but no new actions could be
> taken. The servers were hard-reset at this point.
> 
> I'll be happy to provide any further information as is deemed necessary
> - just let me know.
> 
> 
> -- 
> Daniel Maher <dma+gluster AT witbe DOT net>
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> 
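
P.S. As promised above, here is a minimal sketch of where the option goes, assuming a volgen-generated client file (replicated-tcp.vol). The volume name « s01-1 » and the subvolume name « brick1 » are only my guesses at what volgen produced on your side - match them against your actual file, and add the option to the protocol/client volume for *both* servers :

volume s01-1
  type protocol/client
  option transport-type tcp
  option remote-host s01
  option remote-subvolume brick1
  # raise ping-timeout so the client waits 120 seconds before it
  # declares the server dead and disconnects (instead of the lower
  # default), which should ride out servers that stall under load
  option ping-timeout 120
end-volume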