gluster 3.0 read hangs

Hi Nick,

Thank you for using Gluster and for sending us such a detailed description of the problem you are seeing. We will try a run with exactly the same switches and config as you mention and see if we can reproduce this in-house to make debugging easier.

Regards,
Tejas.

----- Original Message -----
From: "Nick Birkett" <nick at streamline-computing.com>
To: gluster-users at gluster.org
Sent: Wednesday, December 23, 2009 3:04:43 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
Subject: gluster 3.0 read hangs

I ran some benchmarks last week using 2.0.8. Single server with 8 Intel 
e1000e NICs bonded with mode=balance-alb.

All worked fine and I got some good results using 8 clients. All Gigabit.
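
For reference, the bonding on the server is just the standard RHEL5 setup, roughly like this (interface names and the miimon value here are illustrative, not copied from the real machine):

 # /etc/modprobe.conf
 alias bond0 bonding
 options bond0 mode=balance-alb miimon=100

 # /etc/sysconfig/network-scripts/ifcfg-bond0
 DEVICE=bond0
 IPADDR=192.168.100.200
 NETMASK=255.255.255.0
 BOOTPROTO=none
 ONBOOT=yes

 # /etc/sysconfig/network-scripts/ifcfg-eth0   (and similarly for the other 7 slaves)
 DEVICE=eth0
 MASTER=bond0
 SLAVE=yes
 BOOTPROTO=none
 ONBOOT=yes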

The benchmarks did 2 passes of IOZONE in network mode using 1-8 threads 
per client and 1-8 clients. Each client used 32 Gbyte files.

All jobs completed successfully. This takes about 32 hours to run 
through all cases.
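
The IOZONE network mode is driven by a -+m host file with one line per stream: client hostname, working directory on the client, and path to the iozone binary (the same host is repeated once per thread). Ours look roughly like this (hostnames and paths as per the attached runs):

 comp00  /data2/sccomp  /opt/iozone/bin/iozone
 comp01  /data2/sccomp  /opt/iozone/bin/iozone
 ral02   /data2/sccomp  /opt/iozone/bin/iozone
 ral03   /data2/sccomp  /opt/iozone/bin/iozone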

Yesterday I updated to 3.0.0 (server and clients) and re-configured the 
server and client vol files using glusterfs-volgen (renamed some of the 
vol names).
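
For what it's worth, the vol files were generated along these lines and then hand-edited to rename the volumes (the exact volgen flags are from memory, so treat this as approximate):

 glusterfs-volgen --name distribute 192.168.100.200:/data/data00

which produces matching client and server vol files; the attached ones are those with the volume names changed.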

Red Hat EL5 binary packages from the GlusterFS site installed:
glusterfs-server-3.0.0-1.x86_64
glusterfs-common-3.0.0-1.x86_64
glusterfs-client-3.0.0-1.x86_64


All works mostly OK, except every so often the IOZONE job just stops and 
the network IO drops to zero.
This always happens during either a read or re-read test, just as the 
IOZONE read test starts. It doesn't happen every time, and a run may go
for several hours without incident. This has happened 6 times on 
different test cases (threads/clients).



Has anyone else noticed this? Perhaps I have done something wrong?

Vol files attached - I know I don't need to distribute a single remote vol; 
this is part of a larger test with multiple vols.

Sample outputs attached. 4 clients with 4 files per client ran fine; 4 
clients with 8 files per client hung at re-read on the 2nd pass of IOZONE. 
All jobs with 5 clients and 8 clients ran to completion.

Thanks,

Nick

 





 volume brick00.server-e
   type protocol/client
   option transport-type tcp
   option transport.socket.nodelay on
   option transport.remote-port 6996
   option remote-host 192.168.100.200 # can be IP or hostname
   option remote-subvolume brick00
 end-volume

 volume distribute
   type cluster/distribute
   subvolumes  brick00.server-e
 end-volume


volume writebehind
    type performance/write-behind
    option cache-size 4MB
    subvolumes distribute
end-volume

volume readahead
    type performance/read-ahead
    option page-count 4
    subvolumes writebehind
end-volume

volume iocache
    type performance/io-cache
    option cache-size 1GB
    option cache-timeout 1
    subvolumes readahead
end-volume

volume quickread
    type performance/quick-read
    option cache-timeout 1
    option max-file-size 64kB
    subvolumes iocache
end-volume

volume statprefetch
    type performance/stat-prefetch
    subvolumes quickread
end-volume
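
The client vol file above is mounted in the usual 3.0 way, something like this (mount point as per the attached runs; log options shown in case they help when it hangs):

 glusterfs -f /etc/glusterfs/glusterfs.vol \
           --log-file=/var/log/glusterfs/data2.log --log-level=NORMAL /data2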

#glusterfsd_keep=0
 volume posix00
   type storage/posix
   option directory /data/data00
 end-volume

 volume locks00
   type features/locks
   subvolumes posix00
 end-volume

 volume brick00
   type performance/io-threads
   option thread-count 8
   subvolumes locks00
 end-volume

 volume server
   type protocol/server
   option transport-type tcp
   option transport.socket.listen-port 6996
   option transport.socket.nodelay on
   option auth.addr.brick00.allow *
   subvolumes  brick00
 end-volume
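
and the server side is just glusterfsd from the server RPM pointed at that file, roughly:

 glusterfsd -f /etc/glusterfs/glusterfsd.vol --log-file=/var/log/glusterfs/glusterfsd.log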

==========================================================
Cluster name         : Delldemo
Arch                 : x86_64 
SGE job submitted    : Tue Dec 22 22:21:38 GMT 2009
Number of CPUS 8 
Running Parallel IOZONE on ral03
Creating files in /data2/sccomp
NTHREADS=4
Total data size = 48196 MBytes
Running loop 1 of 2
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.326 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

	Run began: Tue Dec 22 22:21:38 2009

	Network distribution mode enabled.
	File size set to 12338176 KB
	Command line used: /opt/iozone/bin/iozone -+m hosts.556 -s 12049m -S 8192 -T -i 0 -i 1 -t 16 -F /data2/sccomp/BIG.0.comp03.streamline /data2/sccomp/BIG.1.comp03.streamline /data2/sccomp/BIG.2.comp03.streamline /data2/sccomp/BIG.3.comp03.streamline /data2/sccomp/BIG.0.ral02.streamline /data2/sccomp/BIG.1.ral02.streamline /data2/sccomp/BIG.2.ral02.streamline /data2/sccomp/BIG.3.ral02.streamline /data2/sccomp/BIG.0.ral03.streamline /data2/sccomp/BIG.1.ral03.streamline /data2/sccomp/BIG.2.ral03.streamline /data2/sccomp/BIG.3.ral03.streamline /data2/sccomp/BIG.0.ral04.streamline /data2/sccomp/BIG.1.ral04.streamline /data2/sccomp/BIG.2.ral04.streamline /data2/sccomp/BIG.3.ral04.streamline
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 8192 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
	Throughput test with 16 threads
	Each thread writes a 12338176 Kbyte file in 4 Kbyte records

	Test running:
	Children see throughput for 16 initial writers 	=  424003.62 KB/sec
	Min throughput per thread 			=   26480.14 KB/sec 
	Max throughput per thread 			=   26517.04 KB/sec
	Avg throughput per thread 			=   26500.23 KB/sec
	Min xfer 					= 12321928.00 KB

	Test running:
	Children see throughput for 16 rewriters 	=  424109.61 KB/sec
	Min throughput per thread 			=   26483.30 KB/sec 
	Max throughput per thread 			=   26530.66 KB/sec
	Avg throughput per thread 			=   26506.85 KB/sec
	Min xfer 					= 12316680.00 KB

	Test running:
	Children see throughput for 16 readers 		=  454358.62 KB/sec
	Min throughput per thread 			=   28298.30 KB/sec 
	Max throughput per thread 			=   28592.02 KB/sec
	Avg throughput per thread 			=   28397.41 KB/sec
	Min xfer 					= 12211568.00 KB

	Test running:
	Children see throughput for 16 re-readers 	=  459262.06 KB/sec
	Min throughput per thread 			=   28600.55 KB/sec 
	Max throughput per thread 			=   28892.20 KB/sec
	Avg throughput per thread 			=   28703.88 KB/sec
	Min xfer 					= 12219504.00 KB

	Test cleanup:


iozone test complete.
Running loop 2 of 2
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.326 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

	Run began: Tue Dec 22 22:53:33 2009

	Network distribution mode enabled.
	File size set to 12338176 KB
	Command line used: /opt/iozone/bin/iozone -+m hosts.556 -s 12049m -S 8192 -T -i 0 -i 1 -t 16 -F /data2/sccomp/BIG.0.comp03.streamline /data2/sccomp/BIG.1.comp03.streamline /data2/sccomp/BIG.2.comp03.streamline /data2/sccomp/BIG.3.comp03.streamline /data2/sccomp/BIG.0.ral02.streamline /data2/sccomp/BIG.1.ral02.streamline /data2/sccomp/BIG.2.ral02.streamline /data2/sccomp/BIG.3.ral02.streamline /data2/sccomp/BIG.0.ral03.streamline /data2/sccomp/BIG.1.ral03.streamline /data2/sccomp/BIG.2.ral03.streamline /data2/sccomp/BIG.3.ral03.streamline /data2/sccomp/BIG.0.ral04.streamline /data2/sccomp/BIG.1.ral04.streamline /data2/sccomp/BIG.2.ral04.streamline /data2/sccomp/BIG.3.ral04.streamline
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 8192 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
	Throughput test with 16 threads
	Each thread writes a 12338176 Kbyte file in 4 Kbyte records

	Test running:
	Children see throughput for 16 initial writers 	=  425851.12 KB/sec
	Min throughput per thread 			=   26593.95 KB/sec 
	Max throughput per thread 			=   26634.84 KB/sec
	Avg throughput per thread 			=   26615.70 KB/sec
	Min xfer 					= 12319368.00 KB

	Test running:
	Children see throughput for 16 rewriters 	=  424954.77 KB/sec
	Min throughput per thread 			=   26459.38 KB/sec 
	Max throughput per thread 			=   26656.61 KB/sec
	Avg throughput per thread 			=   26559.67 KB/sec
	Min xfer 					= 12247176.00 KB

	Test running:
	Children see throughput for 16 readers 		=  459433.33 KB/sec
	Min throughput per thread 			=   28449.77 KB/sec 
	Max throughput per thread 			=   28964.50 KB/sec
	Avg throughput per thread 			=   28714.58 KB/sec
	Min xfer 					= 12119024.00 KB

	Test running:
	Children see throughput for 16 re-readers 	=  458413.46 KB/sec
	Min throughput per thread 			=   28457.53 KB/sec 
	Max throughput per thread 			=   28831.23 KB/sec
	Avg throughput per thread 			=   28650.84 KB/sec
	Min xfer 					= 12178288.00 KB

	Test cleanup:


iozone test complete.
echo
echo ---------------
echo Job output ends
echo =========================================================
echo SGE job: finished   date = Tue Dec 22 23:25:20 GMT 2009
echo Total run time : 1 Hours 3 Minutes 42 Seconds
echo Time in seconds: 3822 Seconds
echo =========================================================

==========================================================
Cluster name         : Delldemo
Arch                 : x86_64 
SGE job submitted    : Tue Dec 22 23:25:30 GMT 2009
Number of CPUS 8 
Running Parallel IOZONE on comp01
Creating files in /data2/sccomp
NTHREADS=8
Total data size = 32240 MBytes
Running loop 1 of 2
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.326 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

	Run began: Tue Dec 22 23:25:30 2009

	Network distribution mode enabled.
	File size set to 4126720 KB
	Command line used: /opt/iozone/bin/iozone -+m hosts.557 -s 4030m -S 512 -T -i 0 -i 1 -t 32 -F /data2/sccomp/BIG.0.comp00.streamline /data2/sccomp/BIG.1.comp00.streamline /data2/sccomp/BIG.2.comp00.streamline /data2/sccomp/BIG.3.comp00.streamline /data2/sccomp/BIG.4.comp00.streamline /data2/sccomp/BIG.5.comp00.streamline /data2/sccomp/BIG.6.comp00.streamline /data2/sccomp/BIG.7.comp00.streamline /data2/sccomp/BIG.0.comp01.streamline /data2/sccomp/BIG.1.comp01.streamline /data2/sccomp/BIG.2.comp01.streamline /data2/sccomp/BIG.3.comp01.streamline /data2/sccomp/BIG.4.comp01.streamline /data2/sccomp/BIG.5.comp01.streamline /data2/sccomp/BIG.6.comp01.streamline /data2/sccomp/BIG.7.comp01.streamline /data2/sccomp/BIG.0.comp02.streamline /data2/sccomp/BIG.1.comp02.streamline /data2/sccomp/BIG.2.comp02.streamline /data2/sccomp/BIG.3.comp02.streamline /data2/sccomp/BIG.4.comp02.streamline /data2/sccomp/BIG.5.comp02.streamline /data2/sccomp/BIG.6.comp02.streamline /data2/sccomp/BIG.7.comp02.streamline /data2/sccomp/BIG.0.ral01.streamline /data2/sccomp/BIG.1.ral01.streamlineCommand line too long to save completely.

	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 512 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
	Throughput test with 32 threads
	Each thread writes a 4126720 Kbyte file in 4 Kbyte records

	Test running:
	Children see throughput for 32 initial writers 	=  431608.71 KB/sec
	Min throughput per thread 			=   13462.86 KB/sec 
	Max throughput per thread 			=   13516.74 KB/sec
	Avg throughput per thread 			=   13487.77 KB/sec
	Min xfer 					= 4110728.00 KB

	Test running:
	Children see throughput for 32 rewriters 	=  433205.56 KB/sec
	Min throughput per thread 			=   13512.67 KB/sec 
	Max throughput per thread 			=   13550.23 KB/sec
	Avg throughput per thread 			=   13537.67 KB/sec
	Min xfer 					= 4116360.00 KB

	Test running:
	Children see throughput for 32 readers 		=  458239.61 KB/sec
	Min throughput per thread 			=   13983.61 KB/sec 
	Max throughput per thread 			=   14699.36 KB/sec
	Avg throughput per thread 			=   14319.99 KB/sec
	Min xfer 					= 3925872.00 KB

	Test running:
	Children see throughput for 32 re-readers 	=  457589.70 KB/sec
	Min throughput per thread 			=   13990.14 KB/sec 
	Max throughput per thread 			=   14654.56 KB/sec
	Avg throughput per thread 			=   14299.68 KB/sec
	Min xfer 					= 3939696.00 KB

	Test cleanup:


iozone test complete.
Running loop 2 of 2
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.326 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

	Run began: Tue Dec 22 23:48:31 2009

	Network distribution mode enabled.
	File size set to 4126720 KB
	Command line used: /opt/iozone/bin/iozone -+m hosts.557 -s 4030m -S 512 -T -i 0 -i 1 -t 32 -F /data2/sccomp/BIG.0.comp00.streamline /data2/sccomp/BIG.1.comp00.streamline /data2/sccomp/BIG.2.comp00.streamline /data2/sccomp/BIG.3.comp00.streamline /data2/sccomp/BIG.4.comp00.streamline /data2/sccomp/BIG.5.comp00.streamline /data2/sccomp/BIG.6.comp00.streamline /data2/sccomp/BIG.7.comp00.streamline /data2/sccomp/BIG.0.comp01.streamline /data2/sccomp/BIG.1.comp01.streamline /data2/sccomp/BIG.2.comp01.streamline /data2/sccomp/BIG.3.comp01.streamline /data2/sccomp/BIG.4.comp01.streamline /data2/sccomp/BIG.5.comp01.streamline /data2/sccomp/BIG.6.comp01.streamline /data2/sccomp/BIG.7.comp01.streamline /data2/sccomp/BIG.0.comp02.streamline /data2/sccomp/BIG.1.comp02.streamline /data2/sccomp/BIG.2.comp02.streamline /data2/sccomp/BIG.3.comp02.streamline /data2/sccomp/BIG.4.comp02.streamline /data2/sccomp/BIG.5.comp02.streamline /data2/sccomp/BIG.6.comp02.streamline /data2/sccomp/BIG.7.comp02.streamline /data2/sccomp/BIG.0.ral01.streamline /data2/sccomp/BIG.1.ral01.streamlineCommand line too long to save completely.

	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 512 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
	Throughput test with 32 threads
	Each thread writes a 4126720 Kbyte file in 4 Kbyte records

	Test running:
	Children see throughput for 32 initial writers 	=  432863.52 KB/sec
	Min throughput per thread 			=   13489.46 KB/sec 
	Max throughput per thread 			=   13564.23 KB/sec
	Avg throughput per thread 			=   13526.99 KB/sec
	Min xfer 					= 4104456.00 KB

	Test running:
	Children see throughput for 32 rewriters 	=  433386.73 KB/sec
	Min throughput per thread 			=   13525.65 KB/sec 
	Max throughput per thread 			=   13553.97 KB/sec
	Avg throughput per thread 			=   13543.34 KB/sec
	Min xfer 					= 4118280.00 KB

	Test running:
	Children see throughput for 32 readers 		=  458043.86 KB/sec
	Min throughput per thread 			=   13969.76 KB/sec 
	Max throughput per thread 			=   14944.34 KB/sec
	Avg throughput per thread 			=   14313.87 KB/sec
	Min xfer 					= 3857648.00 KB

	Test running:

