I appear to be hitting a limitation in either the glusterfs FUSE client or the glusterfsd daemon, and I wonder if there are some knobs I can tweak.

I have a 12-disk RAID10 array. If I access it locally I get the following figures (#p = number of concurrent reader processes):

 #p   files/sec
  1       35.52
  2       66.13
  5      137.73
 10      215.51
 20      291.45
 30      337.01

If I access it as a single-brick distributed glusterfs volume over 10GE I get the following figures:

 #p   files/sec
  1       39.09
  2       70.44
  5      135.79
 10      157.48
 20      179.75
 30      206.34

The performance tracks the raw RAID10 performance very closely at 1, 2 and 5 concurrent readers. However, at 10+ concurrent readers it falls well below what the RAID10 volume is capable of. These files average about 650KB each, so 200 files/sec is roughly 134MB/s, a little over 10% of the 10GE bandwidth.

I am guessing that either there is a limit on the number of concurrent operations per filesystem/brick, or I have hit some sort of window limit.

I have tried:

  gluster volume set raid10 performance.io-thread-count 64

and then restarted glusterd and remounted the filesystem, but it didn't seem to make any difference.

I have also tried two separate client machines, each running 15 concurrent readers, but the aggregate throughput is no higher than what 30 processes on a single client achieve. This suggests to me that glusterfsd (the brick) is the bottleneck. If I attach strace to that process it tells me:

  Process 1835 attached with 11 threads - interrupt to quit

Can I increase that number of threads? Is there anything else I can try?

Regards,

Brian.

Test Methodology: I am using the measurement script below, pointing it either at /data/raid10 on the server (the raw brick) or at /mnt/raid10 on the client. The corpus of 100K files between 500KB and 800KB was created using

  bonnie++ -d /mnt/raid10 -n 98:800k:500k:1000:1024k -s 0 -u root

and then killing it after the file creation phase.
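The script below takes one of the FILEGROUPS labels as its only argument, and is best run as root so that the drop_caches write actually takes effect. Assuming it is saved as perftest.rb (the filename is arbitrary), a run looks something like:

  ./perftest.rb raid10-direct     # raw brick, run on the server
  ./perftest.rb raid10-gluster    # glusterfs FUSE mount, run on the client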
------- 8< --------------------------------------------------------------
#!/usr/bin/ruby -w

# Label => glob of the bonnie++-generated corpus to read.
FILEGROUPS = {
  "sdb"            => "/data/sdb/Bonnie.26384/*/*",
  "sdc"            => "/data/sdc/Bonnie.26384/*/*",
  "sdd"            => "/data/sdd/Bonnie.26384/*/*",
  "sde"            => "/data/sde/Bonnie.26384/*/*",
  "replic"         => "/mnt/replic/Bonnie.3385/*/*",
  "raid10-direct"  => "/data/raid10/Bonnie.5021/*/*",
  "raid10-gluster" => "/mnt/raid10/Bonnie.5021/*/*",
}

class Perftest
  attr_accessor :offset

  def initialize(filenames)
    @offset = 0
    @filenames = filenames
    @pids = []
  end

  def run(n_files, n_procs=1, dd_args="", random=false)
    # Flush the local page cache so each run reads from disk (or the network).
    system("echo 3 >/proc/sys/vm/drop_caches")
    if random
      files = @filenames.sort_by { rand }[0, n_files]
    else
      files = (@filenames + @filenames)[@offset, n_files]
      @offset = (offset + n_files) % @filenames.size
    end
    chunks = files.each_slice(n_files/n_procs).to_a[0, n_procs]
    n_files = chunks.map { |chunk| chunk.size }.inject(:+)
    timed(n_files, n_procs, "#{dd_args} #{"[random]" if random}") do
      # Fork one dd-reader process per chunk, then reap them all.
      @pids = chunks.map { |chunk| fork { run_single(chunk, dd_args); exit! } }
      @pids.delete_if { |pid| Process.waitpid(pid) }
    end
  end

  def timed(n_files, n_procs=1, args="")
    t1 = Time.now
    yield
    t2 = Time.now
    printf "%3d %10.2f %s\n", n_procs, n_files/(t2-t1), args
  end

  def run_single(files, dd_args)
    files.each do |f|
      system("dd if='#{f}' of=/dev/null #{dd_args} 2>/dev/null")
    end
  end

  def kill_all(sig="TERM")
    @pids.each { |pid| Process.kill(sig, pid) rescue nil }
  end
end

label = ARGV[0]
unless glob = FILEGROUPS[label]
  STDERR.puts "Usage: #{$0} <filegroup>"
  exit 1
end

perftest = Perftest.new(Dir[glob].freeze)

# Remember the offset for sequential tests, so that re-runs don't use
# cached data at the server. Still better to drop vm caches at the server.
memo = "/var/tmp/perftest.offset"
perftest.offset = File.read(memo).to_i rescue 0
at_exit do
  perftest.kill_all
  File.open(memo, "w") { |f| f.puts perftest.offset }
end

puts " #p files/sec dd_args"
[1, 2, 5].each do |nprocs|
  perftest.run(10000, nprocs, "bs=1024k")
  perftest.run(4000, nprocs, "bs=1024k", 1)
end
[10, 20, 30].each do |nprocs|
  perftest.run(10000, nprocs, "bs=1024k")
  perftest.run(10000, nprocs, "bs=1024k", 1)
end
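One further check that may be worth doing, rather than relying on strace alone: the brick's thread count can be read straight from /proc, and "gluster volume info" should confirm whether the io-thread-count change above was actually applied (1835 being the glusterfsd PID from the strace output; adjust as needed):

  gluster volume info raid10 | grep io-thread-count   # listed under reconfigured options if applied
  ls /proc/1835/task | wc -l                          # current glusterfsd thread count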