FUSE has a limitation wherein requests become single-threaded at the server layer. I opened a bug but haven't had any traction on it so far. Adding more concurrent connections starts to cause a lot of context switching at the FUSE layer. This is an inherent limitation.

On Tue, Feb 7, 2012 at 6:09 AM, Brian Candler <B.Candler at pobox.com> wrote:
> I appear to be hitting a limitation in either the glusterfs FUSE client or
> the glusterfsd daemon, and I wonder if there are some knobs I can tweak.
>
> I have a 12-disk RAID10 array. If I access it locally I get the following
> figures (#p = number of concurrent reader processes):
>
>  #p  files/sec
>   1      35.52
>   2      66.13
>   5     137.73
>  10     215.51
>  20     291.45
>  30     337.01
>
> If I access it as a single-brick distributed glusterfs volume over 10GE I
> get the following figures:
>
>  #p  files/sec
>   1      39.09
>   2      70.44
>   5     135.79
>  10     157.48
>  20     179.75
>  30     206.34
>
> The performance tracks the raw RAID10 performance very closely at 1, 2 and 5
> concurrent readers. However, at 10+ concurrent readers it falls well below
> what the RAID10 volume is capable of.
>
> These files are an average of 650K each, so 200 files/sec = 134MB/s, a
> little over 10% of the 10GE bandwidth.
>
> I am guessing either there is a limit on the number of concurrent
> operations on the same filesystem/brick, or some sort of window limit I've
> hit.
>
> I have tried:
>     gluster volume set raid10 performance.io-thread-count 64
> and restarted glusterd and remounted the filesystem, but it didn't seem
> to make any difference.
>
> I have also tried two separate client machines each running 15 concurrent
> connections, but the aggregate throughput is no more than that of 30
> processes on a single client. This suggests to me that glusterfsd (the
> brick) is the bottleneck.
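The arithmetic in the message above can be reproduced with a short sketch in Ruby, the language of the test script in this thread. It assumes "650K" means 650 KiB and treats 10GE as a raw 10 Gbit/s (1.25 GB/s) line rate with no protocol overhead; the constant and method names are illustrative only and do not come from the original post.

```ruby
# Figures copied from the two tables above: concurrent readers => files/sec.
DIRECT  = { 1 => 35.52, 2 => 66.13, 5 => 137.73, 10 => 215.51, 20 => 291.45, 30 => 337.01 }
GLUSTER = { 1 => 39.09, 2 => 70.44, 5 => 135.79, 10 => 157.48, 20 => 179.75, 30 => 206.34 }

# Assumptions: "650K" means 650 KiB; 10GE means a raw 10 Gbit/s line rate
# with no protocol overhead.
AVG_FILE_BYTES    = 650 * 1024
TEN_GBE_BYTES_SEC = 10e9 / 8

# Bytes per second moved at a given file rate.
def bytes_per_sec(files_per_sec)
  files_per_sec * AVG_FILE_BYTES
end

# Ratio of the glusterfs mount's rate to the local RAID10 rate.
def efficiency(nprocs)
  GLUSTER.fetch(nprocs) / DIRECT.fetch(nprocs)
end

if __FILE__ == $0
  rate = bytes_per_sec(200)
  printf "200 files/sec = %.0f MB/s = %.1f%% of 10GE\n",
         rate / 1e6, 100 * rate / TEN_GBE_BYTES_SEC
  DIRECT.each_key do |n|
    printf "%3d readers: gluster/direct = %.2f\n", n, efficiency(n)
  end
end
```

Under those assumptions, 200 files/sec works out to about 133 MB/s, roughly 10.6% of the line rate, in line with the "a little over 10%" estimate. The gluster/direct ratio stays near 1.0 (0.99 to 1.10) up to 5 readers, then drops to about 0.73 at 10 readers and about 0.6 at 20 and 30.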
>
> If I attach strace to the glusterfsd process it tells me:
>
>     Process 1835 attached with 11 threads - interrupt to quit
>
> Can I increase that number of threads? Is there anything else I can try?
>
> Regards,
>
> Brian.
>
> Test Methodology:
>
> I am using the measurement script below, pointing it either at /data/raid10
> on the server (the raw brick) or at /mnt/raid10 on the client. The corpus of
> 100K files between 500KB and 800KB was created using
>
>     bonnie++ -d /mnt/raid10 -n 98:800k:500k:1000:1024k -s 0 -u root
>
> and then killing it after the file creation phase.
>
> ------- 8< --------------------------------------------------------------
> #!/usr/bin/ruby -w
>
> FILEGROUPS = {
>   "sdb" => "/data/sdb/Bonnie.26384/*/*",
>   "sdc" => "/data/sdc/Bonnie.26384/*/*",
>   "sdd" => "/data/sdd/Bonnie.26384/*/*",
>   "sde" => "/data/sde/Bonnie.26384/*/*",
>   "replic" => "/mnt/replic/Bonnie.3385/*/*",
>   "raid10-direct" => "/data/raid10/Bonnie.5021/*/*",
>   "raid10-gluster" => "/mnt/raid10/Bonnie.5021/*/*",
> }
>
> class Perftest
>   attr_accessor :offset
>
>   def initialize(filenames)
>     @offset = 0
>     @filenames = filenames
>     @pids = []
>   end
>
>   def run(n_files, n_procs=1, dd_args="", random=false)
>     system("echo 3 >/proc/sys/vm/drop_caches")
>     if random
>       files = @filenames.sort_by { rand }[0, n_files]
>     else
>       files = (@filenames + @filenames)[@offset, n_files]
>       @offset = (offset + n_files) % @filenames.size
>     end
>     chunks = files.each_slice(n_files/n_procs).to_a[0, n_procs]
>     n_files = chunks.map { |chunk| chunk.size }.inject(:+)
>     timed(n_files, n_procs, "#{dd_args} #{"[random]" if random}") do
>       @pids = chunks.map { |chunk| fork { run_single(chunk, dd_args); exit! } }
>       @pids.delete_if { |pid| Process.waitpid(pid) }
>     end
>   end
>
>   def timed(n_files, n_procs=1, args="")
>     t1 = Time.now
>     yield
>     t2 = Time.now
>     printf "%3d %10.2f  %s\n", n_procs, n_files/(t2-t1), args
>   end
>
>   def run_single(files, dd_args)
>     files.each do |f|
>       system("dd if='#{f}' of=/dev/null #{dd_args} 2>/dev/null")
>     end
>   end
>
>   def kill_all(sig="TERM")
>     @pids.each { |pid| Process.kill(sig, pid) rescue nil }
>   end
> end
>
> label = ARGV[0]
> unless glob = FILEGROUPS[label]
>   STDERR.puts "Usage: #{$0} <filegroup>"
>   exit 1
> end
> perftest = Perftest.new(Dir[glob].freeze)
>
> # Remember the offset for sequential tests, so that re-runs don't use
> # cached data at the server. Still better to drop vm caches at the server.
> memo = "/var/tmp/perftest.offset"
> perftest.offset = File.read(memo).to_i rescue 0
> at_exit do
>   perftest.kill_all
>   File.open(memo, "w") { |f| f.puts perftest.offset }
> end
>
> puts " #p  files/sec  dd_args"
> [1,2,5].each do |nprocs|
>   perftest.run(10000, nprocs, "bs=1024k")
>   perftest.run(4000, nprocs, "bs=1024k", true)
> end
> [10,20,30].each do |nprocs|
>   perftest.run(10000, nprocs, "bs=1024k")
>   perftest.run(10000, nprocs, "bs=1024k", true)
> end
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
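Two details of `Perftest#run` in the quoted script are easy to miss. The work split drops any remainder when `n_files` is not a multiple of `n_procs`, which is why `n_files` is recounted before timing. Sequential runs also take a window over the file list that wraps around the end. The minimal sketch below isolates that logic; `split_work` and `sequential_window` are hypothetical helper names that mirror the script's expressions and are not methods the script defines.

```ruby
# Hypothetical helpers mirroring two expressions in Perftest#run.

# Work split: cut the selected names into slices of n_files/n_procs and keep
# only the first n_procs slices, so any remainder is silently dropped.
def split_work(files, n_files, n_procs)
  files.each_slice(n_files / n_procs).to_a[0, n_procs]
end

# Sequential selection: take n_files names starting at offset; doubling the
# array lets the window wrap past the end. Returns the window and next offset.
def sequential_window(filenames, offset, n_files)
  window = (filenames + filenames)[offset, n_files]
  [window, (offset + n_files) % filenames.size]
end

if __FILE__ == $0
  chunks = split_work((1..10_000).to_a, 10_000, 30)
  puts "#{chunks.size} slices, #{chunks.map(&:size).inject(:+)} files timed"

  window, next_offset = sequential_window((0...100).to_a, 95, 10)
  puts "window=#{window.inspect} next_offset=#{next_offset}"
end
```

With 10000 files and 30 readers, for example, each slice holds 333 files, so 9990 files are actually read and timed. Note also that `each_slice` raises `ArgumentError` when `n_files` is smaller than `n_procs`, because the slice size becomes zero.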