I appear to be hitting a limitation in either the glusterfs FUSE client or the glusterfsd daemon, and I wonder if there are some knobs I can tweak.

I have a 12-disk RAID10 array. If I access it locally I get the following figures (#p = number of concurrent reader processes):

 #p   files/sec
  1       35.52
  2       66.13
  5      137.73
 10      215.51
 20      291.45
 30      337.01

If I access it as a single-brick distributed glusterfs volume over 10GE I get the following figures:

 #p   files/sec
  1       39.09
  2       70.44
  5      135.79
 10      157.48
 20      179.75
 30      206.34

The performance tracks the raw RAID10 performance very closely at 1, 2 and 5 concurrent readers. However, at 10+ concurrent readers it falls well below what the RAID10 volume is capable of. These files average about 650KB each, so 200 files/sec is roughly 134MB/s, a little over 10% of the 10GE bandwidth.

I am guessing that either there is a limit on the number of concurrent operations per filesystem/brick, or I have hit some sort of window limit.

I have tried:

  gluster volume set raid10 performance.io-thread-count 64

and then restarted glusterd and remounted the filesystem, but it didn't seem to make any difference.

I have also tried two separate client machines, each running 15 concurrent readers, but the aggregate throughput is no higher than what 30 processes on a single client achieve. This suggests to me that glusterfsd (the brick) is the bottleneck. If I attach strace to that process it tells me:

  Process 1835 attached with 11 threads - interrupt to quit

Can I increase that number of threads? Is there anything else I can try?

Regards,

Brian.

Test Methodology: I am using the measurement script below, pointing it either at /data/raid10 on the server (the raw brick) or at /mnt/raid10 on the client. The corpus of 100K files between 500KB and 800KB was created using

  bonnie++ -d /mnt/raid10 -n 98:800k:500k:1000:1024k -s 0 -u root

and then killing it after the file creation phase.
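The script below takes one of the FILEGROUPS labels as its only argument, and is best run as root so that the drop_caches write actually takes effect. Assuming it is saved as perftest.rb (the filename is arbitrary), a run looks something like:

  ./perftest.rb raid10-direct     # raw brick, run on the server
  ./perftest.rb raid10-gluster    # glusterfs FUSE mount, run on the client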
------- 8< --------------------------------------------------------------
#!/usr/bin/ruby -w

# Label => glob of the bonnie++-generated corpus to read.
FILEGROUPS = {
  "sdb"            => "/data/sdb/Bonnie.26384/*/*",
  "sdc"            => "/data/sdc/Bonnie.26384/*/*",
  "sdd"            => "/data/sdd/Bonnie.26384/*/*",
  "sde"            => "/data/sde/Bonnie.26384/*/*",
  "replic"         => "/mnt/replic/Bonnie.3385/*/*",
  "raid10-direct"  => "/data/raid10/Bonnie.5021/*/*",
  "raid10-gluster" => "/mnt/raid10/Bonnie.5021/*/*",
}

class Perftest
  attr_accessor :offset

  def initialize(filenames)
    @offset = 0
    @filenames = filenames
    @pids = []
  end

  def run(n_files, n_procs=1, dd_args="", random=false)
    # Flush the local page cache so each run reads from disk (or the network).
    system("echo 3 >/proc/sys/vm/drop_caches")
    if random
      files = @filenames.sort_by { rand }[0, n_files]
    else
      files = (@filenames + @filenames)[@offset, n_files]
      @offset = (offset + n_files) % @filenames.size
    end
    chunks = files.each_slice(n_files/n_procs).to_a[0, n_procs]
    n_files = chunks.map { |chunk| chunk.size }.inject(:+)
    timed(n_files, n_procs, "#{dd_args} #{"[random]" if random}") do
      # Fork one dd-reader process per chunk, then reap them all.
      @pids = chunks.map { |chunk| fork { run_single(chunk, dd_args); exit! } }
      @pids.delete_if { |pid| Process.waitpid(pid) }
    end
  end

  def timed(n_files, n_procs=1, args="")
    t1 = Time.now
    yield
    t2 = Time.now
    printf "%3d %10.2f %s\n", n_procs, n_files/(t2-t1), args
  end

  def run_single(files, dd_args)
    files.each do |f|
      system("dd if='#{f}' of=/dev/null #{dd_args} 2>/dev/null")
    end
  end

  def kill_all(sig="TERM")
    @pids.each { |pid| Process.kill(sig, pid) rescue nil }
  end
end

label = ARGV[0]
unless glob = FILEGROUPS[label]
  STDERR.puts "Usage: #{$0} <filegroup>"
  exit 1
end

perftest = Perftest.new(Dir[glob].freeze)

# Remember the offset for sequential tests, so that re-runs don't use
# cached data at the server. Still better to drop vm caches at the server.
memo = "/var/tmp/perftest.offset"
perftest.offset = File.read(memo).to_i rescue 0
at_exit do
  perftest.kill_all
  File.open(memo, "w") { |f| f.puts perftest.offset }
end

puts " #p files/sec dd_args"
[1, 2, 5].each do |nprocs|
  perftest.run(10000, nprocs, "bs=1024k")
  perftest.run(4000, nprocs, "bs=1024k", 1)
end
[10, 20, 30].each do |nprocs|
  perftest.run(10000, nprocs, "bs=1024k")
  perftest.run(10000, nprocs, "bs=1024k", 1)
end
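One further check that may be worth doing, rather than relying on strace alone: the brick's thread count can be read straight from /proc, and "gluster volume info" should confirm whether the io-thread-count change above was actually applied (1835 being the glusterfsd PID from the strace output; adjust as needed):

  gluster volume info raid10 | grep io-thread-count   # listed under reconfigured options if applied
  ls /proc/1835/task | wc -l                          # current glusterfsd thread count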