Re: Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

Excellent description, thank you.

With performance.write-behind-trickling-writes ON (default):

## 4k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=17.3MiB/s][r=0,w=4422 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42701: Tue Mar 20 15:05:23 2018
  write: IOPS=4443, BW=17.4MiB/s (18.2MB/s)(256MiB/14748msec)
   bw (  KiB/s): min=16384, max=19184, per=99.92%, avg=17760.45, stdev=602.48, samples=29
   iops        : min= 4096, max= 4796, avg=4440.07, stdev=150.66, samples=29
  cpu          : usr=4.00%, sys=18.02%, ctx=131097, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=17.4MiB/s (18.2MB/s), 17.4MiB/s-17.4MiB/s (18.2MB/s-18.2MB/s), io=256MiB (268MB), run=14748-14748msec


## 2k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8624KiB/s][r=0,w=4312 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42781: Tue Mar 20 15:05:57 2018
  write: IOPS=4439, BW=8880KiB/s (9093kB/s)(256MiB/29522msec)
   bw (  KiB/s): min= 6908, max= 9564, per=99.94%, avg=8874.03, stdev=428.92, samples=59
   iops        : min= 3454, max= 4782, avg=4437.00, stdev=214.44, samples=59
  cpu          : usr=2.43%, sys=18.18%, ctx=262222, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=8880KiB/s (9093kB/s), 8880KiB/s-8880KiB/s (9093kB/s-9093kB/s), io=256MiB (268MB), run=29522-29522msec


With performance.write-behind-trickling-writes OFF:

## 4k randwrite - just over half the IOPS of having it ON.


# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=44225: Tue Mar 20 15:11:04 2018
  write: IOPS=2594, BW=10.1MiB/s (10.6MB/s)(256MiB/25259msec)
   bw (  KiB/s): min= 2248, max=18728, per=100.00%, avg=10454.10, stdev=6481.14, samples=50
   iops        : min=  562, max= 4682, avg=2613.50, stdev=1620.35, samples=50
  cpu          : usr=2.29%, sys=10.09%, ctx=131141, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=10.1MiB/s (10.6MB/s), 10.1MiB/s-10.1MiB/s (10.6MB/s-10.6MB/s), io=256MiB (268MB), run=25259-25259msec


## 2k randwrite - no noticeable change.

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8662KiB/s][r=0,w=4331 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=45813: Tue Mar 20 15:12:02 2018
  write: IOPS=4291, BW=8583KiB/s (8789kB/s)(256MiB/30541msec)
   bw (  KiB/s): min= 7416, max=10264, per=99.94%, avg=8577.66, stdev=618.31, samples=61
   iops        : min= 3708, max= 5132, avg=4288.84, stdev=309.15, samples=61
  cpu          : usr=2.87%, sys=15.83%, ctx=262236, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=8583KiB/s (8789kB/s), 8583KiB/s-8583KiB/s (8789kB/s-8789kB/s), io=256MiB (268MB), run=30541-30541msec


Let me know if you'd recommend any other benchmarks comparing performance.write-behind-trickling-writes ON/OFF (just nothing that'll seriously risk locking up the whole Gluster cluster, please!).
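For example, one idea I had (purely a sketch - the mount path /mnt/gluster-vol/fiotest and the file count/sizes are placeholders I made up) is a many-small-files write test rather than random writes within a single large file:

# fio --name=smallfiles --ioengine=libaio --directory=/mnt/gluster-vol/fiotest --nrfiles=1000 --filesize=8k --bs=8k --rw=write --numjobs=4 --group_reporting

That should generate lots of sub-128KB writes spread across many files, which I'd expect to be more sensitive to the aggregation behaviour than a single 256MB file.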


--
Sam McLeod
Please respond via email when possible.
https://smcleod.net
https://twitter.com/s_mcleod

On 20 Mar 2018, at 2:56 pm, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:



On Tue, Mar 20, 2018 at 8:57 AM, Sam McLeod <mailinglists@xxxxxxxxxxx> wrote:
Hi Raghavendra,


On 20 Mar 2018, at 1:55 pm, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

Aggregating a large number of small writes into larger writes in write-behind has been merged on master:

We'd like to know whether it helps for this use case. Note that it's not part of any release yet, so you'd have to build and install from the repo.

Sounds interesting, not too keen to build packages at the moment but I've added myself as a watcher to that issue on Github and once it's in a 3.x release I'll try it and let you know.

Another suggestion is to run the tests with the option performance.write-behind-trickling-writes turned off:

# gluster volume set <volname> performance.write-behind-trickling-writes off
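(To check the current value, or to revert to the default afterwards, something like the following should work, assuming a release that supports these subcommands:)

# gluster volume get <volname> performance.write-behind-trickling-writes
# gluster volume reset <volname> performance.write-behind-trickling-writes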

A word of caution, though: if your files are too small, these suggestions may not have much impact.

I'm looking for documentation on this option, but all I could really find is a comment in the source for write-behind.c:

if [this option] is enabled (which it is), do not hold back writes if there are no outstanding requests.

Until recently this functionality was available but couldn't be configured from the CLI; one could only change this option by editing the volume configuration file. However, it's now configurable through the CLI:




and a note on aggregate-size stating that 

"aggregation won't happen if performance.write-behind-trickling-writes is turned on"


What are the potentially negative performance impacts of disabling this?

Even if the aggregation option is turned off, write-behind can still aggregate writes up to a size of 128KB. But to make full use of this with small-write workloads, write-behind has to wait a while so that enough write requests accumulate to fill that capacity. With this option enabled, write-behind aggregates the requests it already has queued but won't wait for future writes. This means the descendant xlators of write-behind can see writes smaller than 128KB. So, for a scenario where a small number of large writes is preferable to a large number of small writes, this can be a problem.
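If I'm reading that right, a rough back-of-the-envelope illustration (my own numbers, only assuming the 128KB aggregation limit mentioned above):

128KiB / 4KiB = up to 32 of my 4k writes coalesced into one request sent down the stack
128KiB / 2KiB = up to 64 of my 2k writes coalesced into one request sent down the stack

whereas with trickling writes left on, only whatever happens to be queued at that moment gets coalesced, so the bricks can still see individual 2-4KiB writes.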


--
Sam McLeod (protoporpoise on IRC)
https://smcleod.net
https://twitter.com/s_mcleod

Words are my own opinions and do not necessarily represent those of my employer or partners.



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
