On Tue, Mar 20, 2018 at 9:45 AM, Sam McLeod <mailinglists@xxxxxxxxxxx> wrote:
Excellent description, thank you.

With performance.write-behind-trickling-writes ON (default):

## 4k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=17.3MiB/s][r=0,w=4422 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42701: Tue Mar 20 15:05:23 2018
  write: IOPS=4443, BW=17.4MiB/s (18.2MB/s)(256MiB/14748msec)
   bw (  KiB/s): min=16384, max=19184, per=99.92%, avg=17760.45, stdev=602.48, samples=29
   iops        : min= 4096, max= 4796, avg=4440.07, stdev=150.66, samples=29
  cpu          : usr=4.00%, sys=18.02%, ctx=131097, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=17.4MiB/s (18.2MB/s), 17.4MiB/s-17.4MiB/s (18.2MB/s-18.2MB/s), io=256MiB (268MB), run=14748-14748msec

## 2k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8624KiB/s][r=0,w=4312 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42781: Tue Mar 20 15:05:57 2018
  write: IOPS=4439, BW=8880KiB/s (9093kB/s)(256MiB/29522msec)
   bw (  KiB/s): min= 6908, max= 9564, per=99.94%, avg=8874.03, stdev=428.92, samples=59
   iops        : min= 3454, max= 4782, avg=4437.00, stdev=214.44, samples=59
  cpu          : usr=2.43%, sys=18.18%, ctx=262222, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=8880KiB/s (9093kB/s), 8880KiB/s-8880KiB/s (9093kB/s-9093kB/s), io=256MiB (268MB), run=29522-29522msec

With performance.write-behind-trickling-writes OFF:

## 4k randwrite - just over half the IOP/s of having it ON.
Note that since the workload is random write, no aggregation is possible. So there is no point in waiting for future writes, and turning trickling-writes on makes sense.
A better test to measure the impact of this option would be a sequential write workload. I guess the smaller the writes, the more pronounced the benefit of turning this option off would be.
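
As a rough illustration of the sequential test suggested above, the sketch below only builds and prints fio command lines; it does not run them. It reuses the flags from the random-write runs in this thread and changes only `--readwrite` to `write` (fio's sequential write mode). The helper name `fio_seq_write_cmd` and the extra `1k` block size are assumptions added for illustration, not something from the thread.

```python
import shlex


def fio_seq_write_cmd(block_size, filename="test", size="256MB", iodepth=32):
    """Build a fio argv list for a sequential write run.

    Same flags as the randwrite runs in this thread, except that
    --readwrite=write selects sequential writes.
    """
    return [
        "fio",
        "--randrepeat=1",
        "--ioengine=libaio",
        "--gtod_reduce=1",
        "--name=test",
        f"--filename={filename}",
        f"--bs={block_size}",
        f"--iodepth={iodepth}",
        f"--size={size}",
        "--readwrite=write",
    ]


# 4k and 2k match the block sizes already tested; 1k is an assumed extra
# data point for the "smaller the writes" hypothesis.
for bs in ("4k", "2k", "1k"):
    print(shlex.join(fio_seq_write_cmd(bs)))
```

Each printed line can be pasted into a shell on the client mount, once with the option on and once with it off, so the two settings are compared on an identical workload.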
# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=44225: Tue Mar 20 15:11:04 2018
  write: IOPS=2594, BW=10.1MiB/s (10.6MB/s)(256MiB/25259msec)
   bw (  KiB/s): min= 2248, max=18728, per=100.00%, avg=10454.10, stdev=6481.14, samples=50
   iops        : min=  562, max= 4682, avg=2613.50, stdev=1620.35, samples=50
  cpu          : usr=2.29%, sys=10.09%, ctx=131141, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=10.1MiB/s (10.6MB/s), 10.1MiB/s-10.1MiB/s (10.6MB/s-10.6MB/s), io=256MiB (268MB), run=25259-25259msec

## 2k randwrite - no noticeable change.

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8662KiB/s][r=0,w=4331 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=45813: Tue Mar 20 15:12:02 2018
  write: IOPS=4291, BW=8583KiB/s (8789kB/s)(256MiB/30541msec)
   bw (  KiB/s): min= 7416, max=10264, per=99.94%, avg=8577.66, stdev=618.31, samples=61
   iops        : min= 3708, max= 5132, avg=4288.84, stdev=309.15, samples=61
  cpu          : usr=2.87%, sys=15.83%, ctx=262236, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=8583KiB/s (8789kB/s), 8583KiB/s-8583KiB/s (8789kB/s-8789kB/s), io=256MiB (268MB), run=30541-30541msec

Let me know if you'd recommend any other benchmarks comparing performance.write-behind-trickling-writes ON/OFF (just nothing that'll seriously risk locking up the whole gluster cluster please!).
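
For readers who want to cross-check the fio figures in this message, the sketch below recomputes IOPS from the reported run times, assuming (as the `io=256MiB` lines indicate) that each run wrote 256 MiB in fixed-size blocks. The names `expected_iops` and `runs` are made up for this illustration.

```python
MIB = 1024 * 1024
TOTAL_BYTES = 256 * MIB  # io=256MiB in every run above


def expected_iops(total_bytes, block_bytes, runtime_ms):
    """Number of block-sized writes divided by the run time in seconds."""
    ops = total_bytes // block_bytes
    return ops / (runtime_ms / 1000.0)


# (trickling-writes setting, block size) -> (block bytes, reported run time in msec)
runs = {
    ("on", "4k"): (4096, 14748),
    ("on", "2k"): (2048, 29522),
    ("off", "4k"): (4096, 25259),
    ("off", "2k"): (2048, 30541),
}

iops = {key: expected_iops(TOTAL_BYTES, bs, ms) for key, (bs, ms) in runs.items()}

for key, value in iops.items():
    print(key, int(value))  # truncated, as fio's IOPS= field appears to be

# OFF relative to ON at each block size
ratio_4k = iops[("off", "4k")] / iops[("on", "4k")]
ratio_2k = iops[("off", "2k")] / iops[("on", "2k")]
print(f"4k off/on: {ratio_4k:.2f}, 2k off/on: {ratio_2k:.2f}")
```

The truncated values match the `IOPS=` fields fio printed (4443, 4439, 2594, 4291). The 4k ratio comes out at about 0.58, consistent with "just over half", and the 2k ratio at about 0.97, consistent with "no noticeable change".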
Sam McLeod
Please respond via email when possible.
https://smcleod.net
https://twitter.com/s_mcleod

On 20 Mar 2018, at 2:56 pm, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

On Tue, Mar 20, 2018 at 8:57 AM, Sam McLeod <mailinglists@xxxxxxxxxxx> wrote:

Hi Raghavendra,

On 20 Mar 2018, at 1:55 pm, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

Aggregating large number of small writes by write-behind into large writes has been merged on master:

Would like to know whether it helps for this use case. Note that it's not part of any release yet, so you'll have to build and install from the repo.

Sounds interesting, not too keen to build packages at the moment but I've added myself as a watcher to that issue on Github and once it's in a 3.x release I'll try it and let you know.

Another suggestion is to run tests with the option performance.write-behind-trickling-writes turned off.

# gluster volume set <volname> performance.write-behind-trickling-writes off

A word of caution though: if your files are too small, these suggestions may not have much impact.

I'm looking for documentation on this option but all I could really find is in the source for write-behind.c:

if is enabled (which it is), do not hold back writes if there are no outstanding requests.

Until recently this functionality, though available, couldn't be configured from the CLI. One could only change this option by editing the volume configuration file. However, it's now configurable through the CLI:

and a note on aggregate-size stating that "aggregation won't happen if performance.write-behind-trickling-writes is turned on"

What are the potentially negative performance impacts of disabling this?

Even if the aggregation option is turned off, write-behind has the capacity to aggregate up to a size of 128KB. But to make full use of this with small-write workloads, write-behind has to wait for some time so that there are enough write requests to fill that capacity. With this option enabled, write-behind still aggregates the requests it already holds, but won't wait for future writes. This means descendant xlators of write-behind can see writes smaller than 128KB. So, for a scenario where a small number of large writes is preferred over a large number of small writes, this can be a problem.

--
Sam McLeod (protoporpoise on IRC)
https://smcleod.net
https://twitter.com/s_mcleod
Words are my own opinions and do not necessarily represent those of my employer or partners.
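
The trade-off described in the quoted explanation can be pictured with a toy model. This is only a sketch under stated assumptions and not the logic in write-behind.c: it takes the 128KB capacity and the "do not hold back writes if there are no outstanding requests" rule from the thread, and invents a `backend_latency` parameter (how many incoming writes arrive while one downstream write is in flight) so there is something to wait on.

```python
AGGREGATE_LIMIT = 128 * 1024  # 128KB capacity mentioned in the thread


def simulate(write_sizes, trickling, backend_latency=4):
    """Return the sizes of the writes passed to the descendant xlators.

    Toy model only. backend_latency is a hypothetical knob: the number of
    incoming writes that arrive while one downstream write is outstanding.
    """
    downstream = []
    buffered = 0
    in_flight = 0  # > 0 means a downstream request is still outstanding

    for size in write_sizes:
        if in_flight > 0:
            in_flight -= 1
        buffered += size

        full = buffered >= AGGREGATE_LIMIT
        idle = in_flight == 0
        # trickling on: never hold data back while nothing is outstanding
        # trickling off: wait until the 128KB capacity is filled
        if full or (trickling and idle):
            downstream.append(buffered)
            buffered = 0
            in_flight = backend_latency

    if buffered:
        downstream.append(buffered)  # flush whatever is left at the end
    return downstream


sequential_4k = [4096] * 64  # 256KB of sequential 4k writes

with_trickling = simulate(sequential_4k, trickling=True)
without_trickling = simulate(sequential_4k, trickling=False)

print(len(with_trickling), max(with_trickling))
print(len(without_trickling), max(without_trickling))
```

In this model the same 256KB reaches the bricks as 17 writes of at most 16KB with trickling on, and as 2 writes of 128KB with it off. That matches the explanation above qualitatively; real behaviour depends on timing and on whether the writes are contiguous, which the model ignores.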
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users