Hi Mark,
Sorry for the late reply... I have been away on vacation, at the OpenStack Summit, etc. for over a month and am only now looking at this again.
Yeah, the snippet was a bit misleading. The fio file contains small-block jobs as well as big-block jobs:
[write-rbd1-4m-depth1]
rbdname=rbd-tester-fio
bs=4m
iodepth=1
rw=write
stonewall
[write-rbd2-4m-depth16]
rbdname=rbd-tester-fio-2
bs=4m
iodepth=16
rw=write
stonewall
[read-rbd1-4m-depth1]
rbdname=rbd-tester-fio
bs=4m
iodepth=1
rw=read
stonewall
[read-rbd2-4m-depth16]
rbdname=rbd-tester-fio-2
bs=4m
iodepth=16
rw=read
stonewall
The performance hit is more noticeable with big blocks, I think up to 10x slower on some runs, but as a percentage it seems to affect the small-block workload too. I understand that runs will vary... I wish I had more runs from before upgrading to luminous, but I only have that single set of results. Regardless, I cannot come close to that single set of results since upgrading to luminous.
I understand the caching stuff you mentioned; however, we have not changed any of that config since the upgrade, and the fio job is exactly the same. So if I do many runs on luminous over the course of a day, including when we think the cluster is least busy, we should be able to come pretty close to the jewel result on at least one of the runs. Or is my thinking flawed?
Sage mentioned at OpenStack that there was a perf regression in librbd which will be fixed in 12.2.2... are you aware of this? If so, can you send me the link to the bug?
Cheers,
Raf
On 22 September 2017 at 00:31, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Hi Rafael,
In the original email you mentioned 4M block size, seq read, but here it looks like you are doing 4k writes? Can you clarify? If you are doing 4k direct sequential writes with iodepth=1 and are also using librbd cache, please make sure that librbd is set to writeback mode in both cases. RBD by default will not kick into WB mode until it sees a flush request, and the librbd engine in fio doesn't issue one before a test is started. It can be pretty easy to end up in a situation where writeback cache is active on some tests but not others if you aren't careful. I.e., if one of your tests was done after a flush and the other was not, you'd likely see a dramatic difference in performance during this test.
You can avoid this by telling librbd to always use WB mode (at least when benchmarking):
rbd cache writethrough until flush = false
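For what it's worth, a minimal sketch of where that could go in ceph.conf on the benchmark client. The `[client]` section placement and the explicit `rbd cache = true` line are assumptions about the setup, not something confirmed in this thread:

```ini
[client]
# assumption: librbd cache is enabled on the fio client
rbd cache = true
# start in writeback mode immediately instead of waiting for the first flush
rbd cache writethrough until flush = false
```

With this in place for every run being compared, the cache mode should be the same from the first I/O.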
Mark
On 09/20/2017 01:51 AM, Rafael Lopez wrote:
Hi Alexandre,
Yeah we are using filestore for the moment with luminous. With regards
to client, I tried both jewel and luminous librbd versions against the
luminous cluster - similar results.
I am running fio on a physical machine with fio rbd engine. This is a
snippet of the fio config for the runs (the complete jobfile adds
variations of read/write/block size/iodepth).
[global]
ioengine=rbd
clientname=cinder-volume
pool=rbd-bronze
invalidate=1
ramp_time=5
runtime=30
time_based
direct=1
[write-rbd1-4k-depth1]
rbdname=rbd-tester-fio
bs=4k
iodepth=1
rw=write
stonewall
[write-rbd2-4k-depth16]
rbdname=rbd-tester-fio-2
bs=4k
iodepth=16
rw=write
stonewall
Raf
On 20 September 2017 at 16:43, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
Hi,
So you are also using filestore on luminous?
Have you also upgraded librbd on the client? (Are you benchmarking
inside a qemu machine, or directly with fio-rbd?)
(I'm going to do a lot of benchmarks in the coming week; I'll post the
results to the mailing list soon.)
----- Original Message -----
From: "Rafael Lopez" <rafael.lopez@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, 20 September 2017 08:17:23
Subject: luminous vs jewel rbd performance
hey guys.
wondering if anyone else has done some solid benchmarking of jewel
vs luminous, in particular on the same cluster that has been
upgraded (same cluster, client and config).
we have recently upgraded a cluster from 10.2.9 to 12.2.0, and
unfortunately i only captured results from a single fio (librbd) run
with a few jobs in it before upgrading. i have run the same fio
jobfile many times at different times of the day since upgrading,
and been unable to produce a close match to the pre-upgrade (jewel)
run from the same client. one particular job is significantly slower
(4M block size, iodepth=1, seq read), up to 10x in one run.
i realise i haven't supplied much detail and it could be dozens of
things, but i just wanted to see if anyone else had done more
quantitative benchmarking or had similar experiences. keep in mind
all we changed was daemons were restarted to use luminous code,
everything else exactly the same. granted it is possible that
some/all osds had some runtime config injected that differs from
now, but i'm fairly confident this is not the case as they were
recently restarted (on jewel code) after OS upgrades.
cheers,
Raf
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
*Rafael Lopez*
Research Devops Engineer
Monash University eResearch Centre
T: +61 3 9905 9118
M: +61 (0)427682670
E: rafael.lopez@xxxxxxxxxx