Hello Josh,
thank you very much for your answer. We do not use encryption on our
OSDs, but the symptoms you describe are quite similar to what I
observed after the upgrade to 14.2.22.
We also run with the new default bluefs_buffered_io=true, which
probably causes the higher OSD latencies we see. Since we do not use
dmcrypt but still see higher latencies as well as higher throughput,
dmcrypt does not seem to be the only way to run into these symptoms.
How did you find out that dmcrypt causes this situation in your case?
Thank you also for the description of what bluefs_buffered_io=true
actually does. I googled a bit but did not find any documentation about
this buffer. Do you know where more information can be found?
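For reference, the value an OSD is actually running with should be
checkable via its admin socket on the OSD's host (a sketch only; osd.0
is just an example id):
# ceph daemon osd.0 config get bluefs_buffered_io   # osd.0 is just an example; run on that OSD's host
# ceph daemon osd.0 config diff                     # shows options that differ from the built-in defaults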
Thanks a lot
Rainer
On 16.07.21 at 16:35, Josh Baergen wrote:
Hi Rainer,
Are you using dmcrypt on your OSDs? I ask because I'm wondering if
you're seeing something similar to what we saw in our systems with
bluefs_buffered_io=true (as it is by default in 14.2.22, whereas it's
false in 14.2.16): With this set, bluefs writes go through Linux's
buffer cache, and large writes actually get chopped up at buffer
boundaries, with the expectation that the I/O scheduler will merge those
writes into larger ones again. However, because the dmcrypt layer is a
bit slower, we found that the individual writes would leak through to
the scheduler at a slow enough pace that they wouldn't all be merged,
resulting in a bunch of smaller writes to the HDDs that were missing
revolutions, increasing OSD commit/apply latency. Like you, we didn't
see a major effect on end user performance for most of our systems, but
the effect was pretty drastic for one of them. Not as big of an issue
for SSDs, of course, because most of them have some sort of cache (SLC
or otherwise) that can absorb these smaller writes and internally commit
as larger ones. (Theoretically a writeback cache in front of HDDs should
help with this as well, but we tend to avoid those.)
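If you want to check whether the same splitting is happening on one of
your HDDs, watching the per-device merge counters and the average write
request size should show it (a rough sketch; sdc is just an example
device, and the exact column names depend on your sysstat version):
# iostat -x 1 sdc   # sdc = example device; watch wrqm/s and the average write size, lots of small unmerged writes match the behaviour described above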
bluefs_buffered_io=true helps with some rocksdb iterate-and-delete
workloads in particular, such as snaptrim. PG removal was optimized in
14.2.17 to avoid the performance issues that bluefs_buffered_io is
needed to solve. We run HDDs only in some of our RGW clusters, and so we
chose to set bluefs_buffered_io=false for all of our HDD nodes to avoid
the latency hit, since PG removal is the only workload of this type in a
typical RGW system.
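For reference, on Nautilus something along these lines should do it (a
sketch only; the device-class mask is optional, osd.0 is just an example
id, and the OSDs may need a restart for the change to take effect):
# ceph config set osd/class:hdd bluefs_buffered_io false
# ceph config show osd.0 | grep bluefs_buffered_io   # osd.0 = example id; verify the running value on one of the HDD OSDs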
Josh
On Fri, Jul 16, 2021 at 6:58 AM Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
Hello,
Today I upgraded a Ceph (HDD) cluster consisting of 9 hosts with 16
OSDs each (144 in total) to the latest Nautilus version 14.2.22. The
upgrade proceeded without problems and the cluster is healthy. After all
hosts were on 14.2.22 I saw in Grafana that OSD latencies were at about
85 ms; after an hour they dropped to about 45 ms. Now, probably because
the cluster faces a somewhat higher IO demand from the Proxmox client
side, the OSD latencies are at 57 ms again.
Before the upgrade, running 14.2.16, this value was about 33 ms.
I looked at ceph osd perf, where I can see an ever-changing set of OSDs
with latencies of about 300 ms; right after the upgrade some had up to
800 ms. Now there are always, say, 20 OSDs that are between 100 and
400 ms. They are not all from one host, and this high-latency OSD set
has members that stay in the high state for longer and others that
change back to a lower value more often:
# ceph osd perf|sort -n -k 2|tail -30
134 37 37
19 38 38
112 39 39
12 42 42
75 42 42
67 43 43
51 45 45
81 45 45
92 50 50
40 56 56
63 60 60
59 61 61
128 65 65
135 65 65
124 66 66
117 94 94
35 94 94
26 112 112
14 127 127
56 135 135
100 164 164
83 168 168
62 177 177
82 182 182
30 186 186
72 186 186
102 203 203
131 211 211
121 247 247
46 254 254
137 340 340
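For reference, the per-OSD counters of one of the consistently slow
OSDs (for example osd.137 from the list above) can be dumped on its
host like this (a sketch only):
# ceph daemon osd.137 perf dump        # osd.137 is just the worst entry above; bluestore/bluefs counters of that OSD
# ceph daemon osd.137 dump_historic_ops   # recent slow ops and where they spent their time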
On the other hand, if I test performance on a Linux VM running on
Proxmox that uses this cluster as a storage backend, I do not have the
feeling that it is slower than before, e.g. when I test IO performance
using bonnie++. It actually seems to be faster. But why then the higher
OSD latencies?
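For reference, a bonnie++ run of the kind mentioned above might look
roughly like this (directory, size and user are just example values;
the size should be well above the VM's RAM so the page cache does not
mask the storage backend):
# bonnie++ -d /mnt/testdir -s 16384 -n 0 -u root   # example values: 16 GB data set, file-creation tests disabled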
Does anyone have an idea why those latencies could have nearly doubled?
How can I find out more about this strangeness? Any ideas?
Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx