Hello,
Today I upgraded a Ceph (HDD) cluster consisting of 9 hosts with 16
OSDs each (144 in total) to the latest Nautilus version, 14.2.22. The
upgrade proceeded without problems and the cluster is healthy. After
all hosts were on 14.2.22 I saw in Grafana that OSD latencies were at
about 85 ms; after an hour they dropped to about 45 ms. Now, probably
because the cluster is facing slightly higher IO demand from the
Proxmox client side, the OSD latencies are back up at around 57 ms.
Before the upgrade, running 14.2.16, this value was about 33 ms.
I looked at "ceph osd perf", where I can see a constantly changing set
of OSDs with latencies of around 300 ms; right after the upgrade some
had up to 800 ms. Now there are always roughly 20 OSDs between 100 and
400 ms. They are not all on one host, and within this high-latency set
some OSDs stay in the high state for longer while others drop back to
a lower value more quickly (a one-liner to watch this over time is
sketched after the listing below):
# ceph osd perf | sort -n -k 2 | tail -30
(columns: osd  commit_latency(ms)  apply_latency(ms); the header line
is dropped by the sort/tail pipeline)
134 37 37
19 38 38
112 39 39
12 42 42
75 42 42
67 43 43
51 45 45
81 45 45
92 50 50
40 56 56
63 60 60
59 61 61
128 65 65
135 65 65
124 66 66
117 94 94
35 94 94
26 112 112
14 127 127
56 135 135
100 164 164
83 168 168
62 177 177
82 182 182
30 186 186
72 186 186
102 203 203
131 211 211
121 247 247
46 254 254
137 340 340
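
To watch how the membership of this slow set shifts over time,
something like the following can be used to re-run the same pipeline
periodically (the 10 second interval is arbitrary):

# watch -n 10 'ceph osd perf | sort -n -k 2 | tail -30'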
On the other hand, if I test performance on a Linux VM running on
Proxmox that uses this cluster as its storage backend, e.g. IO
performance with bonnie++, I do not have the impression that it is
slower than before. It actually seems to be faster (a sample bonnie++
invocation is below).
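
The kind of bonnie++ run I mean looks roughly like this; the mount
point, file size and hostname are only placeholders, not my exact
parameters (the -s size should be at least twice the VM's RAM so the
page cache does not mask the storage backend):

# bonnie++ -d /mnt/test -s 16g -n 0 -m testvm -u root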
But why, then, the higher OSD latencies? Does anyone have an idea why
they could have nearly doubled? How can I find out more about this
oddity? Any ideas?
Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312