High OSD latencies after upgrade 14.2.16 -> 14.2.22

Hello,

Today I upgraded a Ceph (HDD) cluster consisting of 9 hosts with 16 OSDs each (144 in total) to the latest Nautilus version, 14.2.22. The upgrade proceeded without problems and the cluster is healthy. After all hosts were on 14.2.22 I saw in Grafana that OSD latencies were at about 85 ms; after an hour they dropped to about 45 ms. Now, probably because the cluster is facing slightly higher I/O demand from the Proxmox client side, the OSD latencies are at 57 ms again.

Before the upgrade, running 14.2.16, this value was about 33 ms.

I looked at ceph osd perf, where I can see a constantly changing set of OSDs with latencies around 300 ms; right after the upgrade some were as high as 800 ms. Now there are always roughly 20 OSDs between 100 and 400 ms. They are not all from one host, and within this high-latency set some OSDs stay in that state for a while whereas others drop back to lower values more quickly (columns: OSD id, commit and apply latency in ms):

# ceph osd perf|sort -n -k 2|tail -30
134                 37                37
 19                 38                38
112                 39                39
 12                 42                42
 75                 42                42
 67                 43                43
 51                 45                45
 81                 45                45
 92                 50                50
 40                 56                56
 63                 60                60
 59                 61                61
128                 65                65
135                 65                65
124                 66                66
117                 94                94
 35                 94                94
 26                112               112
 14                127               127
 56                135               135
100                164               164
 83                168               168
 62                177               177
 82                182               182
 30                186               186
 72                186               186
102                203               203
131                211               211
121                247               247
 46                254               254
137                340               340
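
To dig deeper into one of the slow OSDs I could probably look at its admin socket on the host that carries it. A rough sketch of what I had in mind (osd.137 is just an example ID taken from the list above):

# recent ops with their per-stage timings, to see where the time is spent
ceph daemon osd.137 dump_historic_ops
# full performance counters of that OSD, e.g. BlueStore commit latencies
ceph daemon osd.137 perf dump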

On the other hand, if I test performance on a Linux VM running on Proxmox that uses this cluster as its storage backend, e.g. I/O performance with bonnie++, I do not get the impression that it is slower than before. It actually seems to be faster. But why then the higher OSD latencies?
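
To take the VM and librbd layer out of the picture I could also benchmark the cluster directly with rados bench; the pool name below is just a placeholder for one of my RBD pools:

# 30 seconds of 4 MB object writes directly against the pool, keeping the objects
rados bench -p <poolname> 30 write --no-cleanup
# sequential reads of the objects written above, then remove them again
rados bench -p <poolname> 30 seq
rados -p <poolname> cleanup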

Does anyone have an idea why those latencies could have nearly doubled? How can I find out more about what is going on here? Any ideas?
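
One thing I could still check is whether the point release changed any OSD defaults, roughly like this (osd.137 again only as an example, run on its host):

# settings that differ from the compiled-in defaults on one OSD
ceph daemon osd.137 config diff
# cluster-wide configuration stored in the monitors
ceph config dump
# double-check that every daemon is really on 14.2.22
ceph versions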

Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


