Hey Erwin,

I'd recommend checking the individual OSD performance in the slower cluster. We have seen such issues with SSDs that have worn out - it might just be a specific OSD / PG that you are hitting.

Best regards,

Nico

Erwin Ceph <ceph@xxxxxxxxxxxxxxxxx> writes:

> Hi,
>
> We run several Ceph clusters, but one has a strange problem.
>
> It is running Octopus 15.2.14 on 9 servers (HP 360 Gen 8, 64 GB, 10 Gbps) with 48 OSDs (all 2 TB Samsung SSDs with BlueStore). Monitoring in Grafana shows these three latency values over 7 days:
>
> ceph_osd_op_r_latency_sum:  avg 1.16 ms, max 9.95 ms
> ceph_osd_op_w_latency_sum:  avg 5.85 ms, max 26.2 ms
> ceph_osd_op_rw_latency_sum: avg 110 ms,  max 388 ms
>
> Average throughput is around 30 MB/s read and 40 MB/s write, both at around 2000 IOPS.
>
> On another cluster (hardware almost the same, identical software versions) with 25% lower load, the values are:
>
> ceph_osd_op_r_latency_sum:  avg 1.09 ms, max 6.55 ms
> ceph_osd_op_w_latency_sum:  avg 4.46 ms, max 14.4 ms
> ceph_osd_op_rw_latency_sum: avg 4.94 ms, max 17.6 ms
>
> I can't find any difference in HBA controller settings, network or kernel tuning. Has anyone got any ideas?
>
> Regards,
> Erwin

--
Sustainable and modern Infrastructures by ungleich.ch
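
A quick way to act on that suggestion is `ceph osd perf`, which reports per-OSD commit/apply latency, so a single worn-out SSD tends to stand out against the rest of the cluster. Below is a minimal sketch of how one might script that check; the JSON field names (osdstats, osd_perf_infos, commit_latency_ms) are assumptions based on typical output of recent releases and should be verified against your cluster, and the outlier threshold is arbitrary.

#!/usr/bin/env python3
# Sketch: flag OSDs whose commit latency stands out from the cluster average.
# Assumption: "ceph osd perf -f json" returns the per-OSD stats under
# osdstats -> osd_perf_infos; adjust the key names to your release if needed.
import json
import subprocess

def osd_commit_latencies():
    raw = subprocess.check_output(["ceph", "osd", "perf", "-f", "json"])
    data = json.loads(raw)
    # Some releases nest the list under "osdstats", others return it at the top level.
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    return {info["id"]: info["perf_stats"]["commit_latency_ms"] for info in infos}

def main():
    latencies = osd_commit_latencies()
    if not latencies:
        print("no OSD perf data returned")
        return
    avg = sum(latencies.values()) / len(latencies)
    print(f"average commit latency: {avg:.1f} ms over {len(latencies)} OSDs")
    # Arbitrary threshold: anything 3x the average (and above 10 ms) is worth a look.
    for osd_id, ms in sorted(latencies.items(), key=lambda kv: kv[1], reverse=True):
        if ms > 3 * avg and ms > 10:
            print(f"osd.{osd_id}: {ms} ms  <-- outlier, check wear level / SMART data")

if __name__ == "__main__":
    main()

If the same OSD keeps showing up as an outlier, checking the underlying device with smartctl (wear level, media errors) is usually the next step.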