Hey Erwin,

I'd recommend checking the individual OSD performance in the slower cluster. We have seen such issues with SSDs that have worn out - it might just be a specific OSD / PG that you are hitting.

Best regards,

Nico

Erwin Ceph <ceph@xxxxxxxxxxxxxxxxx> writes:

> Hi,
>
> We run several Ceph clusters, but one has a strange problem.
>
> It is running Octopus 15.2.14 on 9 servers (HP 360 Gen 8, 64 GB, 10 Gbps) with 48 OSDs (all 2 TB Samsung SSDs with BlueStore). Monitoring in Grafana shows these three latency values over 7 days:
>
> ceph_osd_op_r_latency_sum:  avg 1.16 ms, max 9.95 ms
> ceph_osd_op_w_latency_sum:  avg 5.85 ms, max 26.2 ms
> ceph_osd_op_rw_latency_sum: avg 110 ms,  max 388 ms
>
> Average throughput is around 30 MB/s read and 40 MB/s write, both at around 2000 IOPS.
>
> On another cluster (hardware almost the same, identical software versions) with 25% lower load, the values are:
>
> ceph_osd_op_r_latency_sum:  avg 1.09 ms, max 6.55 ms
> ceph_osd_op_w_latency_sum:  avg 4.46 ms, max 14.4 ms
> ceph_osd_op_rw_latency_sum: avg 4.94 ms, max 17.6 ms
>
> I can't find any difference in HBA controller settings, network or kernel tuning. Has anyone got any ideas?
>
> Regards,
> Erwin

--
Sustainable and modern Infrastructures by ungleich.ch
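
A quick way to act on that suggestion is `ceph osd perf`, which reports per-OSD commit/apply latency, so a single worn-out SSD tends to stand out against the rest of the cluster. Below is a minimal sketch of how one might script that check; the JSON field names (osdstats, osd_perf_infos, commit_latency_ms) are assumptions based on typical output of recent releases and should be verified against your cluster, and the outlier threshold is arbitrary.

#!/usr/bin/env python3
# Sketch: flag OSDs whose commit latency stands out from the cluster average.
# Assumption: "ceph osd perf -f json" returns the per-OSD stats under
# osdstats -> osd_perf_infos; adjust the key names to your release if needed.
import json
import subprocess

def osd_commit_latencies():
    raw = subprocess.check_output(["ceph", "osd", "perf", "-f", "json"])
    data = json.loads(raw)
    # Some releases nest the list under "osdstats", others return it at the top level.
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    return {info["id"]: info["perf_stats"]["commit_latency_ms"] for info in infos}

def main():
    latencies = osd_commit_latencies()
    if not latencies:
        print("no OSD perf data returned")
        return
    avg = sum(latencies.values()) / len(latencies)
    print(f"average commit latency: {avg:.1f} ms over {len(latencies)} OSDs")
    # Arbitrary threshold: anything 3x the average (and above 10 ms) is worth a look.
    for osd_id, ms in sorted(latencies.items(), key=lambda kv: kv[1], reverse=True):
        if ms > 3 * avg and ms > 10:
            print(f"osd.{osd_id}: {ms} ms  <-- outlier, check wear level / SMART data")

if __name__ == "__main__":
    main()

If the same OSD keeps showing up as an outlier, checking the underlying device with smartctl (wear level, media errors) is usually the next step.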