Do read this https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop#Drive_cache_is_slowing_you_down and check whether one or both of your drive models perform better with the write cache on or off; a quick way to check and toggle it is sketched below the quoted message. At least some of the Micron drives we have want it off for better performance in Ceph.

On Fri, 25 Mar 2022 at 15:01, Frank Schilder <frans@xxxxxx> wrote:
>
> Dear all,
>
> we are using a bunch of Kingston DC500M drives in our cluster for an all-flash 6+2 EC pool used as a data pool for RBD images. For quite a while now I have observed that these drives seem to stall for extended periods of time, sometimes to the extent that they are marked down. Here are the boot events for one day as an example (sorted by OSD, not by date stamp):
>
> 2022-03-24 05:22:23.713736 mon.ceph-01 mon.0 192.168.32.65:6789/0 200841 : cluster [INF] osd.584 192.168.32.89:6814/4765 boot
> 2022-03-24 07:11:10.032115 mon.ceph-01 mon.0 192.168.32.65:6789/0 202319 : cluster [INF] osd.584 192.168.32.89:6814/4765 boot
> 2022-03-24 07:11:08.618319 mon.ceph-01 mon.0 192.168.32.65:6789/0 202315 : cluster [INF] osd.585 192.168.32.89:6810/4767 boot
> 2022-03-24 12:24:02.790395 mon.ceph-01 mon.0 192.168.32.65:6789/0 206344 : cluster [INF] osd.585 192.168.32.89:6810/4767 boot
> 2022-03-24 06:55:10.513353 mon.ceph-01 mon.0 192.168.32.65:6789/0 202062 : cluster [INF] osd.594 192.168.32.91:6802/272337 boot
> 2022-03-24 06:55:10.513303 mon.ceph-01 mon.0 192.168.32.65:6789/0 202061 : cluster [INF] osd.595 192.168.32.91:6804/272338 boot
> 2022-03-24 20:34:31.991914 mon.ceph-01 mon.0 192.168.32.65:6789/0 218334 : cluster [INF] osd.595 192.168.32.91:6804/272338 boot
> 2022-03-24 02:15:11.231804 mon.ceph-01 mon.0 192.168.32.65:6789/0 197965 : cluster [INF] osd.596 192.168.32.83:6829/4755 boot
> 2022-03-24 04:58:24.831549 mon.ceph-01 mon.0 192.168.32.65:6789/0 200555 : cluster [INF] osd.596 192.168.32.83:6829/4755 boot
> 2022-03-24 03:02:16.971836 mon.ceph-01 mon.0 192.168.32.65:6789/0 199130 : cluster [INF] osd.603 192.168.32.84:6814/4738 boot
> 2022-03-24 13:56:15.723368 mon.ceph-01 mon.0 192.168.32.65:6789/0 207508 : cluster [INF] osd.604 192.168.32.82:6806/4639 boot
> 2022-03-24 07:24:42.557331 mon.ceph-01 mon.0 192.168.32.65:6789/0 202530 : cluster [INF] osd.606 192.168.32.84:6831/4605 boot
> 2022-03-24 01:26:23.313526 mon.ceph-01 mon.0 192.168.32.65:6789/0 197079 : cluster [INF] osd.609 192.168.32.84:6817/4603 boot
> 2022-03-24 07:24:42.557288 mon.ceph-01 mon.0 192.168.32.65:6789/0 202529 : cluster [INF] osd.609 192.168.32.84:6817/4603 boot
> 2022-03-24 05:48:09.449210 mon.ceph-01 mon.0 192.168.32.65:6789/0 201169 : cluster [INF] osd.614 192.168.32.85:6826/4777 boot
>
> We have 2 types of drives in this pool, Micron 5200 Pro 1.92 TB (1 OSD per disk) and Kingston DC500M 3.84 TB (2 OSDs per disk). The above boot events occur exclusively on the Kingston drives. After adding these drives, we didn't have any problems for a year or so. The trouble started recently, maybe 3-4 months ago. My guess is that it's because these drives are halfway filled now and have probably gone through several full disk writes, and that the controller sometimes has problems flushing writes or allocating blocks for writes.
>
> Is anyone else using these drives?
> Has anyone else had a similar experience and found a way to solve it?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
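To make the cache suggestion concrete, here is a minimal sketch for inspecting and toggling the volatile write cache on a SATA drive. /dev/sdX is a placeholder, not one of your actual devices, so substitute your own paths and benchmark each drive model with both settings rather than taking either as a given:

    # Show the current write cache state.
    hdparm -W /dev/sdX

    # The same information via smartctl.
    smartctl -g wcache /dev/sdX

    # Disable the volatile write cache; re-enable with -W1.
    # Enterprise SSDs with power-loss protection are often faster
    # in Ceph with the volatile cache off (see the wiki link above).
    hdparm -W0 /dev/sdX

Note that a setting changed with hdparm does not survive a power cycle, so if disabling the cache helps, it has to be reapplied at boot, e.g. via a udev rule.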
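And in case it helps others compare notes: a per-OSD view of boot events like the one quoted above can be pulled from the cluster log with something along these lines (assuming the default log location /var/log/ceph/ceph.log and GNU sort; adjust path and date for your deployment):

    # Collect one day's OSD boot events and sort them by OSD id
    # (field 10, e.g. "osd.584") instead of by timestamp.
    grep '^2022-03-24 .* boot$' /var/log/ceph/ceph.log | sort -k10,10 -V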
--
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx