Hi all,
We have run out of ideas as to what is causing a 50% performance decrease
in disk IO after taking an OSD out of our cluster.
Performance is measured regularly by a canary virtual machine that runs an
hourly disk IO benchmark. The degradation is also reported by customers on
other virtual machines.
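For illustration only (we are not reproducing the actual canary job here),
a benchmark of this kind could look roughly like the following fio
invocation, with purely example parameters:

  # illustrative canary-style benchmark, parameters are examples only
  fio --name=canary --filename=/var/tmp/canary.fio \
      --rw=randrw --rwmixread=70 --bs=4k --size=1G \
      --ioengine=libaio --iodepth=32 --direct=1 \
      --runtime=60 --time_based --group_reporting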
Symptoms:
- After taking a failed disk out of our Ceph cluster ('ceph osd out X'),
the canary VM measures a 50% performance drop.
- The completion of the re-balancing had no impact on performance.
- 'recovery io' as reported by ceph status is as high as usual.
- 'client io' as reported by ceph status is significantly lower than
usual; peaks are approximately a factor of 10 lower than 'recovery io',
which was not the case before (see the commands sketched below).
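For reference, a sketch of how these figures can be read, together with the
recovery/backfill throttles that govern the recovery vs. client balance
(standard ceph CLI; osd.0 is only an example id, and the daemon command has
to be run on the host carrying that OSD):

  # 'client io' and 'recovery io' lines in the status output
  ceph -s
  ceph -w    # continuous view of the same counters

  # current recovery/backfill throttles of one OSD
  ceph daemon osd.0 config show | \
    egrep 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority|osd_client_op_priority'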
Actions/checks done, all without impact on performance:
- Logs do not show any indication of failures or irregularities
(but we found a flapping OSD, which we also took out, without
further performance impact!).
- No full or near-full OSDs; PGs are balanced across the OSDs (see the
command sketch after this list).
- The network operates as usual (and has not changed); no saturation
or high usage on the links; bonds are OK; MTU settings checked and OK.
- The CRUSH map does not show any unexpected entries.
- Reboot of the mons (one after another).
- Reboot of the storage nodes (one after another).
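For completeness, a sketch of the standard commands behind the fullness /
PG-balance and flapping-OSD checks above (what we actually ran may differ
in detail):

  ceph health detail   # full / near-full warnings, stuck PGs
  ceph osd df tree     # per-OSD utilisation and PG counts
  ceph osd perf        # commit/apply latency per OSD, slow disks stand out
  ceph osd tree        # up/down state and weight of every OSD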
Cluster information:
- Version: ceph version 10.2.11 (Jewel)
- Operating System: CentOS 7
- 3 mons, 180 OSDs on 8 storage nodes
- 382 TB used, 661 TB / 1043 TB avail
- OSDs on NVMe, SSD and spinning disks; pools are mapped
to a single device type (no mixed pools).
- All OSDs use FileStore.
Since we have run out of ideas about what could be causing these performance
problems, we would appreciate any hint that increases the probability of
finding a solution!
Thanks in advance.
With best regards, Michael
--
stepping stone AG
Wasserwerkgasse 7
CH-3011 Bern
Telefon: +41 31 332 53 63
www.stepping-stone.ch
michael.eichenberger@xxxxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx