Hi Tim,
thanks for the feedback, highly appreciated.
Out of curiosity - have you found out what the problem with that
OSD was? A hardware issue?
Regards,
Igor
On 10/14/2024 11:58 AM, Tim Sauerbein wrote:
Hi Igor,
Thanks for the valuable advice! I just wanted to provide feedback that it was indeed a single OSD causing the issues, which I could pinpoint using the approach you described. Since removing this OSD, the slow ops have not recurred.
Best regards,
Tim
On 1 Oct 2024, at 12:42, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
Hi Tim,
first of all - given the provided logs - all the slow operations are stuck in the 'waiting for sub ops' state.
This apparently means that the OSDs reporting slow ops aren't suffering from local issues but are stuck on replication operations to their peer OSDs.
From my experience even a single "faulty" OSD can cause such issues for multiple other daemons. The way to troubleshoot is to find out which OSD(s) are the actual culprits.
To do that one might try to use the following approach:
1. While the issue is happening (or shortly after), run the 'ceph daemon osd.N dump_historic_ops' (or even 'dump_ops_in_flight') command against the OSDs reporting slow operations.
2. From the above reports choose operations with extraordinarily high duration, e.g. > 5 seconds, and note the PG ids they were run against, e.g. PG = 1.a in the following sample:
"description": "osd_op(client.24184.0:23 >>>>1.a<<<<< 1:54253539:::benchmark_data_coalmon_70932_object22:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e19)",
3. For each affected PG, learn which OSDs are backing it, e.g. by running 'ceph pg map <pgid>'.
4. If the different PGs from the above step share a specific OSD that is common to all (or the majority) of them, it is highly likely a good candidate for additional investigation, particularly inspection of the relevant OSD logs.
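Steps 2-4 can be sketched in a few lines of Python. This is only an illustration under assumptions, not an official tool: it assumes the dump_historic_ops output is JSON with an "ops" list whose entries carry "description" and "duration" fields, and that the PG id is the second token of an osd_op description, as in the sample above (without the >>>> <<<<< markers, which were added only for emphasis). The function names, the second op, the durations, and the PG-to-OSD mapping are hypothetical sample data; in practice the mapping would be filled in from the 'ceph pg map <pgid>' output.

```python
import re
from collections import Counter

# Assumed layout of an osd_op description: "osd_op(<client> <pgid> <object> ...".
PGID_RE = re.compile(r"osd_op\(\S+\s+(\d+\.[0-9a-f]+)\s")


def slow_pgids(dump, min_duration=5.0):
    """Return the PG ids of ops that took longer than min_duration seconds."""
    pgids = set()
    for op in dump.get("ops", []):
        if float(op.get("duration", 0)) > min_duration:
            match = PGID_RE.search(op.get("description", ""))
            if match:
                pgids.add(match.group(1))
    return pgids


def rank_osds(pg_to_osds):
    """Count how many affected PGs each OSD backs, most frequent first."""
    counts = Counter()
    for osds in pg_to_osds.values():
        counts.update(set(osds))
    return counts.most_common()


# Hypothetical sample resembling dump_historic_ops output (durations invented).
dump = {
    "ops": [
        {
            "description": "osd_op(client.24184.0:23 1.a 1:54253539:::"
                           "benchmark_data_coalmon_70932_object22:head "
                           "[write 0~4194304] snapc 0=[] ondisk+write e19)",
            "duration": 7.2,
        },
        {
            "description": "osd_op(client.24184.0:24 1.b 1:d0000000:::"
                           "some_other_object:head [write 0~4096] e19)",
            "duration": 0.03,
        },
    ]
}

# Hypothetical PG -> OSD mapping, as one would collect it by hand in step 3.
pg_to_osds = {"1.a": [3, 7, 12], "1.1f": [7, 2, 9], "2.4": [5, 7, 1]}

print(slow_pgids(dump))
print(rank_osds(pg_to_osds)[0])
```

With this sample data only the first op exceeds the 5 second threshold, and OSD 7 is the only one backing all three PGs, so it would be the first candidate for a closer look at its logs.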
Thanks,
Igor
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx