Hi Igor,

Thanks for the valuable advice! I just wanted to provide feedback that it was indeed one single OSD causing the issues, which I could triangulate as you suggested. After removing this OSD, the slow ops have not occurred anymore.

Best regards,
Tim

> On 1 Oct 2024, at 12:42, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
> Hi Tim,
>
> first of all - given the provided logs - all the slow operations are stuck in the 'waiting for sub ops' state.
>
> Which apparently means that the reporting OSDs aren't suffering from local issues but are stuck on replication operations to their peer OSDs.
>
> From my experience, even a single "faulty" OSD can cause such issues on multiple other daemons. The way to troubleshoot is to find out which OSD(s) are the actual culprits.
>
> To do that, one might try the following approach:
>
> 1. When (or shortly after) the issue is happening, run 'ceph daemon osd.N dump_historic_ops' (or even 'dump_ops_in_flight') against the OSDs reporting slow operations.
>
> 2. From the above reports, choose operations with an extraordinarily high duration, e.g. > 5 seconds, and note the PG ids they've been run against, e.g. PG = 1.a in the following sample:
>
> "description": "osd_op(client.24184.0:23 >>>>1.a<<<<< 1:54253539:::benchmark_data_coalmon_70932_object22:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e19)",
>
> 3. For the affected PG(s), learn which OSDs are backing them, e.g. by running 'ceph pg map <pgid>'.
>
> 4. If the PGs from the above step share a specific OSD that is common to all (or the majority) of them, it is highly likely a good candidate for further investigation - particularly inspection of that OSD's logs.
>
>
> Thanks,
>
> Igor
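For anyone landing on this thread later, here is a minimal Python sketch of the triage loop Igor describes in steps 1-4. It is an illustration, not a tested tool: the OSD list and the 5-second threshold are placeholder values, the JSON field names ("ops", "duration", "description", "acting") reflect the commonly seen layout of dump_historic_ops and 'ceph pg map --format json' output and may differ per release, and 'ceph daemon' has to be run on the host holding each OSD's admin socket.

#!/usr/bin/env python3
"""Sketch of Igor's triage steps: collect slow historic ops from the OSDs
reporting them, extract the PG ids those ops touch, map each PG to its
acting OSDs, and count which OSD backs the most slow PGs."""
import json
import re
import subprocess
from collections import Counter

SLOW_SECONDS = 5.0           # "extraordinarily high duration" threshold from step 2
REPORTING_OSDS = [3, 7, 12]  # placeholder: OSDs currently reporting slow ops


def ceph_json(args):
    """Run a ceph CLI command and parse its JSON output."""
    out = subprocess.check_output(["ceph", *args])
    return json.loads(out)


def slow_pgs(osd_id):
    """Return PG ids touched by historic ops slower than SLOW_SECONDS."""
    # 'ceph daemon' talks to the local admin socket, so run this on the
    # host where osd.<osd_id> lives; the output is JSON by default.
    dump = ceph_json(["daemon", f"osd.{osd_id}", "dump_historic_ops"])
    pgs = set()
    for op in dump.get("ops", []):
        if op.get("duration", 0) > SLOW_SECONDS:
            # The PG id is the second token inside "osd_op(...)", e.g. "1.a".
            m = re.search(r"osd_op\(\S+ (\d+\.[0-9a-f]+)", op.get("description", ""))
            if m:
                pgs.add(m.group(1))
    return pgs


counts = Counter()
for osd in REPORTING_OSDS:
    for pgid in slow_pgs(osd):
        # Step 3: 'ceph pg map <pgid>' with JSON output exposes the acting set.
        mapping = ceph_json(["pg", "map", pgid, "--format", "json"])
        for acting_osd in mapping.get("acting", []):
            counts[acting_osd] += 1

# Step 4: an OSD that backs most of the slow PGs is the candidate to inspect.
for osd, hits in counts.most_common():
    print(f"osd.{osd} backs {hits} slow PG(s)")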