Thank you for pointing this out. I checked my cluster with the command from the article, and it shows over 17 million PG dup entries on each OSD. May I ask whether the snaptrim activity runs every six hours? If I disable snaptrim, will that stop the slow ops temporarily until I can perform the version upgrade? Upgrading Ceph will take some time, since I need to analyze the environment first. Is there a quicker workaround, such as deleting an OSD and recreating it to reset the dup entries, or manually trimming the OSD's pg log?

________________________________
From: Boris <bb@xxxxxxxxx>
Sent: Wednesday, December 6, 2023 10:13
To: Peter <petersun@xxxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Assistance Needed with Ceph Cluster Slow Ops Issue

Hi Peter,

try setting the cluster to nosnaptrim. If this helps, you might need to upgrade to Pacific, because you are hit by the pg dups bug.

See: https://www.clyso.com/blog/how-to-identify-osds-affected-by-pg-dup-bug/

Kind regards
- Boris Behrens

On 06.12.2023 at 19:01, Peter <petersun@xxxxxxxxxxxx> wrote:

Dear all,

I am reaching out regarding an issue with our Ceph cluster that recurs every six hours. Investigating with the "ceph daemon dump_historic_slow_ops" command, I observed that the problem appears to be related to slow operations, specifically ops stuck at "waiting for rw locks", with wait times often ranging from one to two seconds.

Our cluster uses Samsung SAS SSDs for the storage pool in question. While these disks are of high quality and should provide sufficient speed, the problem persists, and the slow ops occur consistently every six hours.

I would greatly appreciate any insights or suggestions to address and resolve this issue. If there are specific optimizations or configurations that could improve the situation, please advise.
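For reference, dump_historic_slow_ops is queried per OSD daemon over the admin socket, so the dump below was presumably captured along these lines (the OSD id is a placeholder, not taken from the original message):

    ceph daemon osd.<id> dump_historic_slow_ops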
Below is some of the output:

root@lasas003:~# ceph -v
ceph version 15.2.17 (542df8d06ef24dbddcf4994db16bcc4c89c9ec2d) octopus (stable)

"events": [
    { "event": "initiated", "time": "2023-12-06T08:34:18.501644-0800", "duration": 0 },
    { "event": "throttled", "time": "2023-12-06T08:34:18.501644-0800", "duration": 3.067e-06 },
    { "event": "header_read", "time": "2023-12-06T08:34:18.501647-0800", "duration": 3.543e-06 },
    { "event": "all_read", "time": "2023-12-06T08:34:18.501650-0800", "duration": 9.34e-07 },
    { "event": "dispatched", "time": "2023-12-06T08:34:18.501651-0800", "duration": 3.283e-06 },
    { "event": "queued_for_pg", "time": "2023-12-06T08:34:18.501654-0800", "duration": 1.381993999 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:19.883648-0800", "duration": 5.798e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:19.883654-0800", "duration": 4.248471165 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:24.132125-0800", "duration": 1.0667e-05 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:24.132136-0800", "duration": 2.159352784 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:26.291489-0800", "duration": 3.292e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:26.291492-0800", "duration": 0.439181647 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:26.730674-0800", "duration": 5.153e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:26.730679-0800", "duration": 1.053151687 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:27.783831-0800", "duration": 5.133e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:27.783836-0800", "duration": 1.232525088 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.016361-0800", "duration": 3.844e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.016365-0800", "duration": 0.00513857 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.021503-0800", "duration": 4.76e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.021508-0800", "duration": 0.009280878 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.030789-0800", "duration": 4.069e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.030793-0800", "duration": 0.557577255 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.588370-0800", "duration": 5.506e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.588376-0800", "duration": 0.006416893 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.594793-0800", "duration": 7.069e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.594800-0800", "duration": 0.002640409 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.597440-0800", "duration": 3.344e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.597444-0800", "duration": 0.005112667 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.602556-0800", "duration": 5.02e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.602561-0800", "duration": 0.004056996 },
    { "event": "reached_pg", "time": "2023-12-06T08:34:29.606618-0800", "duration": 5.099e-06 },
    { "event": "waiting for rw locks", "time": "2023-12-06T08:34:29.606623-0800", "duration": 0.00688741 },
"event": "reached_pg", "time": "2023-12-06T08:34:29.613511-0800", "duration": 1.4636e-05 }, { "event": "started", "time": "2023-12-06T08:34:29.613525-0800", "duration": 0.00028943699999999997 }, { "event": "done", "time": "2023-12-06T08:34:29.613815-0800", "duration": 11.112171102 } Thank you in advance for your assistance. _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx