Ceph Quincy 17.2.5 with Erasure Code

Hi everyone,
I'm managing a Ceph Quincy 17.2.5 cluster, waiting to upgrade it to version 17.2.7, composed and configured as follows:

- 16 identical nodes: 256 GB RAM, 32 CPU cores (64 threads), 12 x rotational HDD (BlueStore block) + 4 x SATA SSD (RocksDB/WAL)
- Erasure Code 11+4 (Jerasure)
- 10 x S3 RGW on dedicated nodes (5 physical nodes)
- 3 x full SSD dedicated nodes for replicated S3 pools
- 2 x 10 Gbit Public network (LACP) + 2 x 10 Gbit cluster network (LACP)
- On all nodes: Ubuntu 20.04.4 LTS, kept up to date
- Ceph deployed in containers on Docker CE (docker-ce 5:20.10.17~3-0~ubuntu-focal).

All pools, except the EC data pool, are configured with replication 3 and stored on dedicated SSDs on the 3 dedicated nodes, to guarantee the necessary performance.
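For reference, a minimal sketch of how an EC pool like ours can be defined (the profile name, pool name and PG counts below are placeholders, not our actual values):

    # 11+4 jerasure profile, failure domain = host, HDD device class
    ceph osd erasure-code-profile set ec-11-4 \
        plugin=jerasure k=11 m=4 technique=reed_sol_van \
        crush-failure-domain=host crush-device-class=hdd
    # EC data pool for the RGW buckets
    ceph osd pool create default.rgw.buckets.data 2048 2048 erasure ec-11-4
    ceph osd pool application enable default.rgw.buckets.data rgw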

We have encountered a recurring, seemingly random problem with the availability of bucket data, together with many slow_ops on the HDD OSDs (data pools), not caused by saturation of the physical devices nor by lack of CPU/RAM on any node. Slow_ops are sometimes reported for requests to apparently random PGs. The cluster is currently in a recovery/rebalance phase to rebuild 3 HDDs that we had to recreate from scratch (all 3 HDDs are physically on the same node).
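For context, per-OSD op dumps like the ones shown below can be collected from the OSD admin socket; a minimal example, run inside the OSD container on its host (the OSD id is just a placeholder):

    ceph daemon osd.42 dump_ops_in_flight      # ops currently in flight/blocked, with per-event timings
    ceph daemon osd.42 dump_historic_slow_ops  # recent ops that exceeded the complaint time
    ceph daemon osd.42 dump_historic_ops       # recent completed ops, for comparison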

By analysing the events, we observed the following in the op status of some OSDs impacted by slow_ops:

20/05/2024 10:19:

            "description": "osd_op(client.186021790.0:57620 29.258s0 29:1a5928ea:::31497ca8-e7d6-4e53-b150-91f9ac02ac67.246100.6329_storage%2ffirstMemories%2f1010543%2f:head [getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e481205)",

            "initiated_at": "2024-05-16T15:03:06.015956+0000",

            "age": 963.83795990199997,

            "duration": 963.83819621700002,

            "type_data": {

                "flag_point": "delayed",

                "client_info": {

                    "client": "client.186021790",

                    "client_addr": "10.151.11.11:0/3913909849",

                    "tid": 57620

                },

                "events": [

                    {

                        "event": "initiated",

                        "time": "2024-05-16T15:03:06.015956+0000",

                        "duration": 0

                    },

                    {

                        "event": "throttled",

                        "time": "2024-05-16T15:03:06.015956+0000",

                        "duration": 0

                    },

                    {

                        "event": "header_read",

                        "time": "2024-05-16T15:03:06.015954+0000",

                        "duration": 4294967295.9999986

                    },

                    {

                        "event": "all_read",

                        "time": "2024-05-16T15:03:06.015961+0000",

                        "duration": 7.2300000000000002e-06

                    },

                    {

                        "event": "dispatched",

                        "time": "2024-05-16T15:03:06.015962+0000",

                        "duration": 1.063e-06

                    },

                    {

                        "event": "queued_for_pg",

                        "time": "2024-05-16T15:03:06.015966+0000",

                        "duration": 3.332e-06

                    },

                    {

                        "event": "reached_pg",

                        "time": "2024-05-16T15:03:06.015992+0000",

                        "duration": 2.6078e-05

                    },

                    {

                        "event": "waiting for readable",

                        "time": "2024-05-16T15:03:06.016002+0000",

                        "duration": 1.0348e-05

                    }

                ]

            }

        }

    ],

    "num_ops": 6

}

###########

                    { "event": "reached_pg", "time": "2024-05-16T12:43:11.694220+0000", "duration": 480.97258642200001 },


In essence, the operations remain stuck in the "waiting for readable" state while trying to access certain PGs.
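As far as I understand, "waiting for readable" means the PG is waiting for its read lease (the mechanism introduced in Octopus), whose length is derived from osd_heartbeat_grace * osd_pool_default_read_lease_ratio; a quick way to check the values currently in effect (defaults are 20 s and 0.8, i.e. roughly a 16 s lease):

    ceph config get osd osd_heartbeat_grace                 # default 20
    ceph config get osd osd_pool_default_read_lease_ratio   # default 0.8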


We intend to upgrade as soon as the recovery/rebalance is complete. Does anyone have any idea what checks I could run to analyze the problem more thoroughly?
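For reference, these are the kinds of checks I have in mind so far (the PG id is the one from the slow op above; the OSD id is a placeholder):

    ceph health detail                               # which OSDs/PGs are reporting slow ops
    ceph osd perf                                    # per-OSD commit/apply latency
    ceph pg dump pgs_brief | grep -v 'active+clean'  # PGs that are not yet clean
    ceph pg 29.258 query                             # peering/state details of a PG involved in a slow op
    ceph daemon osd.42 dump_ops_in_flight            # blocked ops on a reporting OSD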

I can't determine whether the problem is due to the use of EC, or whether the data written in some buckets is in a "non-standard" state that causes access to wait for some reason.

Thank you all for your kindness.
Best regards

Andrea Martra

--

Andrea Martra
+39 393 9048451


