Re: Severe Latency Issues in Ceph Cluster

The Ceph version is 17.2.7.


• OSDs are a mix of SSD and HDD, with DB/WAL colocated on the same OSDs.

• SSDs are used for metadata and index pools with replication 3.

• HDDs store the data pool using EC 4+2.


Interestingly, the same issue has appeared on another cluster where DB/WAL
is placed on NVMe disks, but the pool distribution is the same: meta and
index on SSDs, and data on HDDs.


It seems to be network-related. I’ve checked the interfaces and found no
obvious hardware or connectivity issues, but we’re still seeing a high
number of retransmissions and duplicate packets on the network.
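
To put a number on the retransmissions, something like the following rough sketch could be run on each OSD host. It assumes a Linux host that exposes the kernel TCP counters in /proc/net/snmp; the function names and the 10-second sampling interval are my own choices for illustration, not part of any Ceph tooling.

```python
import time


def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp-style text as a dict."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp header/value pair found")
    # First "Tcp:" line holds the field names, second holds the values.
    header, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def retransmit_ratio(before, after):
    """Fraction of TCP segments sent between two samples that were retransmits."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


def read_counters(path="/proc/net/snmp"):
    with open(path) as f:
        return parse_tcp_counters(f.read())


if __name__ == "__main__":
    interval = 10  # seconds between samples (arbitrary choice)
    first = read_counters()
    time.sleep(interval)
    second = read_counters()
    ratio = retransmit_ratio(first, second)
    print(f"retransmitted {ratio:.4%} of segments over {interval}s")
```

Comparing the ratio across hosts during a stall versus a quiet period may help show whether the retransmissions are concentrated on particular nodes or spread across the whole cluster.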


Let me know if you have any insights or suggestions.


On Mon, Mar 3, 2025 at 12:36 Stefan Kooman <stefan@xxxxxx> wrote:

> On 01-03-2025 15:10, Ramin Najjarbashi wrote:
> > Hi
> > We are currently facing severe latency issues in our Ceph cluster,
> > particularly affecting read and write operations. At times, write
> > operations completely stall, leading to significant service degradation.
> > Below is a detailed breakdown of the issue, our observations, and the
> > mitigation steps we have taken so far. We would greatly appreciate any
> > insights or suggestions.
>
> What Ceph version?
>
> How are the OSDs provisioned (WAL+DB, single OSD, etc.)? What type of disks?
>
> Gr. Stefan
>