Re: Severe Latency Issues in Ceph Cluster

> it's not Ceph but the network

It's almost always the network ;-)

Ramin: This reminds me of an outage we had at CERN caused by routing /
ECMP / a faulty line card.
One of the main symptoms of that was high TCP retransmit counts on the Ceph nodes.
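If you want a quick look at those counters, something like the
following should work on most Linux hosts (a rough sketch -- it assumes
net-tools and iproute2 are installed, and exact counter names vary a
bit by kernel):

# snapshot of the kernel's TCP retransmission counters
netstat -s | grep -i retrans
# the same counter via iproute2's nstat (absolute values, including zeros)
nstat -az TcpRetransSegs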

Basically, OSDs keep many connections open with each other, each with
a different src/dst port combination. If your cluster has OSD hosts
connected across routers, then you're likely using ECMP, and each
connection's src/dst IP/port combination takes a different path
(different routers, different line cards). If one line card is
faulty -- which is often difficult to alert on -- some of those
connections will work, but some will not. This is visible in the host
retransmit counters, and it causes OSDs to flap up and down, among
other badness.
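As a rough illustration (assuming the default OSD port range of
6800-7300 and that iproute2's ss is available), you can count how many
established connections an OSD host has open to each peer -- each of
those can hash onto a different ECMP path:

# count established connections to the default OSD port range (6800-7300),
# grouped by peer address
ss -tn state established '( dport >= :6800 and dport <= :7300 )' \
    | awk 'NR>1 {sub(/:[0-9]+$/, "", $4); print $4}' | sort | uniq -c | sort -rn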

One quick way to diagnose whether this is the root cause here is to use
netcat to try to connect between two ceph hosts using a range of
source ports.
E.g., assuming you can ssh from one OSD host to another, do this from
one ceph host:

echo {20000..20050} | xargs -t -n1 -I{} nc -z -p {} <other ceph osd> 22

If all your network paths are okay, you'll get output like the example
in the PS below. If some paths are broken, you'll get errors!

Hope that helps.

-- dan

bash-5.2$ echo {20000..20050} | xargs -t -n1 -I{} nc -z -p {} 192.168.1.248 22
nc -z -p 20000 192.168.1.248 22
Connection to 192.168.1.248 port 22 [tcp/ssh] succeeded!
nc -z -p 20001 192.168.1.248 22
Connection to 192.168.1.248 port 22 [tcp/ssh] succeeded!
nc -z -p 20002 192.168.1.248 22
Connection to 192.168.1.248 port 22 [tcp/ssh] succeeded!
nc -z -p 20003 192.168.1.248 22
Connection to 192.168.1.248 port 22 [tcp/ssh] succeeded!
nc -z -p 20004 192.168.1.248 22
Connection to 192.168.1.248 port 22 [tcp/ssh] succeeded!
nc -z -p 20005 192.168.1.248 22
Connection to 192.168.1.248 port 22 [tcp/ssh] succeeded!
...



--
Dan van der Ster
Ceph Executive Council | CTO @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | dan.vanderster@xxxxxxxxx



On Tue, Mar 4, 2025 at 12:08 AM Eugen Block <eblock@xxxxxx> wrote:
>
> A few years ago, one of our customers complained about latency issues.
> We investigated, and the only real evidence we found was also high
> retransmit values. So we recommended that they have their network team
> look into it. For months they refused to do anything, until they hired
> another company to investigate the network. It was a network issue;
> basically all the cabling was replaced. I don't recall anymore whether
> switches and other components were replaced as well, but it was
> definitely resolved after that. So if you ask me, I'd say it's not
> Ceph but the network. ;-)
>
> Zitat von Ramin Najjarbashi <ramin.najarbashi@xxxxxxxxx>:
>
> > The Ceph version is 17.2.7.
> >
> >
> > • OSDs are a mix of SSD and HDD, with DB/WAL colocated on the same OSDs.
> >
> > • SSDs are used for metadata and index pools with replication 3.
> >
> > • HDDs store the data pool using EC 4+2.
> >
> >
> > Interestingly, the same issue has appeared on another cluster where DB/WAL
> > is placed on NVMe disks, but the pool distribution is the same: meta and
> > index on SSDs, and data on HDDs.
> >
> >
> > It seems to be network-related: I've checked the interfaces and found no
> > obvious hardware or connectivity issues, but we're still seeing a high
> > number of retransmissions and duplicate packets on the network.
> >
> >
> > Let me know if you have any insights or suggestions.
> >
> >
> > On Mon, Mar 3, 2025 at 12:36 Stefan Kooman <stefan@xxxxxx> wrote:
> >
> >> On 01-03-2025 15:10, Ramin Najjarbashi wrote:
> >> > Hi
> >> > We are currently facing severe latency issues in our Ceph cluster,
> >> > particularly affecting read and write operations. At times, write
> >> > operations completely stall, leading to significant service degradation.
> >> > Below is a detailed breakdown of the issue, our observations, and the
> >> > mitigation steps we have taken so far. We would greatly appreciate any
> >> > insights or suggestions.
> >>
> >> What ceph version?
> >>
> >> How are OSDs provisioned (WAL+DB, single OSD, etc.). Type of disks.
> >>
> >> Gr. Stefan
> >>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



