Hi Janne,
that is a really good idea, thank you. I just saw that our only
Ubuntu 20.04 node shows very high %util on all of its 8 TB disks:

Device  r/s     rkB/s    rrqm/s  %rrqm  r_await  rareq-sz  w/s      wkB/s      wrqm/s   %wrqm  w_await  wareq-sz  d/s   dkB/s  drqm/s  %drqm  d_await  dareq-sz  aqu-sz  %util
sdc     19.00   112.00   0.00    0.00   0.32     5.89      1535.00  68768.00   1260.00  45.08  1.33     44.80     0.00  0.00   0.00    0.00   0.00     0.00      1.44    76.00
sdd     62.00   5892.00  43.00   40.95  2.82     95.03     1196.00  78708.00   1361.00  53.23  2.35     65.81     0.00  0.00   0.00    0.00   0.00     0.00      2.31    72.00
sde     33.00   184.00   0.00    0.00   0.33     5.58      1413.00  102592.00  1709.00  54.74  1.70     72.61     0.00  0.00   0.00    0.00   0.00     0.00      1.68    84.40
sdf     62.00   8200.00  63.00   50.40  9.32     132.26    1066.00  74372.00   1173.00  52.39  1.68     69.77     0.00  0.00   0.00    0.00   0.00     0.00      1.80    70.00
sdg     5.00    40.00    0.00    0.00   0.40     8.00      1936.00  128188.00  2172.00  52.87  2.18     66.21     0.00  0.00   0.00    0.00   0.00     0.00      3.21    92.80
sdh     133.00  8636.00  44.00   24.86  4.14     64.93     1505.00  87820.00   1646.00  52.24  0.95     58.35     0.00  0.00   0.00    0.00   0.00     0.00      1.09    78.80

I've cross-checked the other 8 TB disks in our cluster; they sit around
30-50% %util with roughly the same IOPS. Maybe I am missing some
optimization that is applied on the CentOS 7 nodes but not on the
Ubuntu 20.04 node (if you know something off the top of your head, I am
happy to hear it), or maybe utilization is simply measured differently
on Ubuntu. But this was also the first node where I restarted the OSDs,
and the one where I waited the longest to see whether anything
improved. The problem nearly disappeared within a couple of seconds
after the last OSD was restarted, so I would not blame that node in
particular, but I will investigate in this direction.
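
To rule out a measuring difference, I will probably just run your
iostat suggestion against every OSD host and only compare the %util
column. Roughly something like this; the host names and the device
list are only placeholders for our inventory, and the awk pulls the
last field of the extended output, which is %util:

  # sketch: sample %util on every OSD host; host names and device list
  # are placeholders for the real inventory
  for host in osd-host-01 osd-host-02 osd-host-03; do
      echo "== ${host} =="
      # -x extended stats, -t timestamps, -c CPU, -y skip the since-boot sample
      ssh "${host}" "iostat -xtcy sdc sdd sde sdf sdg sdh 5 2" \
          | awk '/^sd/ {print $1, "%util=" $NF}'
  done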
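
And since I still have the copy of that day's logs, I will also count
the 'waiting for sub ops' messages per OSD to see whether the blocked
OSDs cluster on a particular host or disk. A rough sketch, assuming
the default log location; the path has to point at wherever the copy
actually lives:

  # count 'waiting for sub ops' messages per OSD log, highest first;
  # /var/log/ceph is only the default location, adjust to the log copy
  grep -c 'waiting for sub ops' /var/log/ceph/ceph-osd.*.log \
      | sort -t: -k2 -nr | head -20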
On Tue, 6 Dec 2022 at 10:08, Janne Johansson <icepic.dz@xxxxxxxxx> wrote:

> Perhaps run "iostat -xtcy <list of OSD devices> 5" on the OSD hosts to
> see if any of the drives have weirdly high utilization despite low
> iops/requests?
>
>
> On Tue, 6 Dec 2022 at 10:02, Boris Behrens <bb@xxxxxxxxx> wrote:
> >
> > Hi Sven,
> > I am searching really hard for defective hardware, but I am currently
> > out of ideas:
> > - checked Prometheus stats, but in all that data I don't know what to
> >   look for (OSD apply latency is very low at the mentioned point and
> >   went up to 40 ms after all OSDs were restarted)
> > - smartctl shows nothing
> > - dmesg shows nothing
> > - network data shows nothing
> > - OSD and cluster logs show nothing
> >
> > If anybody has a good tip on what I can check, that would be awesome:
> > a string in the logs (I made a copy of that day's logs), or a tool to
> > fire against the hardware. I am 100% out of ideas what it could be.
> > Within a time frame of 20 s, 2/3 of our OSDs went from "all fine" to
> > "I am waiting for the replicas to do their work" (log message
> > 'waiting for sub ops'). But there was no alert that any OSD had
> > connection problems to other OSDs. Additionally, the cluster_network
> > runs over the same interface, switch and everything as the
> > public_network; the only difference is the VLAN ID (I plan to remove
> > the cluster_network because it does not provide anything for us).
> >
> > I am also planning to update all hosts from CentOS 7 to Ubuntu 20.04
> > (newer kernel, standardized OS config and so on).
> >
> > On Mon, 5 Dec 2022 at 14:24, Sven Kieske <S.Kieske@xxxxxxxxxxx> wrote:
> > >
> > > On Sat, 2022-12-03 at 01:54 +0100, Boris Behrens wrote:
> > > > hi,
> > > > maybe someone here can help me to debug an issue we faced today.
> > > >
> > > > Today one of our clusters came to a grinding halt with 2/3 of our
> > > > OSDs reporting slow ops. The only option to get it back to work
> > > > quickly was to restart all OSD daemons.
> > > >
> > > > The cluster is an Octopus cluster with 150 enterprise SSD OSDs.
> > > > Last work on the cluster: synced in a node 4 days ago.
> > > >
> > > > The only health issue that was reported was SLOW_OPS. No slow
> > > > pings on the networks. No restarting OSDs. Nothing.
> > > >
> > > > I was able to pin it down to a 20 s timeframe and I read ALL the
> > > > logs in a 20 minute window around this issue.
> > > >
> > > > I haven't found any clues.
> > > >
> > > > Maybe someone encountered this in the past?
> > >
> > > Do you happen to run your RocksDB on a dedicated caching device
> > > (NVMe SSD)?
> > >
> > > I observed slow ops in Octopus after a faulty NVMe SSD was inserted
> > > in one Ceph server. As was said in other mails, try to isolate your
> > > root cause.
> > >
> > > Maybe the node added 4 days ago was the culprit here?
> > >
> > > We were able to pinpoint the NVMe by monitoring the slow OSDs, and
> > > the commonality in this case was the same NVMe cache device.
> > >
> > > You should always benchmark new hardware / perform burn-in tests
> > > IMHO, which is not always possible due to environment constraints.
> > >
> > > --
> > > Mit freundlichen Grüßen / Regards
> > >
> > > Sven Kieske
> > > Systementwickler / systems engineer
> > >
> > > Mittwald CM Service GmbH & Co. KG
> > > Königsberger Straße 4-6
> > > 32339 Espelkamp
> > >
> > > Tel.: 05772 / 293-900
> > > Fax: 05772 / 293-333
> > >
> > > https://www.mittwald.de
> > >
> > > Managing directors: Robert Meyer, Florian Jürgens
> > >
> > > St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640,
> > > AG Bad Oeynhausen
> > > General partner: Robert Meyer Verwaltungs GmbH, HRB 13260,
> > > AG Bad Oeynhausen
> > >
> > > Information on data processing in the course of our business
> > > activities pursuant to Art. 13-14 GDPR is available at
> > > www.mittwald.de/ds.
> >
> > --
> > The "UTF-8 problems" self-help group will, as an exception, meet in
> > the large hall this time.
>
> --
> May the most significant bit of your life be positive.

--
The "UTF-8 problems" self-help group will, as an exception, meet in
the large hall this time.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx