Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

Frank Schilder <frans@xxxxxx> · Tue, 13 Sep 2022 11:14:01 +0000

Hi Boris.

> 3. wait some time (took around 5-20 minutes)

Sounds short. Might just have been the compaction that the OSDs do anyway on startup after an upgrade. I don't know how to check for completed format conversion. What I see in your MON log is exactly what I have seen with default snap trim settings until all OSDs were converted. Once an OSD falls behind and slow ops start piling up, everything comes to a halt. Your logs clearly show a sudden drop in IOPS when snap trim starts, and I would guess this is the cause of the slowly growing ops backlog on the OSDs.

If it's not that, I don't know what else to look for.
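
For readers who want to move away from the default snap trim settings mentioned above, here is a minimal sketch that only builds the corresponding `ceph config set` command lines and prints them; it does not run anything. The helper name `snap_trim_throttle_commands` and the values 2.0 and 1 are illustrative assumptions, not settings recommended anywhere in this thread. The option names `osd_snap_trim_sleep` and `osd_pg_max_concurrent_snap_trims` are existing OSD options.

```python
import shlex


def snap_trim_throttle_commands(sleep_seconds, max_concurrent_trims):
    """Build (but do not execute) ceph CLI commands that throttle snap trimming.

    The helper and its arguments are hypothetical; pick values that suit
    your own cluster.
    """
    settings = {
        # Seconds to sleep between snap trim operations on an OSD.
        "osd_snap_trim_sleep": sleep_seconds,
        # Maximum number of parallel snap trims per PG.
        "osd_pg_max_concurrent_snap_trims": max_concurrent_trims,
    }
    return [
        ["ceph", "config", "set", "osd", option, str(value)]
        for option, value in settings.items()
    ]


if __name__ == "__main__":
    # Illustrative values only (assumption, not taken from the thread).
    for cmd in snap_trim_throttle_commands(sleep_seconds=2.0, max_concurrent_trims=1):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to try on any machine; the lines can be reviewed before being pasted into a shell on a node with admin access.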

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Boris Behrens <bb@xxxxxxxxx>
Sent: 13 September 2022 12:58:19
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re:  laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

Hi Frank,
We converted the OSDs directly during the upgrade.

1. installing new ceph versions
2. restart all OSD daemons
3. wait some time (took around 5-20 minutes)
4. all OSDs were online again.

So I would expect that the OSDs were all upgraded correctly.
I also checked when the trimming happens, and it does not seem to be an issue on its own, as trimming happens all the time in various sizes.

On Tue, 13 Sep 2022 at 12:45, Frank Schilder <frans@xxxxxx> wrote:
Are you observing this here: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Boris Behrens <bb@xxxxxxxxx>
Sent: 13 September 2022 11:43:20
To: ceph-users@xxxxxxx
Subject:  laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

Hi, I need your help really badly.

We are currently experiencing very bad cluster hangups that happen
sporadically (once on 2022-09-08 at midday, 48 hrs after the upgrade, and once
on 2022-09-12 in the evening).
We use krbd without cephx for the qemu clients, and when the OSDs
get laggy, the krbd connection comes to a grinding halt, to the point
that all IO stalls and we can't even unmap the rbd device.

From the logs, it looks like the cluster starts to snaptrim a lot of
PGs, then PGs become laggy, and then the cluster snowballs into laggy OSDs.
I have attached the monitor log and the OSD log (from one OSD) from around the
time when it happened.

- is this a known issue?
- what can I do to debug it further?
- can I downgrade back to nautilus?
- should I upgrade the PGs for the pool to 4096 or 8192?

The cluster contains a mixture of 2, 4 and 8 TB SSDs (no rotating disks),
where the 8 TB disks have ~120 PGs and the 2 TB disks have ~30 PGs. All hosts
have a minimum of 128 GB RAM, and the kernel logs of all ceph hosts do not
show anything for the timeframe.

Cluster stats:
  cluster:
    id:     74313356-3b3d-43f3-bce6-9fb0e4591097
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age 25h)
    mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4, ceph-rbd-mon6
    osd: 149 osds: 149 up (since 6d), 149 in (since 7w)

  data:
    pools:   4 pools, 2241 pgs
    objects: 25.43M objects, 82 TiB
    usage:   231 TiB used, 187 TiB / 417 TiB avail
    pgs:     2241 active+clean

  io:
    client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr

--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    417 TiB  187 TiB  230 TiB   231 TiB      55.30
TOTAL  417 TiB  187 TiB  230 TiB   231 TiB      55.30

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
isos                    7    64  455 GiB  117.92k  1.3 TiB   1.17     38 TiB
rbd                     8  2048   76 TiB   24.65M  222 TiB  66.31     38 TiB
archive                 9   128  2.4 TiB  669.59k  7.3 TiB   6.06     38 TiB
device_health_metrics  10     1   25 MiB      149   76 MiB      0     38 TiB
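
To put the 4096-vs-8192 question into numbers, here is a small sketch that computes the average number of PG replicas per OSD from the pool table above. It assumes a replica size of 3 for every pool, which is inferred from the USED/STORED ratio (e.g. 222 TiB / 76 TiB for rbd) and not stated in the thread. The function name `avg_pgs_per_osd` is made up for illustration. The result is a plain average over all 149 OSDs; the actual per-OSD counts follow disk size, as the ~30 vs ~120 PGs quoted above show.

```python
def avg_pgs_per_osd(pool_pgs, replica_size, num_osds):
    """Average number of PG replicas each OSD holds, ignoring OSD weights."""
    total_pg_replicas = sum(pool_pgs.values()) * replica_size
    return total_pg_replicas / num_osds


# PG counts taken from the POOLS table above.
pools = {"isos": 64, "rbd": 2048, "archive": 128, "device_health_metrics": 1}

REPLICA_SIZE = 3  # assumption: inferred from USED / STORED, not confirmed
NUM_OSDS = 149    # from the "osd: 149 osds" line in the cluster status

for rbd_pgs in (2048, 4096, 8192):
    candidate = dict(pools, rbd=rbd_pgs)
    average = avg_pgs_per_osd(candidate, REPLICA_SIZE, NUM_OSDS)
    print(f"rbd pg_num={rbd_pgs}: ~{average:.1f} PGs per OSD on average")
# rbd pg_num=2048: ~45.1 PGs per OSD on average
# rbd pg_num=4096: ~86.4 PGs per OSD on average
# rbd pg_num=8192: ~168.8 PGs per OSD on average
```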

--
The "UTF-8 problems" self-help group will, as an exception, meet in the large hall this time.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
