The cluster is SSD only, with 2 TB, 4 TB and 8 TB disks, so I would expect
the conversion to finish fairly quickly. For now I will recreate every OSD
in the cluster and check if this helps.

Do you also experience slow ops, i.e. the cluster showing a message like
"cluster [WRN] Health check update: 679 slow ops, oldest one blocked for 95
sec, daemons [osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
have slow ops. (SLOW_OPS)"?

I can also see a huge spike in the load of all hosts in our cluster for a
couple of minutes.

On Tue, 13 Sept 2022 at 13:14, Frank Schilder <frans@xxxxxx> wrote:

> Hi Boris.
>
> > 3. wait some time (took around 5-20 minutes)
>
> Sounds short. It might just have been the compaction that the OSDs do
> anyway on startup after an upgrade. I don't know how to check for a
> completed format conversion. What I see in your MON log is exactly what
> I have seen with default snap trim settings until all OSDs were
> converted. Once an OSD falls behind and slow ops start piling up,
> everything comes to a halt. Your logs clearly show a sudden drop in IOPS
> when snap trimming starts, and I would guess this is the cause of the
> slowly growing op backlog on the OSDs.
>
> If it's not that, I don't know what else to look for.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Boris Behrens <bb@xxxxxxxxx>
> Sent: 13 September 2022 12:58:19
> To: Frank Schilder
> Cc: ceph-users@xxxxxxx
> Subject: Re: laggy OSDs and stalling krbd IO after upgrade from
> nautilus to octopus
>
> Hi Frank,
> we converted the OSDs directly during the upgrade:
>
> 1. install the new Ceph version
> 2. restart all OSD daemons
> 3. wait some time (took around 5-20 minutes)
> 4. all OSDs were online again
>
> So I would expect that the OSDs are all upgraded correctly.
> I also checked when the trimming happens, and it does not seem to be an
> issue on its own, as trims happen all the time in various sizes.
>
> On Tue, 13 Sept 2022 at 12:45, Frank Schilder <frans@xxxxxx> wrote:
>
> Are you observing this here:
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Boris Behrens <bb@xxxxxxxxx>
> Sent: 13 September 2022 11:43:20
> To: ceph-users@xxxxxxx
> Subject: laggy OSDs and stalling krbd IO after upgrade from nautilus to
> octopus
>
> Hi, I need your help really badly.
>
> We are currently experiencing very bad cluster hangups that happen
> sporadically (once on 2022-09-08 around midday, 48 hrs after the
> upgrade, and once on 2022-09-12 in the evening).
> We use krbd without cephx for the qemu clients, and when the OSDs get
> laggy, the krbd connection comes to a grinding halt, to the point that
> all IO is stalling and we can't even unmap the rbd device.
>
> From the logs it looks like the cluster starts to snaptrim a lot of PGs,
> then PGs become laggy, and then the cluster snowballs into laggy OSDs.
> I have attached the monitor log and the OSD log (from one OSD) around
> the time it happened.
>
> - is this a known issue?
> - what can I do to debug it further? (a rough sketch of what I have in
>   mind is below)
> - can I downgrade back to nautilus?
> - should I increase the PG count for the pool to 4096 or 8192?
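>
> To make the "debug further" question a bit more concrete, this is
> roughly what I had in mind. It is only a sketch; osd.0 and the values
> are placeholders, not what we actually run:
>
>     # which OSDs report slow ops, and what the stuck ops are waiting on
>     ceph health detail
>     ceph daemon osd.0 dump_ops_in_flight    # run on the host of osd.0
>
>     # current snap trim throttling for SSD OSDs
>     ceph config get osd osd_snap_trim_sleep_ssd
>     ceph config get osd osd_pg_max_concurrent_snap_trims
>
>     # e.g. slow trimming down while we investigate (example value)
>     ceph config set osd osd_snap_trim_sleep_ssd 1
>
>     # and this is what I mean by increasing the PGs for the pool
>     ceph osd pool set rbd pg_num 4096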
>
> The cluster contains a mixture of 2, 4 and 8 TB SSDs (no rotating
> disks), where the 8 TB disks got ~120 PGs and the 2 TB disks got ~30
> PGs. All hosts have a minimum of 128 GB RAM, and the kernel logs of all
> Ceph hosts do not show anything for the timeframe.
>
> Cluster stats:
>
>   cluster:
>     id:     74313356-3b3d-43f3-bce6-9fb0e4591097
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age 25h)
>     mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4, ceph-rbd-mon6
>     osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>
>   data:
>     pools:   4 pools, 2241 pgs
>     objects: 25.43M objects, 82 TiB
>     usage:   231 TiB used, 187 TiB / 417 TiB avail
>     pgs:     2241 active+clean
>
>   io:
>     client: 211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> ssd    417 TiB  187 TiB  230 TiB  231 TiB   55.30
> TOTAL  417 TiB  187 TiB  230 TiB  231 TiB   55.30
>
> --- POOLS ---
> POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
> isos                    7    64  455 GiB  117.92k  1.3 TiB   1.17  38 TiB
> rbd                     8  2048  76 TiB   24.65M   222 TiB  66.31  38 TiB
> archive                 9   128  2.4 TiB  669.59k  7.3 TiB   6.06  38 TiB
> device_health_metrics  10     1  25 MiB   149      76 MiB    0     38 TiB
>
> --
> The self-help group "UTF-8 problems" will meet in the large hall this
> time, as an exception.

--
The self-help group "UTF-8 problems" will meet in the large hall this
time, as an exception.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx