Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

Tyler Stachecki <stachecki.tyler@xxxxxxxxx> · Mon, 26 Sep 2022 20:00:36 -0400

Just a datapoint: we upgraded several large Mimic-born clusters straight
to 15.2.12 with the quick-fix fsck disabled in ceph.conf, then set
require-osd-release, and finally did the omap conversion offline with
ceph-bluestore-tool while the OSDs were down (all done in batches). The
clusters are as zippy as ever.

Maybe on a whim, try doing an offline fsck with the bluestore tool and see
if it improves things?

To answer an earlier question, if you have no health statuses muted, a
'ceph health detail' should show you at least a subset of OSDs that have
not gone through the omap conversion yet.
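
If it helps, here is a small filter for pulling the OSD ids out of that
output. It is only a sketch: it assumes the per-OSD detail lines start
with "osd.<id>" and mention both "legacy" and "omap", which may differ
between releases, and legacy_omap_osds is a made-up name. Since the
health output may list only a subset, treat the result as a lower bound.

```shell
# Read `ceph health detail` output on stdin and print the ids of OSDs
# flagged for legacy omap, one per line, sorted numerically.
# Assumption: the detail lines contain "osd.<id>", "legacy" and "omap".
legacy_omap_osds() {
    grep -i 'legacy' | grep -i 'omap' | grep -oE 'osd\.[0-9]+' | cut -d. -f2 | sort -un
}
```

Typical use would be "ceph health detail | legacy_omap_osds".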

Cheers,
Tyler

On Mon, Sep 26, 2022, 5:13 PM Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:

> Hi Frank,
>
> Thank you very much for this! :)
>
> >
> > we just completed a third upgrade test. There are 2 ways to convert the
> > OSDs:
> >
> > A) convert along with the upgrade (bluestore_fsck_quick_fix_on_mount=true)
> > B) convert after setting require-osd-release=octopus (keep
> > bluestore_fsck_quick_fix_on_mount=false until require-osd-release is set
> > to octopus, then set it to true and restart to initiate conversion)
> >
> > There is a variation A' of A: follow A, then initiate manual compaction
> > and restart all OSDs.
> >
> > Our experiments show that paths A and B do *not* yield the same result.
> > Following path A leads to a cluster with severely degraded performance. As of
> > now, we cannot confirm that A' fixes this. It seems that the only way
> > out is to zap and re-deploy all OSDs, basically what Boris is doing
> > right now.
> >
> > We have now extended our procedure by adding
> >
> >   bluestore_fsck_quick_fix_on_mount = false
> >
> > to every ceph.conf file and executing
> >
> >   ceph config set osd bluestore_fsck_quick_fix_on_mount false
> >
> > to catch any accidents. After daemon upgrade, we set
> > bluestore_fsck_quick_fix_on_mount = true host by host in the ceph.conf
> > and restart OSDs.
> >
> > This procedure works like a charm.
> >
> > I don't know what the difference between A and B is. It is possible that
> > B executes an extra step that is missing in A. The performance
> > degradation only shows up when snaptrim is active, but then it is very
> > severe. I suspect that many users who complained about snaptrim in the
> > past have at least 1 A-converted OSD in their cluster.
> >
> > If you have a cluster upgraded with B-converted OSDs, it works like a
> > native octopus cluster. There is very little performance reduction
> > compared with mimic. In exchange, I have the impression that it runs
> > more stably.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>