Re: cephadm: update fewer OSDs at a time?

Adam King <adking@xxxxxxxxxx> · Mon, 14 Feb 2022 14:13:08 -0500

A change that lets the upgrade handle OSDs in a more parallel fashion was
merged nearly a year ago (https://github.com/ceph/ceph/pull/39726). It made
its way into Pacific but not Octopus, which could explain the discrepancy
here. I guess we need a flag that turns this behaviour off for users who'd
rather maintain higher I/O throughput at the cost of upgrade speed.
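
To make the availability reasoning in this thread concrete, here is a
minimal sketch in Python. It is a simplified model of a replica-count check,
not cephadm's or "ok-to-stop"'s actual implementation. The function names,
the toy PG layout, and the host names are all hypothetical. It only counts
the replicas left running; it does not model peering or recovery.

```python
from typing import Dict, Iterable, List, Set


def pgs_below_min_size(
    pg_acting: Dict[str, List[int]],
    stopping: Iterable[int],
    min_size: int,
) -> List[str]:
    """Return PGs that would have fewer than min_size OSDs left running."""
    stopped: Set[int] = set(stopping)
    blocked = []
    for pgid, osds in sorted(pg_acting.items()):
        remaining = [osd for osd in osds if osd not in stopped]
        if len(remaining) < min_size:
            blocked.append(pgid)
    return blocked


def pgs_sharing_a_host(
    pg_acting: Dict[str, List[int]],
    osd_host: Dict[int, str],
) -> List[str]:
    """Return PGs whose replicas do not all sit on distinct hosts."""
    return [
        pgid
        for pgid, osds in sorted(pg_acting.items())
        if len({osd_host[osd] for osd in osds}) < len(osds)
    ]


# Hypothetical toy layout: 3 hosts, 2 OSDs each, pool with size=3.
osd_host = {0: "host-a", 1: "host-a", 2: "host-b",
            3: "host-b", 4: "host-c", 5: "host-c"}
pg_acting = {"1.0": [0, 2, 4], "1.1": [1, 3, 5], "1.2": [0, 3, 5]}

host_a_osds = [osd for osd, host in osd_host.items() if host == "host-a"]

# Every PG spans 3 distinct hosts, so no PG is reported.
print(pgs_sharing_a_host(pg_acting, osd_host))
# Stopping all OSDs on one host still leaves 2 replicas per PG.
print(pgs_below_min_size(pg_acting, host_a_osds, min_size=2))
# Stopping OSDs on two different hosts drops PG 1.2 to a single replica.
print(pgs_below_min_size(pg_acting, [0, 3], min_size=2))
```

Under these assumptions, stopping every OSD on a single host keeps all PGs
at or above min_size=2, while stopping OSDs on two hosts at once does not.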

On Mon, Feb 14, 2022 at 11:21 AM Eugen Block <eblock@xxxxxx> wrote:

> It does update only one OSD at a time; I did that in my little test
> cluster on Octopus today. I haven’t played too much with Pacific yet,
> maybe some things have changed there?
>
> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>
> > Hi Eugen,
> >
> > Thanks for this. All of our pools are size=3 and min_size=2, and the
> > failure domain is host. For example, we experience random I/O stalls on
> > this pool during upgrades: https://pastebin.com/iVVxJ9TF (I pasted the
> > pool and CRUSH info into pastebin for better readability). In theory this
> > shouldn't be happening, as there are always 2 more hosts with 2 more OSDs
> > per PG while the OSDs on 1 host are being upgraded. The output of
> > `ceph pg ls-by-pool` is rather lengthy as there are 256 PGs in this
> > particular pool, but I personally verified that each PG is backed by 3
> > distinct OSDs, each of the 3 on a different host.
> >
> > I was hoping that by forcing cephadm to upgrade 1 OSD at a time instead
> > of 1 host at a time we could resolve this issue.
> >
> > /Z
> >
> > On Mon, Feb 14, 2022 at 4:26 PM Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Hi,
> >>
> >> what are your rulesets for the affected pools? As far as I remember,
> >> the orchestrator updates one OSD node at a time, and within that node it
> >> does not update multiple OSDs at once, only one by one. It checks with
> >> the "ok-to-stop" command whether an upgrade of that daemon can proceed,
> >> so as long as you have host as the failure domain there should be no I/O
> >> disruption for clients. Maybe you have some pools with size = 2 and
> >> min_size = 2?
> >>
> >> Regards,
> >> Eugen
> >>
> >>
> >> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
> >>
> >> > Hi!
> >> >
> >> > Sometimes when we upgrade our cephadm-managed 16.2.x cluster, cephadm
> >> > decides that it's safe to upgrade a bunch of OSDs at a time. As a
> >> > result, RBD-backed OpenStack VMs sometimes appear to get I/O stalls and
> >> > read-only filesystems. Is there a way to make cephadm upgrade fewer
> >> > OSDs at a time, or perhaps upgrade them one by one? I don't care if
> >> > that takes a lot more time, as long as there's no I/O interruption.
> >> >
> >> > I would appreciate any advice.
> >> >
> >> > Best regards,
> >> > Zakhar
> >> > _______________________________________________
> >> > ceph-users mailing list -- ceph-users@xxxxxxx
> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx