Re: Successful Upgrade from 14.2.22 to 15.2.14

I understand, thanks for sharing!


Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:

Hi Eugen,

All of our prod clusters still run old-school rpm packages managed by
our private puppet manifests. Even our newest pacific pre-prod cluster
is managed like that.
We have a side project to test and move to cephadm / containers, but
that is still a WIP. (Our situation is complicated by the fact that
we'll need to keep puppet managing things like the firewall while
cephadm handles daemon placement.)

Cheers, Dan


On Wed, Sep 22, 2021 at 10:32 AM Eugen Block <eblock@xxxxxx> wrote:

Thanks for the summary, Dan!

I'm still hesitant to upgrade our production environment from N to O,
although your experience sounds reassuring. One question: did you
also switch to cephadm and containerize all daemons? We haven't made a
decision yet, but I guess at some point we'll have to switch anyway,
so we might as well just get it over with. :-D We'll need to discuss
it with the team...

Thanks,
Eugen


Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:

> Dear friends,
>
> This morning we upgraded our pre-prod cluster from 14.2.22 to 15.2.14,
> successfully, following the procedure at
> https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus
> It's a 400 TB cluster, 10% used, with 72 osds (block=hdd,
> block.db=ssd) and 40M objects.
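>
> For anyone wanting the short version, the sequence is roughly the
> usual one (an abbreviated sketch from memory; the exact steps and
> caveats are in the linked release notes):
>
>   ceph osd set noout                    # avoid rebalancing during restarts
>   # upgrade packages, then restart daemons in order, one host at a time:
>   systemctl restart ceph-mon.target     # mons first
>   systemctl restart ceph-mgr.target     # then mgrs
>   systemctl restart ceph-osd.target     # then osds
>   systemctl restart ceph-mds.target     # then mds
>   ceph versions                         # confirm everything runs 15.2.14
>   ceph osd require-osd-release octopus
>   ceph osd unset noout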
>
> * The mons upgraded cleanly as expected.
> * One minor surprise was that the mgrs respawned themselves moments
> after the leader restarted into octopus:
>
> 2021-09-21T10:16:38.992219+0200 mon.cephdwight-mon-1633994557 (mon.0)
> 16 : cluster [INF] mon.cephdwight-mon-1633994557 is new leader, mons
> cephdwight-mon-1633994557,cephdwight-mon-f7df6839c6,cephdwight-mon-d8788e3256
> in quorum (ranks 0,1,2)
>
> 2021-09-21 10:16:39.046 7fae3caf8700  1 mgr handle_mgr_map respawning
> because set of enabled modules changed!
>
> This didn't create any problems AFAICT.
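>
> (If you want to see what the module set actually is after the respawn,
> "ceph mgr module ls" lists the enabled and available mgr modules.)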
>
> * The osds performed the expected fsck after restarting. Their logs
> were spammed with messages like
>
> 2021-09-21T11:15:23.233+0200 7f85901bd700 -1
> bluestore(/var/lib/ceph/osd/ceph-1) fsck warning:
> #174:1e024a6e:::10009663a55.00000000:head# has omap that is not
> per-pool or pgmeta
>
> but that is fully expected AFAIU. Each osd took just under 10
> minutes to fsck:
>
> 2021-09-21T11:22:27.188+0200 7f85a3a2bf00  1
> bluestore(/var/lib/ceph/osd/ceph-1) _fsck_on_open <<<FINISH>>> with 0
> errors, 197756 warnings, 197756 repaired, 0 remaining in 475.083056
> seconds
>
> For reference, this cluster was created many major releases ago (maybe
> firefly) but osds were probably re-created in luminous.
> The memory usage was quite normal; we didn't suffer any OOMs.
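>
> (Side note for anyone worried about the on-mount repair: AFAIU the
> automatic per-pool omap conversion at the first octopus start is
> controlled by bluestore_fsck_quick_fix_on_mount, so something like
>
>   ceph config set osd bluestore_fsck_quick_fix_on_mount false
>
> should defer it, and the conversion can then be done per OSD while the
> daemon is stopped, e.g.
>
>   ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1
>
> Do check the release notes before relying on that, though.)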
>
> * The active mds restarted into octopus without incident.
>
> In summary it was a very smooth upgrade. After a week of observation
> we'll proceed with more production clusters.
> For our largest S3 cluster with slow hdds we expect huge fsck
> transactions, so we will wait for https://github.com/ceph/ceph/pull/42958
> to be merged before upgrading.
>
> Best Regards, and thanks to all the devs for their work,
>
> Dan






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


