Hi Dan,
This is excellent to hear - we've also been a bit hesitant to upgrade
from Nautilus (which has been working so well for us). One question:
did you/would you consider upgrading straight to Pacific from Nautilus?
Can you share the thoughts that led you to Octopus first?
Thanks,
Andras
On 9/21/21 06:09, Dan van der Ster wrote:
Dear friends,
This morning we upgraded our pre-prod cluster from 14.2.22 to 15.2.14,
successfully, following the procedure at
https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus
It's a 400TB cluster, 10% used, with 72 osds (block=hdd,
block.db=ssd) and 40M objects.
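As a rough illustration (not part of the linked procedure), a simple
sanity check between steps is to confirm which release each daemon type
reports. Below is a minimal Python sketch of that check, assuming an
admin host with the ceph CLI and that the release codename appears in
the version strings returned by "ceph versions":

#!/usr/bin/env python3
# Sketch only: summarize which release each daemon type is running,
# using "ceph versions --format json" (assumes an admin keyring).
import json
import subprocess

EXPECTED = "octopus"   # hypothetical target release for this check

def daemon_versions():
    # Returns e.g. {"mon": {"ceph version 15.2.14 ... octopus (stable)": 3}, ...}
    out = subprocess.check_output(["ceph", "versions", "--format", "json"])
    return json.loads(out)

if __name__ == "__main__":
    report = daemon_versions()
    for dtype in ("mon", "mgr", "osd", "mds"):
        versions = list(report.get(dtype, {}))
        ok = bool(versions) and all(EXPECTED in v for v in versions)
        print(f"{dtype}: {'OK' if ok else 'MIXED/UNKNOWN'} {versions}")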
* The mons upgraded cleanly as expected.
* One minor surprise was that the mgrs respawned themselves moments
after the leader restarted into octopus:
2021-09-21T10:16:38.992219+0200 mon.cephdwight-mon-1633994557 (mon.0)
16 : cluster [INF] mon.cephdwight-mon-1633994557 is new leader, mons
cephdwight-mon-1633994557,cephdwight-mon-f7df6839c6,cephdwight-mon-d8788e3256
in quorum (ranks 0,1,2)
2021-09-21 10:16:39.046 7fae3caf8700 1 mgr handle_mgr_map respawning
because set of enabled modules changed!
This didn't create any problems AFAICT.
* The osds performed the expected fsck after restarting. Their logs
are spammed with things like
2021-09-21T11:15:23.233+0200 7f85901bd700 -1
bluestore(/var/lib/ceph/osd/ceph-1) fsck warning:
#174:1e024a6e:::10009663a55.00000000:head# has omap that is not
per-pool or pgmeta
but that is fully expected AFAIU. Each osd took just under 10 minutes to fsck:
2021-09-21T11:22:27.188+0200 7f85a3a2bf00 1
bluestore(/var/lib/ceph/osd/ceph-1) _fsck_on_open <<<FINISH>>> with 0
errors, 197756 warnings, 197756 repaired, 0 remaining in 475.083056
seconds
For reference, this cluster was created many major releases ago (maybe
firefly), but the osds were probably re-created in luminous.
Memory usage during the fsck was quite normal; we didn't suffer any OOMs.
(A rough sketch for checking the fsck-on-mount setting follows this list.)
* The active mds restarted into octopus without incident.
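Here is that sketch (not from the upgrade docs): one way to check up
front whether the per-pool omap conversion will run when an OSD
restarts, and to pull the fsck summary line afterwards. The
bluestore_fsck_quick_fix_on_mount option is a standard BlueStore
setting; the log pattern is taken from the 15.2.14 output quoted above,
and the log path is only an example:

#!/usr/bin/env python3
# Sketch only: report the fsck-on-mount setting and parse one OSD log
# for the "_fsck_on_open <<<FINISH>>>" summary line.
import re
import subprocess

def quick_fix_on_mount() -> bool:
    # bluestore_fsck_quick_fix_on_mount controls whether the omap
    # conversion/repair runs automatically when the OSD starts.
    out = subprocess.check_output(
        ["ceph", "config", "get", "osd", "bluestore_fsck_quick_fix_on_mount"])
    return out.decode().strip() == "true"

def fsck_summary(osd_log_path):
    # Returns (errors, seconds) from the FINISH line, or None if absent.
    pat = re.compile(r"_fsck_on_open <<<FINISH>>> with (\d+) errors.* in ([\d.]+) seconds")
    with open(osd_log_path) as f:
        for line in f:
            m = pat.search(line)
            if m:
                return int(m.group(1)), float(m.group(2))
    return None

if __name__ == "__main__":
    print("quick-fix on mount:", quick_fix_on_mount())
    summary = fsck_summary("/var/log/ceph/ceph-osd.1.log")   # example path
    if summary:
        errors, seconds = summary
        print(f"fsck finished with {errors} errors in {seconds:.0f} s")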
In summary, it was a very smooth upgrade. After a week of observation
we'll proceed with more production clusters.
For our largest S3 cluster with slow hdds, we expect huge fsck
transactions, so we will wait for https://github.com/ceph/ceph/pull/42958
to be merged before upgrading.
Best Regards, and thanks to all the devs for their work,
Dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx