Hi Dan,
This is excellent to hear - we've also been a bit hesitant to upgrade
from Nautilus (which has been working so well for us). One question:
did you/would you consider upgrading straight to Pacific from Nautilus?
Can you share the thoughts that led you to Octopus first?
Thanks,
Andras
On 9/21/21 06:09, Dan van der Ster wrote:
Dear friends,
This morning we upgraded our pre-prod cluster from 14.2.22 to 15.2.14,
successfully, following the procedure at
https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus
It's a 400TB cluster, 10% used, with 72 osds (block=hdd,
block.db=ssd) and 40M objects.
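As a rough illustration (not part of the linked procedure), a simple
sanity check between steps is to confirm which release each daemon type
reports. Below is a minimal Python sketch of that check, assuming an
admin host with the ceph CLI and that the release codename appears in
the version strings returned by "ceph versions":

#!/usr/bin/env python3
# Sketch only: summarize which release each daemon type is running,
# using "ceph versions --format json" (assumes an admin keyring).
import json
import subprocess

EXPECTED = "octopus"   # hypothetical target release for this check

def daemon_versions():
    # Returns e.g. {"mon": {"ceph version 15.2.14 ... octopus (stable)": 3}, ...}
    out = subprocess.check_output(["ceph", "versions", "--format", "json"])
    return json.loads(out)

if __name__ == "__main__":
    report = daemon_versions()
    for dtype in ("mon", "mgr", "osd", "mds"):
        versions = list(report.get(dtype, {}))
        ok = bool(versions) and all(EXPECTED in v for v in versions)
        print(f"{dtype}: {'OK' if ok else 'MIXED/UNKNOWN'} {versions}")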
* The mons upgraded cleanly as expected.
* One minor surprise was that the mgrs respawned themselves moments
after the leader restarted into octopus:
2021-09-21T10:16:38.992219+0200 mon.cephdwight-mon-1633994557 (mon.0)
16 : cluster [INF] mon.cephdwight-mon-1633994557 is new leader, mons
cephdwight-mon-1633994557,cephdwight-mon-f7df6839c6,cephdwight-mon-d8788e3256
in quorum (ranks 0,1,2)
2021-09-21 10:16:39.046 7fae3caf8700 1 mgr handle_mgr_map respawning
because set of enabled modules changed!
This didn't create any problems AFAICT.
* The osds performed the expected fsck after restarting. Their logs
are spammed with things like
2021-09-21T11:15:23.233+0200 7f85901bd700 -1
bluestore(/var/lib/ceph/osd/ceph-1) fsck warning:
#174:1e024a6e:::10009663a55.00000000:head# has omap that is not
per-pool or pgmeta
but that is fully expected AFAIU. Each osd took just under 10 minutes to fsck:
2021-09-21T11:22:27.188+0200 7f85a3a2bf00 1
bluestore(/var/lib/ceph/osd/ceph-1) _fsck_on_open <<<FINISH>>> with 0
errors, 197756 warnings, 197756 repaired, 0 remaining in 475.083056
seconds
For reference, this cluster was created many major releases ago (maybe
firefly), but the osds were probably re-created in luminous.
Memory usage during the fsck was quite normal; we didn't suffer any OOMs.
(A rough sketch for checking the fsck-on-mount setting follows this list.)
* The active mds restarted into octopus without incident.
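Here is that sketch (not from the upgrade docs): one way to check up
front whether the per-pool omap conversion will run when an OSD
restarts, and to pull the fsck summary line afterwards. The
bluestore_fsck_quick_fix_on_mount option is a standard BlueStore
setting; the log pattern is taken from the 15.2.14 output quoted above,
and the log path is only an example:

#!/usr/bin/env python3
# Sketch only: report the fsck-on-mount setting and parse one OSD log
# for the "_fsck_on_open <<<FINISH>>>" summary line.
import re
import subprocess

def quick_fix_on_mount() -> bool:
    # bluestore_fsck_quick_fix_on_mount controls whether the omap
    # conversion/repair runs automatically when the OSD starts.
    out = subprocess.check_output(
        ["ceph", "config", "get", "osd", "bluestore_fsck_quick_fix_on_mount"])
    return out.decode().strip() == "true"

def fsck_summary(osd_log_path):
    # Returns (errors, seconds) from the FINISH line, or None if absent.
    pat = re.compile(r"_fsck_on_open <<<FINISH>>> with (\d+) errors.* in ([\d.]+) seconds")
    with open(osd_log_path) as f:
        for line in f:
            m = pat.search(line)
            if m:
                return int(m.group(1)), float(m.group(2))
    return None

if __name__ == "__main__":
    print("quick-fix on mount:", quick_fix_on_mount())
    summary = fsck_summary("/var/log/ceph/ceph-osd.1.log")   # example path
    if summary:
        errors, seconds = summary
        print(f"fsck finished with {errors} errors in {seconds:.0f} s")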
In summary, it was a very smooth upgrade. After a week of observation
we'll proceed with more production clusters.
For our largest S3 cluster with slow hdds, we expect huge fsck
transactions, so we will wait for https://github.com/ceph/ceph/pull/42958
to be merged before upgrading.
Best Regards, and thanks to all the devs for their work,
Dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx