Re: Successful Upgrade from 14.2.22 to 15.2.14

Thanks for the summary, Dan!

I'm still hesitant to upgrade our production environment from N to O, but your experience sounds reassuring. One question: did you also switch to cephadm and containerize all the daemons? We haven't made a decision yet, but I guess at some point we'll have to switch anyway, so we might as well get it over with. :-D We'll need to discuss it with the team... (my rough picture of what adoption would involve is sketched below).
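
For reference, this is my rough understanding of what per-daemon adoption would look like; it's only a sketch, the <hostname>/<id> names are placeholders, and the authoritative steps are in the cephadm "convert an existing cluster" docs:

  # take over existing packaged daemons and run them in containers (per host)
  cephadm ls                                          # list the legacy daemons cephadm can see
  cephadm adopt --style legacy --name mon.<hostname>
  cephadm adopt --style legacy --name mgr.<hostname>

  # hand orchestration over to cephadm
  ceph mgr module enable cephadm
  ceph orch set backend cephadm

  # then adopt the OSDs
  cephadm adopt --style legacy --name osd.<id>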

Thanks,
Eugen


Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:

Dear friends,

This morning we successfully upgraded our pre-prod cluster from 14.2.22
to 15.2.14, following the procedure at
https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus
It's a 400 TB cluster, about 10% used, with 72 OSDs (block=hdd,
block.db=ssd) and 40M objects.
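
For anyone following along, the rough shape of that procedure as I understand it (a sketch, not a substitute for the linked doc; adjust the systemd unit names to your deployment):

  # keep the cluster from rebalancing while daemons restart
  ceph osd set noout

  # upgrade packages, then restart daemons into octopus in order:
  # mons first, then mgrs, then OSDs, then MDS/RGW, e.g. per host:
  systemctl restart ceph-mon.target
  systemctl restart ceph-mgr.target
  systemctl restart ceph-osd.target

  # once every daemon reports 15.2.x:
  ceph osd require-osd-release octopus
  ceph osd unset noout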

* The mons upgraded cleanly as expected.
* One minor surprise was that the mgrs respawned themselves moments
after the leader restarted into octopus:

2021-09-21T10:16:38.992219+0200 mon.cephdwight-mon-1633994557 (mon.0)
16 : cluster [INF] mon.cephdwight-mon-1633994557 is new leader, mons
cephdwight-mon-1633994557,cephdwight-mon-f7df6839c6,cephdwight-mon-d8788e3256
in quorum (ranks 0,1,2)

2021-09-21 10:16:39.046 7fae3caf8700  1 mgr handle_mgr_map respawning
because set of enabled modules changed!

This didn't create any problems AFAICT.
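
I assume the respawn just reflects what the log says: the new release enables a different set of always-on/enabled mgr modules. If you want to compare before and after the upgrade, the module list is easy to dump (nothing cluster-specific assumed here):

  # shows always-on, enabled and disabled mgr modules for the running release
  ceph mgr module ls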

* The OSDs performed the expected fsck after restarting. Their logs
were spammed with messages like

2021-09-21T11:15:23.233+0200 7f85901bd700 -1
bluestore(/var/lib/ceph/osd/ceph-1) fsck warning:
#174:1e024a6e:::10009663a55.00000000:head# has omap that is not
per-pool or pgmeta

but that is fully expected AFAIU. Each OSD took about eight minutes to fsck:

2021-09-21T11:22:27.188+0200 7f85a3a2bf00  1
bluestore(/var/lib/ceph/osd/ceph-1) _fsck_on_open <<<FINISH>>> with 0
errors, 197756 warnings, 197756 repaired, 0 remaining in 475.083056
seconds

For reference, this cluster was created many major releases ago (maybe
firefly), but the OSDs were probably re-created in luminous.
Memory usage stayed normal throughout; we didn't suffer any OOMs.
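
Side note, in case you want to defer that omap conversion to a quieter time: AFAIU the repair at OSD start is controlled by the bluestore quick-fix-on-mount option, roughly as below. Treat this as a sketch and double-check the option and its default for your exact release:

  # skip the legacy-omap quick fix when OSDs (re)start
  ceph config set osd bluestore_fsck_quick_fix_on_mount false

  # re-enable it later, when you're ready to pay the fsck/repair cost at restart
  ceph config set osd bluestore_fsck_quick_fix_on_mount true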

* The active MDS restarted into octopus without incident.

In summary it was a very smooth upgrade. After a week of observation
we'll proceed with more production clusters.
For our largest S3 cluster with slow HDDs we expect huge fsck
transactions, so we will wait for https://github.com/ceph/ceph/pull/42958
to be merged before upgrading.
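
To get a feel for how much omap each OSD would have to convert there, the per-OSD OMAP column in the usage report seems like a reasonable proxy (a sketch, nothing cluster-specific assumed):

  # per-OSD utilization, including the OMAP and META columns
  ceph osd df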

Best Regards, and thanks to all the devs for their work,

Dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


