Hi Andras,

I'm not aware of any showstoppers that would prevent moving directly to
pacific. Indeed, we already run pacific on a new cluster we built for
our users to try cephfs snapshots at scale. That cluster was created
with octopus a few months ago, then upgraded to pacific at 16.2.4 to
take advantage of the stray dentry splitting.

Why octopus and not pacific directly for the bulk of our existing prod
clusters? We're just being conservative, especially regarding the fsck
omap upgrade on all the osds (a sketch of the relevant settings is
appended after the quoted thread below). Since it went well on this
cluster, I expect it to go similarly well on the other rbd and cephfs
clusters. We'll tread more carefully with the S3 clusters, but with the
PR mentioned earlier I expect that to go well too. My expectation is
that we'll only run octopus for a short while before moving to pacific
at one of its next point releases.

Before octopus we usually didn't move our most critical clusters to the
next major release until around the ~.8 point release -- it's usually by
then that all the major issues have been flushed out, AFAICT.

Cheers, Dan

On Wed, Sep 22, 2021 at 11:19 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Dan,
>
> This is excellent to hear - we've also been a bit hesitant to upgrade
> from Nautilus (which has been working so well for us). One question:
> did you/would you consider upgrading straight to Pacific from Nautilus?
> Can you share the thoughts that led you to Octopus first?
>
> Thanks,
>
> Andras
>
>
> On 9/21/21 06:09, Dan van der Ster wrote:
> > Dear friends,
> >
> > This morning we successfully upgraded our pre-prod cluster from
> > 14.2.22 to 15.2.14, following the procedure at
> > https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus
> > It's a 400TB cluster, 10% used, with 72 osds (block=hdd,
> > block.db=ssd) and 40M objects.
> >
> > * The mons upgraded cleanly as expected.
> > * One minor surprise was that the mgrs respawned themselves moments
> > after the leader restarted into octopus:
> >
> > 2021-09-21T10:16:38.992219+0200 mon.cephdwight-mon-1633994557 (mon.0)
> > 16 : cluster [INF] mon.cephdwight-mon-1633994557 is new leader, mons
> > cephdwight-mon-1633994557,cephdwight-mon-f7df6839c6,cephdwight-mon-d8788e3256
> > in quorum (ranks 0,1,2)
> >
> > 2021-09-21 10:16:39.046 7fae3caf8700 1 mgr handle_mgr_map respawning
> > because set of enabled modules changed!
> >
> > This didn't create any problems AFAICT.
> >
> > * The osds performed the expected fsck after restarting. Their logs
> > are spammed with things like
> >
> > 2021-09-21T11:15:23.233+0200 7f85901bd700 -1
> > bluestore(/var/lib/ceph/osd/ceph-1) fsck warning:
> > #174:1e024a6e:::10009663a55.00000000:head# has omap that is not
> > per-pool or pgmeta
> >
> > but that is fully expected AFAIU. Each osd took just under 10 minutes
> > to fsck:
> >
> > 2021-09-21T11:22:27.188+0200 7f85a3a2bf00 1
> > bluestore(/var/lib/ceph/osd/ceph-1) _fsck_on_open <<<FINISH>>> with 0
> > errors, 197756 warnings, 197756 repaired, 0 remaining in 475.083056
> > seconds
> >
> > For reference, this cluster was created many major releases ago
> > (maybe firefly), but the osds were probably re-created in luminous.
> > The memory usage was quite normal; we didn't suffer any OOMs.
> >
> > * The active mds restarted into octopus without incident.
> >
> > In summary it was a very smooth upgrade. After a week of observation
> > we'll proceed with more production clusters.
> > For our largest S3 cluster with slow hdds, we expect huge fsck
> > transactions, so we will wait for https://github.com/ceph/ceph/pull/42958
> > to be merged before upgrading.
> >
> > Best Regards, and thanks to all the devs for their work,
> >
> > Dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
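
For reference, a minimal sketch of the settings behind the "fsck omap
upgrade" discussed above. This assumes a package-based (non-cephadm)
deployment and uses osd.1 purely as an example; the option and tool names
are from upstream Ceph, but check the release notes of your target
version for the actual defaults before relying on them:

  # Whether OSDs run the omap -> per-pool conversion automatically the
  # first time they start under the new release (the on-mount fsck whose
  # warnings and <<<FINISH>>> line are quoted in the thread above):
  ceph config get osd bluestore_fsck_quick_fix_on_mount

  # To defer that conversion (e.g. on large, slow hdd-only OSDs) and run
  # it later, one OSD at a time while the daemon is stopped:
  ceph config set osd bluestore_fsck_quick_fix_on_mount false
  systemctl stop ceph-osd@1
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 repair
  systemctl start ceph-osd@1

The "has omap that is not per-pool or pgmeta" warnings correspond to
objects still in the old omap format; they stop appearing once the
conversion has completed on that OSD.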