Mark,

My main note: make sure NOT to enable msgr2 until all OSDs are upgraded to
Nautilus. I made that mistake early on in the lab and had to work hard to
get it back together. Otherwise, it was a pretty smooth process.

--
Alex Gorbachev
ISS/Storcium

On Thu, Apr 29, 2021 at 4:58 AM Mark Schouten <mark@xxxxxxxx> wrote:
> Hi,
>
> We've done our fair share of Ceph cluster upgrades since Hammer, and
> have not seen many problems with them. I'm now at the point where I have
> to upgrade a rather large cluster running Luminous, and I would like to
> hear from other users whether they have run into issues I can expect,
> so that I can anticipate them beforehand.
>
> As said, the cluster is running Luminous (12.2.13) and has the following
> services active:
>
>   services:
>     mon: 3 daemons, quorum osdnode01,osdnode02,osdnode04
>     mgr: osdnode01(active), standbys: osdnode02, osdnode03
>     mds: pmrb-3/3/3 up
>       {0=osdnode06=up:active,1=osdnode08=up:active,2=osdnode07=up:active},
>       1 up:standby
>     osd: 116 osds: 116 up, 116 in
>     rgw: 3 daemons active
>
> Of the OSDs, we have 11 SSDs and 105 HDDs. The capacity of the cluster
> is 1.01 PiB.
>
> We have 2 active CRUSH rules on 18 pools. All pools have a size of 3,
> and there is a total of 5760 PGs.
>
>   {
>       "rule_id": 1,
>       "rule_name": "hdd-data",
>       "ruleset": 1,
>       "type": 1,
>       "min_size": 1,
>       "max_size": 10,
>       "steps": [
>           {
>               "op": "take",
>               "item": -10,
>               "item_name": "default~hdd"
>           },
>           {
>               "op": "chooseleaf_firstn",
>               "num": 0,
>               "type": "host"
>           },
>           {
>               "op": "emit"
>           }
>       ]
>   },
>   {
>       "rule_id": 2,
>       "rule_name": "ssd-data",
>       "ruleset": 2,
>       "type": 1,
>       "min_size": 1,
>       "max_size": 10,
>       "steps": [
>           {
>               "op": "take",
>               "item": -21,
>               "item_name": "default~ssd"
>           },
>           {
>               "op": "chooseleaf_firstn",
>               "num": 0,
>               "type": "host"
>           },
>           {
>               "op": "emit"
>           }
>       ]
>   }
>
> rbd -> crush_rule: hdd-data
> .rgw.root -> crush_rule: hdd-data
> default.rgw.control -> crush_rule: hdd-data
> default.rgw.data.root -> crush_rule: ssd-data
> default.rgw.gc -> crush_rule: ssd-data
> default.rgw.log -> crush_rule: ssd-data
> default.rgw.users.uid -> crush_rule: hdd-data
> default.rgw.usage -> crush_rule: ssd-data
> default.rgw.users.email -> crush_rule: hdd-data
> default.rgw.users.keys -> crush_rule: hdd-data
> default.rgw.meta -> crush_rule: hdd-data
> default.rgw.buckets.index -> crush_rule: ssd-data
> default.rgw.buckets.data -> crush_rule: hdd-data
> default.rgw.users.swift -> crush_rule: hdd-data
> default.rgw.buckets.non-ec -> crush_rule: ssd-data
> DB0475 -> crush_rule: hdd-data
> cephfs_pmrb_data -> crush_rule: hdd-data
> cephfs_pmrb_metadata -> crush_rule: ssd-data
>
> All but four clients are running Luminous; those four are running Jewel
> (and need upgrading before proceeding with this upgrade).
>
> So, normally, I would 'just' upgrade all Ceph packages on the
> monitor nodes and restart the mons and then the mgrs.
>
> After that, I would upgrade all Ceph packages on the OSD nodes and
> restart all the OSDs. Then, after that, the MDSes and RGWs. Restarting
> the OSDs will probably take a while.
>
> If anyone has a hint on what I should expect to cause some extra load or
> waiting time, that would be great.
>
> Obviously, we have read
> https://ceph.com/releases/v14-2-0-nautilus-released/ , but I'm looking
> for real-world experiences.
>
> Thanks!
>
> --
> Mark Schouten | Tuxis B.V.
> KvK: 74698818 | http://www.tuxis.nl/
> T: +31 318 200208 | info@xxxxxxxx
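
In command form, the procedure from the release notes you linked boils down
to roughly the following. This is only a sketch, not an exact runbook: it
assumes a package-based install with the stock systemd targets, the actual
package-upgrade step depends on your distro (apt/yum), and 'pmrb' is the
filesystem name taken from your status output. The key point again is that
enable-msgr2 only comes at the very end, once everything reports Nautilus.

  # Before starting (from a node with an admin keyring)
  ceph osd set noout

  # 1. Monitor nodes, one at a time: upgrade the ceph packages, then
  systemctl restart ceph-mon.target
  ceph mon stat                        # wait for the mon to rejoin quorum

  # 2. Manager nodes, one at a time
  systemctl restart ceph-mgr.target

  # 3. OSD nodes, one at a time; wait for PGs to return to active+clean
  #    before moving on to the next node
  systemctl restart ceph-osd.target
  ceph status

  # 4. CephFS: reduce to a single active MDS rank, wait for the extra
  #    ranks to stop, restart/upgrade the MDS daemons, then restore the
  #    original rank count
  ceph fs set pmrb max_mds 1
  systemctl restart ceph-mds.target
  ceph fs set pmrb max_mds 3

  # 5. RGW nodes
  systemctl restart ceph-radosgw.target

  # 6. Only once every daemon reports nautilus
  ceph versions
  ceph osd require-osd-release nautilus
  ceph mon enable-msgr2

  ceph osd unset noout

Expect most of the waiting time in step 3, letting the PGs settle after
each OSD-node restart, which matches your own estimate above.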