Hi,
I haven't had any issues upgrading from Luminous to Nautilus in
multiple clusters (mostly RBD usage, but also CephFS), including a
couple of different setups in my lab (RGW, iGW).
Just recently I upgraded a customer cluster with around 220 OSDs; it
was pretty straightforward, without any hiccups.
The only thing that stands out is your multi-active MDS setup, which I
haven't had to deal with yet, but I don't see any reason why that should
be an issue.
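One thing I would double-check in the Nautilus upgrade notes is the MDS
part: as far as I recall, they recommend reducing the filesystem to a
single active rank before restarting the MDS daemons. Roughly like this
(a sketch only, assuming your filesystem is called 'pmrb' as your status
output suggests):

  ceph fs set pmrb max_mds 1     # drop to a single active rank
  ceph status                    # wait until only rank 0 is up:active
  # ... restart/upgrade the MDS daemons one at a time ...
  ceph fs set pmrb max_mds 3     # restore the original rank count afterwards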
Regards,
Eugen
Quoting Mark Schouten <mark@xxxxxxxx>:
Hi,
We've done our fair share of Ceph cluster upgrades since Hammer and
have not seen many problems with them. I'm now at the point where I have
to upgrade a rather large cluster running Luminous, and I would like to
hear from other users which issues I can expect, so that I can
anticipate them beforehand.
As said, the cluster is running Luminous (12.2.13) and has the following
services active:
  services:
    mon: 3 daemons, quorum osdnode01,osdnode02,osdnode04
    mgr: osdnode01(active), standbys: osdnode02, osdnode03
    mds: pmrb-3/3/3 up {0=osdnode06=up:active,1=osdnode08=up:active,2=osdnode07=up:active}, 1 up:standby
    osd: 116 osds: 116 up, 116 in
    rgw: 3 daemons active
Of the OSDs, 11 are SSD and 105 are HDD. The capacity of the cluster
is 1.01 PiB.
We have two active CRUSH rules across 18 pools. All pools have a size of 3,
and there is a total of 5760 PGs.
{
    "rule_id": 1,
    "rule_name": "hdd-data",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -10,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
},
{
    "rule_id": 2,
    "rule_name": "ssd-data",
    "ruleset": 2,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -21,
            "item_name": "default~ssd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
rbd -> crush_rule: hdd-data
.rgw.root -> crush_rule: hdd-data
default.rgw.control -> crush_rule: hdd-data
default.rgw.data.root -> crush_rule: ssd-data
default.rgw.gc -> crush_rule: ssd-data
default.rgw.log -> crush_rule: ssd-data
default.rgw.users.uid -> crush_rule: hdd-data
default.rgw.usage -> crush_rule: ssd-data
default.rgw.users.email -> crush_rule: hdd-data
default.rgw.users.keys -> crush_rule: hdd-data
default.rgw.meta -> crush_rule: hdd-data
default.rgw.buckets.index -> crush_rule: ssd-data
default.rgw.buckets.data -> crush_rule: hdd-data
default.rgw.users.swift -> crush_rule: hdd-data
default.rgw.buckets.non-ec -> crush_rule: ssd-data
DB0475 -> crush_rule: hdd-data
cephfs_pmrb_data -> crush_rule: hdd-data
cephfs_pmrb_metadata -> crush_rule: ssd-data
All but four clients are running Luminous; those four are still on Jewel
and need to be upgraded before we proceed with this upgrade.
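For completeness: we track those clients down via their reported feature
bits, roughly like this (the second command has to run on the host where
that mon lives; osdnode01 is just one of our mons):

  ceph features                        # groups clients/daemons by release and feature bits
  ceph daemon mon.osdnode01 sessions   # per-session details, including client addresses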
So, normally, I would 'just' upgrade all Ceph packages on the
monitor nodes and restart the mons, then the mgrs.
After that, I would upgrade all Ceph packages on the OSD nodes and
restart all the OSDs, and then, after that, the MDSes and RGWs.
Restarting the OSDs will probably take a while.
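Concretely, the sequence I have in mind follows the Nautilus release
notes; roughly like this (a sketch only; the package commands are just a
Debian/Ubuntu-style example):

  ceph osd set noout                        # avoid data movement while daemons restart
  # on each mon node: upgrade packages, then restart the mon
  apt update && apt upgrade                 # or your distro's equivalent
  systemctl restart ceph-mon.target
  ceph mon dump | grep min_mon_release      # should report nautilus once all mons are upgraded
  # on each mgr node: restart the mgrs after upgrading packages
  systemctl restart ceph-mgr.target
  # on each OSD node: upgrade packages and restart the OSDs
  systemctl restart ceph-osd.target
  ceph osd require-osd-release nautilus     # once all OSDs run Nautilus
  # then the MDSes (after reducing to a single active rank, per the release notes) and RGWs
  systemctl restart ceph-mds.target ceph-radosgw.target
  ceph osd unset noout
  ceph mon enable-msgr2                     # enable the v2 messenger protocol at the end
  ceph versions                             # verify everything reports 14.2.x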
If anyone has hints on what I should expect to cause extra load or
waiting time, that would be great.
Obviously, we have read
https://ceph.com/releases/v14-2-0-nautilus-released/ , but I'm looking
for real world experiences.
Thanks!
--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info@xxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx