Upgrading multi-site RGW to Luminous

Apparently RGW daemons running 12.2.2 cannot sync data from RGW daemons running anything other than Luminous.  This means that if you run multisite and you don't upgrade both sites at the same time, then you have broken replication.  There is a fix for this scheduled for 12.2.3 (http://tracker.ceph.com/issues/22183).

As most people running multi-site are probably doing so with the intention of routing traffic away from a zone to do maintenance, like upgrades, this means that a rolling upgrade to Luminous is impossible and leaves you in a degraded state.  The assumed process for an upgrade would be to have 2+ Jewel sites, route traffic away from one of them, and upgrade it to Luminous; the Luminous site then needs to catch up from the Jewel site... which it can't.  The only existing options for upgrading multi-site RGW from Jewel to Luminous are to do it simultaneously on all sites, or to direct all traffic to the site you're performing the upgrade on and set the remaining Jewel site to read-only.  None of this is documented in the release notes.
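For reference, the read-only route is roughly the documented failover procedure; something like the following, run against the zone you want to freeze (the zone name below is just a placeholder):

    # On a node in the zone that should stop accepting writes (placeholder zone name)
    radosgw-admin zone modify --rgw-zone=us-west --read-only
    # Push the updated period out to the rest of the realm
    radosgw-admin period update --commit
    # Restart the gateways in that zone so they pick up the new period
    systemctl restart ceph-radosgw.target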

On top of this, I ran into a problem where all of my index pools had dozens of scrub errors after the upgrade from Jewel to Luminous.  This wasn't limited to multi-site realms; a local-only realm had the same scrub errors.
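For anyone wanting to look at the same thing, the inconsistencies show up through the usual tooling; the pool name and PG ID below are placeholders for whatever ceph health detail points at:

    # List PGs with inconsistencies in the suspect index pool (pool name is a placeholder)
    rados list-inconsistent-pg default.rgw.buckets.index
    # Show the inconsistent objects in one of those PGs (PG ID is a placeholder)
    rados list-inconsistent-obj 7.1f --format=json-pretty
    # Only repair once you understand what's actually wrong
    ceph pg repair 7.1f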

We pushed up the upgrade because of a memory leak in RGW, fixed in 12.2.2, that caused our RGW daemons to OOM restart about every 30 minutes while running multi-site (http://tracker.ceph.com/issues/19446), and because Bluestore fixes the problem of filestore subfolder splitting crippling clusters with high object count pools.  Between the two of those, we were doing constant maintenance trying to keep the clusters from running with persistent blocked requests.
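(The only stopgap on the filestore side is tuning when the splits happen rather than avoiding them; something like the following in ceph.conf, with values that are purely illustrative and only apply to newly split directories, which is why we wanted off filestore entirely:)

    [osd]
    # Illustrative values only -- raising these delays subfolder splitting on
    # filestore OSDs at the cost of larger directories; it does not remove the problem.
    filestore merge threshold = 40
    filestore split multiple = 8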

Does anyone have any suggestions for how to move forward now?  I don't trust upgrading our remaining Jewel site to Luminous because of the unexplained scrub errors (and since RGW multi-site is busted, the daemons are at least no longer OOM restarting on the Jewel site).  Option 2 would be to stop all writes to our remaining active Jewel site, manually sync the data it has over to the Luminous site, and ultimately direct traffic to Luminous while setting the Jewel site to read-only.  I suppose the 3rd option is the one we'll have to go with: wait for 12.2.3 and, hopefully, a fixed multi-site sync before moving forward with anything else.  Unfortunately we lose the redundancy of the second site, but at least we don't have to deal with the RGW daemons restarting 2x/hr.
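Whichever option we end up with, I'd expect the check before cutting traffic over to be the usual sync status output on the zone that is supposed to be catching up:

    # Run on the catching-up zone; both the metadata and data sections
    # should report "caught up" before moving traffic or flipping read-only.
    radosgw-admin sync status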

There is also a typo in the Red Hat docs and Ceph docs for RGW multi-site disaster recovery and failover (http://docs.ceph.com/docs/luminous/radosgw/multisite/#failover-and-disaster-recovery and https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/object_gateway_guide_for_red_hat_enterprise_linux/multi_site#failover_and_disaster_recovery).  The documented radosgw-admin command with --read-only=False is parsed as if the =False weren't there and sets the target zone to read-only.  This is tested on both Jewel and Luminous.  The only way I could find to fix it was to download the zonegroup JSON, edit it manually, and set it back into the realm.
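Roughly, that workaround looks like this (the zonegroup name is a placeholder):

    # Dump the zonegroup configuration (zonegroup name is a placeholder)
    radosgw-admin zonegroup get --rgw-zonegroup=us > zonegroup.json
    # Edit zonegroup.json by hand: set "read_only": "false" on the affected zone
    radosgw-admin zonegroup set --rgw-zonegroup=us < zonegroup.json
    # Commit the change to the period so the gateways pick it up
    radosgw-admin period update --commit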

Anyway, here's another cautionary tale of upgrading to Luminous without regression testing your environment and the upgrade process.  You can't assume anyone else tested it.