Thanks Sage and Greg for the response.

> 2) having a separate switchover point (besides the code upgrade) which
> enables all the disk change bits and which doesn't allow you to roll
> back.

Let me give two examples of changes that prevent us from rolling back from Giant to Firefly.

Example #1: Giant adds and persists a new feature flag, 'CEPH_FEATURE_ERASURE_CODE_PLUGINS_V2'. On startup, the monitor checks the persisted list against the list shipped with the running software version and refuses to start if they do not match. However, although the feature is added in Giant, it is not actually used until we create a new pool with the profile, which is very unlikely to happen.
1) Is it possible to persist the new feature bit only when the feature is actually being used? (This looks complicated to implement.)
2) When loading the persisted bit, is it possible to check whether it is actually used by anyone?

Example #2: Patch [1] added a new k/v to the PG log which the old version of the binary cannot recognize (PGLog::read_log); as a result, it treats the newly added entry as a pg_log_entry. Is it possible to recognize a pg_log_entry by a concrete pattern and just ignore the entries that the binary cannot recognize?

For these two cases, we may be able to erase the newly added entries and then roll back (correct me if I am wrong here), but I think there might be more complicated cases which make rollback impossible, and we would have to accept that risk when upgrading.
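To make the two questions more concrete, here is a rough sketch in Python. It is not Ceph code: the function names, the feature-name sets, the key name "new_metadata_key", and the assumed shape of a log-entry key are all hypothetical and only illustrate the idea. The first function blocks startup only on persisted features that are both in use and unknown to the binary; the second reads a k/v log and skips keys that do not look like log entries instead of parsing them as entries.

```python
import re

# Assumed shape of a log-entry key (10-digit epoch, dot, 20-digit version).
# This pattern is an illustration only, not taken from the Ceph source.
ENTRY_KEY = re.compile(r"^\d{10}\.\d{20}$")


def blocking_features(persisted, in_use, supported):
    """Return the persisted features that should prevent startup.

    persisted: feature names written to disk by some (possibly newer) version
    in_use:    feature names that existing pools/profiles really depend on
    supported: feature names the running binary understands
    A persisted but unused feature does not block startup.
    """
    return (persisted & in_use) - supported


def read_log(kv_pairs):
    """Split k/v pairs into log entries and skipped (unrecognized) keys.

    Keys that do not match ENTRY_KEY are reported rather than being
    decoded as log entries.
    """
    entries, skipped = [], []
    for key, value in kv_pairs:
        if ENTRY_KEY.match(key):
            entries.append((key, value))
        else:
            skipped.append(key)
    return entries, skipped


if __name__ == "__main__":
    # Feature persisted by the new version but never used: nothing blocks.
    print(blocking_features({"PLUGINS_V2"}, set(), {"BASE"}))

    # One well-formed entry key plus one key the old binary does not know.
    entry_key = "%010d.%020d" % (5, 12)
    print(read_log([(entry_key, b"entry"), ("new_metadata_key", b"x")]))
```

Skipping unknown keys only helps if silently ignoring them is safe for the old binary, which is exactly the part I am unsure about.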
[1] https://github.com/ceph/ceph/commit/1fe8b846641486cc294fe7e1d2450132c38d2dba

Thanks,
Guang

----------------------------------------
> Date: Wed, 11 Feb 2015 09:26:14 -0800
> From: sage@xxxxxxxxxxxx
> To: greg@xxxxxxxxxxx
> CC: yguang11@xxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: Upgrade/rollback
>
> On Wed, 11 Feb 2015, Gregory Farnum wrote:
>> On Wed, Feb 11, 2015 at 4:09 AM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>>> Hi ceph-devel,
>>> Recently we are trying the upgrade from Firefly to Giant and it goes pretty smoothly, however, the problem is that it does not support rollback and seems like that is by design. For example, there is new feature flag / metadata [1] added in the new version and they are persisted. As a result, the old version of software does not recognize those values and will crash themselves.
>>>
>>> Ideally we never rollback, but for unknown reasons we couldn't fix in a timely manner, we might want to rollback first. Is that something we will consider to handle?
>>
>> We are unlikely to ever do this. We've talked about rollback options,
>> but allowing rollback means either:
>> 1) never adding information to the disk format which matters,
>> 2) having a separate switchover point (besides the code upgrade) which
>> enables all the disk change bits and which doesn't allow you to roll
>> back.
>>
>> The first option is obviously infeasible. The second one dramatically
>> increases the amount of code which can be buggy, increases the testing
>> version, and doesn't really solve the problem since you still have a
>> hard point of no return.
>
> The only rollback plan we do have currently is to start testing and
> supporting rollback within a major release, e.g. 0.80.8 -> 0.80.7. We
> (normally) don't add any incompatible changes within a stable release so
> this won't be a major change except that right now none of the downgrade
> scenarios are tested.
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html