Hi Matthew,

*Thanks for the update.*

*For the part:* [my query]

> *Other Query:*
> What if the complete cluster goes down, i.e. the mon crashes, another
> daemon crashes - can we try to restore the data in the OSDs, maybe by
> reusing the OSDs in another or new Ceph cluster or something, to save
> the data?

*[My added query]* In this same situation, we are not sure about the OSD
daemon service either, i.e. it may be the case that all of the services,
including OSD/MON/MGR and the others, are completely dead and cannot be
accessed. In that case the data is still on the disks, so is there a way to
recover that data into a completely new Ceph setup?

Hope I am not confusing you more.

Best Regards,
Lokendra

On Wed, Sep 8, 2021 at 7:03 PM Matthew Vernon <mvernon@xxxxxxxxxxxxx> wrote:

> Hi,
>
> On 06/09/2021 08:37, Lokendra Rathour wrote:
> > Thanks, Matthew, for the update.
> > The upgrade failed for some random weird reasons. Checking further,
> > Ceph's status shows "Ceph health is OK", and at times it gives certain
> > warnings, but I think that is fine.
>
> OK...
>
> > But what if we see a version mismatch between the daemons, i.e. a few
> > services have upgraded and the remaining ones could not be upgraded?
> > In this state we could do two things:
> >
> >  * Retry the upgrade activity (to Pacific) - it might work this time.
> >  * Go back to the older version (Octopus) - is this possible, and if
> >    yes, then how?
>
> In general downgrades are not supported, so I think continuing with the
> upgrade is the best answer.
>
> > *Other Query:*
> > What if the complete cluster goes down, i.e. the mon crashes, another
> > daemon crashes - can we try to restore the data in the OSDs, maybe by
> > reusing the OSDs in another or new Ceph cluster or something, to save
> > the data?
>
> You will generally have more than 1 mon (typically 3, some people have
> 5), and as long as a quorum remains, you will still have a working
> cluster. If you somehow manage to break all your mons, there is an
> emergency procedure for recreating the mon map from an OSD -
>
> https://docs.ceph.com/en/pacific/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
>
> ...but you don't want to end up in that situation!
>
> RADOS typically splits objects across multiple placement groups (and
> thus across multiple OSDs); while there are tools to extract data from
> OSDs (e.g. https://docs.ceph.com/en/latest/man/8/ceph-objectstore-tool/),
> you won't get complete objects this way. Instead, the advice would be
> to try to get enough mons back up to bring your cluster at least to a
> read-only state, and then attempt recovery that way.
>
> HTH,
>
> Matthew

--
~ Lokendra
www.inertiaspeaks.com
www.inertiagroups.com
skype: lokendrarathour
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
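
On the "retry the upgrade" point: if the cluster is managed by cephadm (an
assumption - the thread does not say which deployment method is in use),
the per-daemon version mismatch and the stalled upgrade can usually be
inspected and resumed via the orchestrator, very roughly like this (the
Pacific version number is only an example):

    # show which daemons are still running the old version
    ceph versions
    # check whether an upgrade is in progress and why it stopped
    ceph orch upgrade status
    # resume a paused/failed upgrade, or start it again towards Pacific
    ceph orch upgrade resume
    ceph orch upgrade start --ceph-version 16.2.5

For a package-based (non-cephadm) deployment this does not apply; there the
retry is simply upgrading and restarting the remaining daemons in the
documented order.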
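
On the "all mons dead, data still on the disks" scenario: the
recovery-using-osds page Matthew linked rebuilds a monitor store from the
cluster maps kept on the OSDs, rather than moving the disks to a new
cluster. Very roughly, as a single-host sketch only (paths and keyring are
placeholders; the real procedure loops over every OSD host and needs a
keyring containing the mon. and client.admin keys - follow the linked page
rather than this):

    ms=/tmp/monstore
    mkdir -p "$ms"
    # collect cluster map updates from every (stopped) OSD on this host
    for osd in /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path "$osd" --no-mon-config \
            --op update-mon-db --mon-store-path "$ms"
    done
    # rebuild a monitor store from the collected maps
    ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring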
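
And on salvaging data directly from the disks: ceph-objectstore-tool (the
second link above) operates on a stopped OSD's data path and can list and
export whole placement groups, which can later be re-imported into an OSD
with --op import. As Matthew notes, that gives you PG shards rather than
complete objects, so it is a last resort, not a way to carry data into a
brand-new cluster. The data path and PG id below are placeholders:

    # list the PGs held by a stopped OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs
    # export one PG to a file for safe keeping / later import
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --pgid 2.1f --op export --file /backup/pg.2.1f.export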