Re: Recovery question

This sounds odd. Can you create a ticket in the tracker with all the details you can remember or reconstruct?
-Greg

On Wed, Jul 29, 2015 at 8:34 PM Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> wrote:
I recently migrated 240 OSDs to new servers this way in a single cluster, and it worked great. There are two additional items I would note based on my experience though.

First, if you're using dmcrypt, then of course you need to copy the dmcrypt keys for the OSDs to the new host(s). I had to do this in my case, but it was very straightforward.
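A minimal sketch of that key copy, assuming the hammer-era ceph-disk default of keeping the per-OSD dmcrypt key files under /etc/ceph/dmcrypt-keys/ (adjust the path if your deployment stores them elsewhere; "newhost" is a placeholder):

  # On the old host: the key files are named after the OSD partition UUIDs,
  # so the simplest approach is to copy the whole directory to the new host.
  rsync -av /etc/ceph/dmcrypt-keys/ newhost:/etc/ceph/dmcrypt-keys/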

The second item was an issue I didn't expect, probably just due to my own ignorance: I was not able to migrate existing OSDs from different failure domains into a new, single failure domain without waiting for full recovery to HEALTH_OK in between. The very first server I put OSD disks from two different failure domains into had issues. The OSDs came up and in just fine, but they immediately started flapping and failed to make progress toward recovery. I removed the disks from one failure domain and left the others, and recovery progressed as expected. As soon as I saw HEALTH_OK I re-migrated the OSDs from the other failure domain, and again the cluster recovered as expected.

Proceeding this way allowed me to migrate all 240 OSDs without any further problems. I was also able to migrate as many OSDs simultaneously as I wanted, as long as I didn't mix OSDs from different old failure domains into a new failure domain without recovering in between. I understand mixing failure domains like this is risky, but I sort of expected it to work anyway. Maybe it was better in the end that Ceph forced me to do it more safely.
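For anyone repeating this batch-by-batch approach, a rough sketch of the wait between batches using the plain ceph CLI (the 60-second poll interval is arbitrary):

  # After moving one batch of OSDs, wait for recovery to finish before
  # relocating OSDs that came from a different failure domain.
  until ceph health | grep -q HEALTH_OK; do
      sleep 60
  done
  # Confirm the migrated OSDs ended up under the intended host/bucket.
  ceph osd tree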

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705


-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Peter Hinman
Sent: Wednesday, July 29, 2015 12:58 PM
To: Robert LeBlanc <robert@xxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Recovery question

Thanks for the guidance.  I'm working on building a valid ceph.conf right now.  I'm not familiar with the osd-bootstrap key. Is that the standard filename for it?  Is it the keyring that is stored on the osd?

I'll see if the logs turn up anything I can decipher after I rebuild the ceph.conf file.

--
Peter Hinman

On 7/29/2015 12:49 PM, Robert LeBlanc wrote:
> Did you use ceph-deploy or ceph-disk to create the OSDs? If so, it
> should use udev to start the OSDs. In that case, a new host that has
> the correct ceph.conf and osd-bootstrap key should be able to bring up
> the OSDs into the cluster automatically. Just make sure you have the
> correct journal on the same host as the matching OSD disk; udev
> should do the magic.
>
> The OSD logs are your friend if they don't start properly.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
>> I've got a situation that seems on the surface like it should be
>> recoverable, but I'm struggling to understand how to do it.
>>
>> I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After
>> multiple hardware failures, I pulled the 3 osd disks and 3 journal
>> ssds and am attempting to bring them back up again on new hardware in a new cluster.
>> I see plenty of documentation on how to zap and initialize and add "new"
>> osds, but I don't see anything on rebuilding with existing osd disks.
>>
>> Could somebody provide guidance on how to do this? I'm running 0.94.2
>> on all machines.
>>
>> Thanks,
>>
>> --
>> Peter Hinman
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
