Re: Recovery question

I'm glad you were able to recover. I'm sure you learned a lot about
Ceph through the exercise (that always seems to be the case for me).
I'll look forward to your report so that we can include it in our
operations manual, just in case.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Jul 30, 2015 at 12:41 PM, Peter Hinman  wrote:
> For the record, I have been able to recover.  Thank you very much for the
> guidance.
>
> I hate searching the web and finding only partial information on threads
> like this, so I'm going to document and post what I've learned as best I can
> in hopes that it will help someone else out in the future.
>
> --
> Peter Hinman
>
> On 7/29/2015 5:15 PM, Robert LeBlanc wrote:
>>
>> If you had multiple monitors, you should recover more than 50% of them
>> if possible (they will need to form a quorum). If you can't, it is
>> messy, but you can manually remove enough monitors from the monmap to
>> form a quorum. From /etc/ceph/ you will want the keyring and the
>> ceph.conf at a minimum. The monitor keys, I believe, are in the
>> store.db, which will let the monitors start, but the keyring has the
>> admin key which lets you manage the cluster once you get it up. rbdmap
>> is not needed for recovery (it is only for automatically mapping RBDs
>> at boot time); we can deal with that later if needed.
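[For reference, the manual quorum repair described above can be sketched with Ceph's standard tools. This is a sketch only; the monitor ids mon1 (survivor) and mon2/mon3 (lost) are assumptions for illustration, and the monitor must be stopped before its store is touched.]

```shell
# Shrink the monmap so a single surviving monitor (mon1) can form a
# quorum by itself; mon2 and mon3 are the monitors that were lost.

# Stop the surviving monitor before modifying its store.
service ceph stop mon.mon1

# Extract the last committed monmap from the survivor's store.db.
ceph-mon -i mon1 --extract-monmap /tmp/monmap

# Drop the dead monitors from the map.
monmaptool /tmp/monmap --rm mon2 --rm mon3

# Inject the trimmed map and bring the monitor back up.
ceph-mon -i mon1 --inject-monmap /tmp/monmap
service ceph start mon.mon1
```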
>> - ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jul 29, 2015 at 4:40 PM, Peter Hinman  wrote:
>>>
>>> Ok - that is encouraging.  I believe I've got data from a previous
>>> monitor. I see files in a store.db dated yesterday, with a
>>> MANIFEST-######## file that is significantly greater than the
>>> MANIFEST-000007 file listed for the current monitors.
>>>
>>> I've actually found data for two previous monitors.  Any idea which one I
>>> should select? The one with the highest manifest number? The most recent
>>> time stamp?
>>>
>>> What files should I be looking for in /etc/ceph?  Just the keyring and
>>> rbdmap files?  How important is it to use the same keyring file?
>>>
>>> --
>>> Peter Hinman
>>>
>>>
>>> On 7/29/2015 3:47 PM, Robert LeBlanc wrote:
>>>>
>>>> The default is /var/lib/ceph/mon/<cluster>-<id>
>>>> (/var/lib/ceph/mon/ceph-mon1 for me). You will also need the
>>>> information from /etc/ceph/ to reconstruct the data. I *think* you
>>>> should be able to just copy this to a new box with the same name and
>>>> IP address and start it up.
>>>>
>>>> I haven't actually done this, so there still may be some bumps.
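[Assuming the old monitor's disk can be mounted somewhere readable, the copy step Robert describes might look like this. The mount point /mnt/old-root and the monitor name ceph-mon1 are made up for illustration; the replacement box must already have the failed host's hostname and IP.]

```shell
# Sketch: rebuild monitor mon1 on a replacement box configured with the
# SAME hostname and IP address as the failed one.
# /mnt/old-root is an assumed mount of the failed host's filesystem.

mkdir -p /var/lib/ceph/mon/ceph-mon1
rsync -a /mnt/old-root/var/lib/ceph/mon/ceph-mon1/ /var/lib/ceph/mon/ceph-mon1/

# Bring over the cluster config and admin keyring as well.
cp /mnt/old-root/etc/ceph/ceph.conf \
   /mnt/old-root/etc/ceph/ceph.client.admin.keyring /etc/ceph/

service ceph start mon.mon1
```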
>>>> - ----------------
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>
>>>>
>>>> On Wed, Jul 29, 2015 at 3:44 PM, Peter Hinman  wrote:
>>>>>
>>>>> Thanks Robert -
>>>>>
>>>>> Where would that monitor data (database) be found?
>>>>>
>>>>> --
>>>>> Peter Hinman
>>>>>
>>>>>
>>>>> On 7/29/2015 3:39 PM, Robert LeBlanc wrote:
>>>>>>
>>>>>> If you built new monitors, this will not work. You would have to
>>>>>> recover the monitor data (database) from at least one monitor and
>>>>>> rebuild the monitor. The new monitors would not have any information
>>>>>> about pools, OSDs, PGs, etc to allow an OSD to be rejoined.
>>>>>> - ----------------
>>>>>> Robert LeBlanc
>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 29, 2015 at 2:46 PM, Peter Hinman  wrote:
>>>>>>>
>>>>>>> Hi Greg -
>>>>>>>
>>>>>>> So at the moment, I seem to be trying to resolve a permission error.
>>>>>>>
>>>>>>>     === osd.3 ===
>>>>>>>     Mounting xfs on stor-2:/var/lib/ceph/osd/ceph-3
>>>>>>>     2015-07-29 13:35:08.809536 7f0a0262e700  0 librados: osd.3 authentication error (1) Operation not permitted
>>>>>>>     Error connecting to cluster: PermissionError
>>>>>>>     failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.3 --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3 3.64 host=stor-2 root=default'
>>>>>>>     ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.3']' returned non-zero exit status 1
>>>>>>>     ceph-disk: Error: One or more partitions failed to activate
>>>>>>>
>>>>>>>
>>>>>>> Is there a way to identify the cause of this PermissionError?  I've
>>>>>>> copied the client.bootstrap-osd key from the output of ceph auth
>>>>>>> list, and pasted it into /var/lib/ceph/bootstrap-osd/ceph.keyring,
>>>>>>> but that has not resolved the error.
>>>>>>>
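[One way to narrow down an authentication error like the one above is to compare the key the cluster has registered for the OSD with the key on the OSD's disk. A sketch only; adjust the osd id and paths to your setup.]

```shell
# Does the cluster's registered key for osd.3 match the on-disk keyring?
ceph auth get osd.3
cat /var/lib/ceph/osd/ceph-3/keyring

# If they differ, re-register the on-disk key with the cluster
# (caps as in the standard manual OSD-add procedure).
ceph auth del osd.3
ceph auth add osd.3 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-3/keyring
```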
>>>>>>> But it sounds like you are saying that even once I get this
>>>>>>> resolved, I have no hope of recovering the data?
>>>>>>>
>>>>>>> --
>>>>>>> Peter Hinman
>>>>>>>
>>>>>>> On 7/29/2015 1:57 PM, Gregory Farnum wrote:
>>>>>>>
>>>>>>> This sounds like you're trying to reconstruct a cluster after
>>>>>>> destroying the monitors. That is...not going to work well. The
>>>>>>> monitors define the cluster and you can't move OSDs into different
>>>>>>> clusters. We have ideas for how to reconstruct monitors, and it can
>>>>>>> be done manually with a lot of hassle, but the process isn't
>>>>>>> written down and there aren't really tools to help with it. :/
>>>>>>> -Greg
>>>>>>>
>>>>>>> On Wed, Jul 29, 2015 at 5:48 PM Peter Hinman  wrote:
>>>>>>>>
>>>>>>>> I've got a situation that seems on the surface like it should be
>>>>>>>> recoverable, but I'm struggling to understand how to do it.
>>>>>>>>
>>>>>>>> I had a cluster of 3 monitors, 3 OSD disks, and 3 journal SSDs.
>>>>>>>> After multiple hardware failures, I pulled the 3 OSD disks and 3
>>>>>>>> journal SSDs and am attempting to bring them back up again on new
>>>>>>>> hardware in a new cluster.  I see plenty of documentation on how
>>>>>>>> to zap and initialize and add "new" OSDs, but I don't see anything
>>>>>>>> on rebuilding with existing OSD disks.
>>>>>>>>
>>>>>>>> Could somebody provide guidance on how to do this?  I'm running
>>>>>>>> 0.94.2 on all machines.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Peter Hinman
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> ceph-users mailing list
>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>



