Re: Recovering from multiple OSD failures

When copying data to the primary OSD, a deep-scrub has worked for me, but
I haven't dealt with this exact scenario. Did you try bouncing the OSD
process?
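
Concretely, that would be something like the following (the restart syntax
depends on your init system, so treat it as a sketch):

  ceph pg deep-scrub 0.f3
  # then bounce the primary's OSD daemon, e.g. one of:
  sudo service ceph restart osd.29     # sysvinit
  sudo restart ceph-osd id=29          # upstart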
----------------
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Jun 5, 2015 at 12:23 PM, Aaron Ten Clay  wrote:
> Robert,
>
> I did try scrubbing and deep-scrubbing - it seems the OSD is ignoring the
> scrub and deep-scrub commands for the PG (I imagine because its state does
> not include "active").
>
> However, I came across this blog post last night and am currently pursuing:
> https://ceph.com/community/incomplete-pgs-oh-my/
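>
> In short, that approach exports the PG from a copy that is still readable and
> imports it where it is missing, using ceph-objectstore-tool with the relevant
> ceph-osd stopped. Roughly (the binary name and exact flags vary a bit between
> releases, so this is only a sketch):
>
>   # on the host holding a readable copy, with that ceph-osd stopped:
>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
>     --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
>     --pgid 0.f3 --op export --file /root/0.f3.export
>
>   # on the OSD that should receive the copy, also stopped:
>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
>     --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
>     --pgid 0.f3 --op import --file /root/0.f3.export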
>
> Thanks,
> -Aaron
>
>
> On Fri, Jun 5, 2015 at 11:17 AM, Robert LeBlanc wrote:
>>
>> Did you try to deep-scrub the PG after copying it to 29?
>> ----------------
>> Robert LeBlanc
>> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Thu, Jun 4, 2015 at 10:26 PM, Aaron Ten Clay  wrote:
>> > Hi Cephers,
>> >
>> > I recently had a power problem and the entire cluster was brought down,
>> > came up, went down, and came up again. Afterward, three OSDs were mostly
>> > dead (HDD failures). Luckily (I think) the drives were alive enough that I
>> > could copy the data off and leave the journal alone.
>> >
>> > Since my pool "data" has size 3... of course a couple of placement groups
>> > were only on those three drives.
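>> >
>> > That mapping is easy to check with something like
>> >
>> >   ceph pg map 0.f3      # prints "... -> up [...] acting [...]"
>> >
>> > or "ceph pg dump" for every PG at once.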
>> >
>> > Now I've added 4 new OSDs, and everything has recovered except pg 0.f3.
>> > When I query the PG, I see the cluster is looking for OSD 14 or 23 because
>> > one of them maybe_went_rw. (OSDs 5, 14, and 23 are now kaput and have been
>> > marked lost with "ceph osd lost --yes-i-really-mean-it".)
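>> >
>> > For the record, those commands were along these lines:
>> >
>> >   ceph pg 0.f3 query      # peering state, past intervals, maybe_went_rw
>> >   ceph osd lost 5 --yes-i-really-mean-it
>> >   ceph osd lost 14 --yes-i-really-mean-it
>> >   ceph osd lost 23 --yes-i-really-mean-it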
>> >
>> > Ceph indicates OSD 29 is now the primary for pg 0.f3. I copied all the
>> > data to the appropriate directory, started osd.29 again, and here is where
>> > my question comes in:
>> >
>> > How do I convince the cluster that it's okay to bring 0.f3 'up' and
>> > backfill to the other OSDs from 29? (I could even manually backfill 15 and
>> > 22, but I suspect the cluster will still think there's a problem.)
>> >
>> > 'ceph health detail' shows this about 0.f3:
>> >
>> > pg 0.f3 is incomplete, acting [29,22,15] (reducing pool data min_size from
>> > 2 may help; search ceph.com/docs for 'incomplete')
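>> >
>> > If it comes to that, the min_size change it suggests would be something
>> > like this, applied temporarily and set back once the PG recovers:
>> >
>> >   ceph osd pool set data min_size 1
>> >   # ...wait for the PG to go active and recover...
>> >   ceph osd pool set data min_size 2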
>> >
>> > Thanks in advance!
>> > -Aaron
>> >

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



