Re: Ceph and its failures

Robert LeBlanc <robert@xxxxxxxxxxxxx> · Tue, 23 Feb 2016 21:05:35 -0700



You probably haven't written to any objects after fixing the problem.
Do some client I/O on the cluster and the PG will show fixed again. I
had this happen to me as well.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Feb 23, 2016 at 2:08 PM, Nmz <nemesiz@xxxxxx> wrote:
>>> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>>>
>>> Ceph contains
>>>  MON: 3
>>>  OSD: 3
>>>
>> For completeness' sake, the OSDs are on 3 different hosts, right?
>
> It is a single machine. I'm only doing tests.
>
>>> File system: ZFS
>> That is the odd one out: very few people I'm aware of use it, and support
>> for it is marginal at best.
>> And some of its features may, of course, obscure things.
>
> I've been using ZFS on Linux for a long time and I'm happy with it.
>
>
>> Exact specification please, as in how is ZFS configured (single disk,
>> raid-z, etc)?
>
> 2 disks in mirror mode.
>
>>> Kernel: 4.2.6
>>>
>> While probably not related, I vaguely remember 4.3 being recommended for
>> use with Ceph.
>
> At the moment I can only run this kernel. But IF I decide to use Ceph (only if Ceph satisfies my requirements), I can use any other kernel.
>
>>> 3. Does Ceph have an auto-heal option?
>> No.
>> And neither is the repair function a good idea w/o checking the data on
>> disk first.
>> This is my biggest pet peeve with Ceph, and you will find it mentioned
>> frequently on this ML; see this thread from just a few days ago, for example:
>> "pg repair behavior? (Was: Re: getting rid of misplaced objects)"
>
> It is very strange to recover data manually without knowing which copy is good.
> If I have 3 copies of the data and 2 of them are corrupted, then I can end up recovering the bad one.
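
You can check which copy is good by hand before running a repair. Below is a minimal sketch of that check. It is not a Ceph command. It assumes you have already copied each replica of the object (for example, from the FileStore `current/<pgid>_head/` directory on each OSD host) into one place. The function names `majority_md5` and `outliers` and the file names `osd3`, `osd4`, `osd5` are hypothetical. The script only trusts a checksum that a strict majority of the copies share. With three copies, at least two must still agree. If two copies are corrupted in different ways, it reports "no majority" rather than guessing.

```shell
#!/bin/sh
# Sketch: majority vote over copies of one object, by MD5 checksum.
# Not a Ceph tool; assumes the replica files were already gathered locally.

# Print the checksum shared by a strict majority of the given files
# (prints nothing if no checksum reaches a majority).
majority_md5() {
    n=$#
    # Read via stdin so md5sum never escapes odd file names (e.g. backslashes).
    for f; do md5sum < "$f"; done | awk '{print $1}' |
        sort | uniq -c | sort -rn |
        awk -v n="$n" 'NR == 1 && $1 * 2 > n {print $2}'
}

# Print the files whose checksum differs from the majority.
# Returns 1 (printing nothing on stdout) when there is no majority.
outliers() {
    good=$(majority_md5 "$@")
    if [ -z "$good" ]; then
        echo "no majority, cannot tell which copy is good" >&2
        return 1
    fi
    for f; do
        sum=$(md5sum < "$f" | awk '{print $1}')
        [ "$sum" = "$good" ] || printf '%s\n' "$f"
    done
}

# Demo with hypothetical copies: osd5 carries the same edit as in the sed test.
dir=$(mktemp -d)
printf 'aaaaaaaaaa\n' > "$dir/osd3"
printf 'aaaaaaaaaa\n' > "$dir/osd4"
printf 'abaaaaaaaa\n' > "$dir/osd5"
outliers "$dir/osd3" "$dir/osd4" "$dir/osd5"   # prints the osd5 path
rm -r "$dir"
```

A vote like this is only as good as the assumption that corruption is rare and independent. If two copies went bad in the same way, the vote would pick the wrong one.
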
>
>
> ------------------
>
> I did a new test. Now the 3 new OSDs are on different systems. The FS is ext3.
>
> Same start as before.
>
> # grep "aaaaaaaaa" * -R
> Binary file osd/nmz-5/current/17.17_head/rbd\udata.1bef77ac761fb.0000000000000001__head_FB98F317__11 matches
> Binary file osd/nmz-5-journal/journal matches
>
> # ceph pg dump | grep 17.17
> dumped all in format plain
> 17.17   1       0       0       0       0       4096    1       1       active+clean    2016-02-23 16:14:32.234638      291'1   309:44  [5,4,3] 5       [5,4,3] 5       0'0     2016-02-22 20:30:04.255301      0'0     2016-02-22 20:30:04.255301
>
> # md5sum rbd\\udata.1bef77ac761fb.0000000000000001__head_FB98F317__11
> \c2642965410d118c7fe40589a34d2463  rbd\\udata.1bef77ac761fb.0000000000000001__head_FB98F317__11
>
> # sed -i -r 's/aaaaaaaaaa/abaaaaaaaa/g' rbd\\udata.1bef77ac761fb.0000000000000001__head_FB98F317__11
>
>
> # ceph pg deep-scrub 17.17
>
> 7fbd99e6c700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub starts
> 7fbd97667700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub ok
>
> -- restarting OSD.5
>
> # ceph pg deep-scrub 17.17
>
> 7f00f40b8700  0 log_channel(cluster) log [INF] : 17.17 deep-scrub starts
> 7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 shard 5: soid 17/fb98f317/rbd_data.1bef77ac761fb.0000000000000001/head data_digest 0x389d90f6 != known data_digest 0x4f18a4a5 from auth shard 3, missing attr _, missing attr snapset
> 7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 deep-scrub 0 missing, 1 inconsistent objects
> 7f00f68bd700 -1 log_channel(cluster) log [ERR] : 17.17 deep-scrub 1 errors
>
>
> Is this a Ceph 9.2.0 bug?
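
Here is one possible explanation for the sequence above. It is an assumption, not something confirmed in this thread. GNU `sed -i` does not rewrite the file in place. It writes a temporary file and renames it over the original, so the object file ends up with a new inode. That new inode presumably lacks the extended attributes the OSD had stored on the old one, which would fit the "missing attr _, missing attr snapset" part of the error. An OSD that still had the old file open might also have kept reading the old data until it was restarted. The sketch below only demonstrates the inode change on a scratch temp file, not on an OSD. It assumes GNU coreutils for `stat -c`.

```shell
#!/bin/sh
# Sketch: show that GNU sed -i replaces the file rather than editing it in place.
# Works on a throwaway temp file; stat -c assumes GNU coreutils.
dir=$(mktemp -d)
f="$dir/obj"
printf 'aaaaaaaaaa\n' > "$f"

before=$(stat -c %i "$f")                  # inode before the edit
sed -i -r 's/aaaaaaaaaa/abaaaaaaaa/g' "$f" # same substitution as in the test
after=$(stat -c %i "$f")                   # inode after the edit

echo "inode before: $before, after: $after"
cat "$f"                                   # prints abaaaaaaaa
rm -r "$dir"
```

Writing the changed bytes through the existing file (same inode) would be a closer model of on-disk corruption than `sed -i`.
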
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com