Re: cephfs deep scrub error:


 



On Mon, Mar 13, 2017 at 1:35 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Mon, Mar 13, 2017 at 10:28 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> Hi John,
>>
>> Last week we updated our prod CephFS cluster to 10.2.6 (clients and
>> server side), and today, for the first time, we hit an object info
>> size mismatch:
>>
>> I found this ticket you created in the tracker, which is why I've
>> emailed you: http://tracker.ceph.com/issues/18240
>>
>> Here's the detail of our error:
>>
>> 2017-03-13 07:17:49.989297 osd.67 <snip>:6819/3441125 262 : cluster
>> [ERR] deep-scrub 1.3da 1:5bc0e9dc:::10000260f4b.00000003:head on disk
>> size (4187974) does not match object info size (4193094) adjusted for
>> ondisk to (4193094)
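>>
>> (For reference: ceph health detail names the inconsistent PG, and on
>> jewel the per-object detail should also be available with something
>> like
>>
>> # rados list-inconsistent-obj 1.3da --format=json-pretty
>>
>> though everything below was gathered directly on the OSDs.)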
>
>
> Hmm, I never investigated that particular ticket, but I did notice
> that the issue never recurred on master, or on jewel, since the
> start of 2017.
>
> The last revision where it cropped up in testing was
> 5b402f8a7b5a763852e93cd0a5decd34572f4518. Looking at the history
> between that commit and the tip of the jewel branch, I don't see
> anything that I recognise as a corruption fix (apart from the omap
> fix, which is clearly unrelated, since we're looking at data objects
> in these failures).
>
> John

Thanks for checking, John. In the end we deleted the inconsistent
file, rescrubbed, and will just keep a lookout for similar cases.
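
(For the archives, the cleanup was roughly: remove the file via a
CephFS mount, wait for the backing objects to be purged, then

# ceph pg deep-scrub 1.3da
# ceph health detail

to confirm the PG returns to active+clean.)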

-- Dan


>
>
>> All three replicas have the same object size/md5sum:
>>
>> # ls -l 10000260f4b.00000003__head_3B9703DA__1
>> -rw-r--r--. 1 ceph ceph 4187974 Mar 12 18:50
>> 10000260f4b.00000003__head_3B9703DA__1
>> # md5sum 10000260f4b.00000003__head_3B9703DA__1
>> db1e1bab199b33fce3ad9195832626ef 10000260f4b.00000003__head_3B9703DA__1
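>>
>> (In case it helps, the on-disk copies can be located by mapping the
>> object to its OSDs and then searching the FileStore PG directory --
>> pool name and paths here are illustrative, adjust to your layout:
>>
>> # ceph osd map cephfs_data 10000260f4b.00000003
>> # find /var/lib/ceph/osd/ceph-67/current/1.3da_head/ -name '10000260f4b*'
>>
>> and likewise on the other two OSDs holding the PG.)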
>>
>> And indeed the object info does not agree with the files on disk:
>>
>> # ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json
>> {
>>     "oid": {
>>         "oid": "10000260f4b.00000003",
>>         "key": "",
>>         "snapid": -2,
>>         "hash": 999752666,
>>         "max": 0,
>>         "pool": 1,
>>         "namespace": ""
>>     },
>>     "version": "5262'221037",
>>     "prior_version": "5262'221031",
>>     "last_reqid": "osd.67.0:1180241",
>>     "user_version": 221031,
>>     "size": 4193094,
>>     "mtime": "0.000000",
>>     "local_mtime": "0.000000",
>>     "lost": 0,
>>     "flags": 52,
>>     "snaps": [],
>>     "truncate_seq": 80,
>>     "truncate_size": 0,
>>     "data_digest": 2779145704,
>>     "omap_digest": 4294967295,
>>     "watchers": {}
>> }
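>>
>> (/tmp/attr1 above is the object_info xattr taken straight off the
>> FileStore file, roughly:
>>
>> # attr -q -g "ceph._" 10000260f4b.00000003__head_3B9703DA__1 > /tmp/attr1
>>
>> or the getfattr equivalent on the user.ceph._ attribute.)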
>>
>>
>> PG repair doesn't handle this kind of corruption, but I found a
>> recipe in an old thread to fix the object info with hexedit. Before
>> doing that I wanted to see if we could understand exactly how this
>> is possible.
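>>
>> (An alternative to the hexedit recipe, which I haven't tried: with
>> the OSD stopped, ceph-objectstore-tool should be able to read and
>> rewrite the same attribute -- the object info lives in the attr
>> named "_". Untested sketch:
>>
>> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-67 \
>>     --journal-path /var/lib/ceph/osd/ceph-67/journal \
>>     --pgid 1.3da 10000260f4b.00000003 get-attr _ > /tmp/oi
>> # <edit the size field in /tmp/oi>
>> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-67 \
>>     --journal-path /var/lib/ceph/osd/ceph-67/journal \
>>     --pgid 1.3da 10000260f4b.00000003 set-attr _ /tmp/oi
>>
>> and then restart the OSD.)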
>>
>> I managed to find the exact cephfs file, and asked the user how they
>> created it. They said the file was the output of a make test on some
>> program. The make test was taking a while, so they left their laptop,
>> and when they returned to the computer, the ssh connection to their
>> cephfs workstation had broken. I assume this means that the process
>> writing the file had been killed while writing to cephfs. But I don't
>> understand how a killed client process could result in inconsistent
>> object info.
>>
>> Is there anything else needed to help debug this inconsistency?
>>
>> Cheers, Dan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


