Re: cephfs deep scrub error:

On Mon, Mar 13, 2017 at 10:28 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi John,
>
> Last week we updated our production CephFS cluster to 10.2.6 (clients and
> server side), and today, for the first time, we hit an object info
> size mismatch:
>
> I found this ticket you created in the tracker, which is why I've
> emailed you: http://tracker.ceph.com/issues/18240
>
> Here's the detail of our error:
>
> 2017-03-13 07:17:49.989297 osd.67 <snip>:6819/3441125 262 : cluster
> [ERR] deep-scrub 1.3da 1:5bc0e9dc:::10000260f4b.00000003:head on disk
> size (4187974) does not match object info size (4193094) adjusted for
> ondisk to (4193094)
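For quick triage, the two sizes in a scrub error like the one above can be pulled out programmatically. A minimal sketch (the regex is an assumption about the jewel-era log wording, not an official parser):

```python
import re

# The deep-scrub log line from the report above (cluster/OSD prefix elided).
line = ("deep-scrub 1.3da 1:5bc0e9dc:::10000260f4b.00000003:head on disk "
        "size (4187974) does not match object info size (4193094) "
        "adjusted for ondisk to (4193094)")

# Extract the on-disk size and the object-info size; both are byte counts.
m = re.search(r"on disk size \((\d+)\) does not match object info size \((\d+)\)",
              line)
on_disk, obj_info = (int(g) for g in m.groups())

# The object info claims more bytes than actually exist on disk.
print(on_disk, obj_info, obj_info - on_disk)
```

Here the object info overstates the file by 5120 bytes, which matches the ls output further down the thread.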


Hmm, I never investigated that particular ticket, but I did notice that
the issue never recurred on master, or on jewel, since the start of
2017.

The last revision where it cropped up in testing was
5b402f8a7b5a763852e93cd0a5decd34572f4518. Looking at the history
between that commit and the tip of the jewel branch, I don't see
anything that I recognise as a corruption fix (apart from the omap fix,
which is clearly unrelated, as we're looking at data objects in these
failures).

John


> All three replicas have the same object size/md5sum:
>
> # ls -l 10000260f4b.00000003__head_3B9703DA__1
> -rw-r--r--. 1 ceph ceph 4187974 Mar 12 18:50
> 10000260f4b.00000003__head_3B9703DA__1
> # md5sum 10000260f4b.00000003__head_3B9703DA__1
> db1e1bab199b33fce3ad9195832626ef 10000260f4b.00000003__head_3B9703DA__1
>
> And indeed the object info does not agree with the files on disk:
>
> # ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json
> {
>     "oid": {
>         "oid": "10000260f4b.00000003",
>         "key": "",
>         "snapid": -2,
>         "hash": 999752666,
>         "max": 0,
>         "pool": 1,
>         "namespace": ""
>     },
>     "version": "5262'221037",
>     "prior_version": "5262'221031",
>     "last_reqid": "osd.67.0:1180241",
>     "user_version": 221031,
>     "size": 4193094,
>     "mtime": "0.000000",
>     "local_mtime": "0.000000",
>     "lost": 0,
>     "flags": 52,
>     "snaps": [],
>     "truncate_seq": 80,
>     "truncate_size": 0,
>     "data_digest": 2779145704,
>     "omap_digest": 4294967295,
>     "watchers": {}
> }
>
>
> PG repair doesn't handle this kind of corruption, but I found a
> recipe in an old thread to fix the object info with hexedit. Before
> doing this I wanted to see if we could understand exactly how this is
> possible.
>
> I managed to find the exact cephfs file and asked the user how they
> created it. They said the file was the output of a make test on some
> program. The make test was taking a while, so they left their laptop,
> and when they returned, the SSH connection to their cephfs
> workstation had broken. I assume this means that the process
> writing the file was killed while writing to cephfs. But I don't
> understand how a killed client process could result in inconsistent
> object info.
>
> Is there anything else needed to help debug this inconsistency?
>
> Cheers, Dan
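The hexedit recipe mentioned above boils down to rewriting the 64-bit size field inside the raw object_info_t xattr blob. A minimal sketch of that byte-level patch, assuming a little-endian uint64 encoding and that the stale size value occurs exactly once in the blob (this is a hypothetical helper, not an official tool; verify the result with ceph-dencoder before writing the attr back):

```python
import struct

def patch_object_info_size(blob: bytes, old_size: int, new_size: int) -> bytes:
    """Replace a little-endian uint64 size value inside a raw attr blob.

    Refuses to patch unless the old value occurs exactly once, to avoid
    corrupting unrelated bytes that happen to match.
    """
    old = struct.pack("<Q", old_size)
    new = struct.pack("<Q", new_size)
    if blob.count(old) != 1:
        raise ValueError("size field not found exactly once; patch by hand")
    return blob.replace(old, new)

# Synthetic demonstration (NOT a real object_info_t): embed the stale size
# from the error above, then patch it to the 4187974 bytes actually on disk.
fake_attr = b"\x01\x02" + struct.pack("<Q", 4193094) + b"\x03\x04"
fixed = patch_object_info_size(fake_attr, 4193094, 4187974)
```

On a real cluster the safer route is to stop the OSD and manipulate the attr with ceph-objectstore-tool rather than editing files under the OSD's data directory directly.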
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


