Re: cephfs deep scrub error:

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 13 Mar 2017 13:43:13 -0700



On Mon, Mar 13, 2017 at 3:28 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi John,
>
> Last week we updated our prod CephFS cluster to 10.2.6 (clients and
> server side), and for the first time today we've got an object info
> size mismatch:
>
> I found this ticket you created in the tracker, which is why I've
> emailed you: http://tracker.ceph.com/issues/18240
>
> Here's the detail of our error:
>
> 2017-03-13 07:17:49.989297 osd.67 <snip>:6819/3441125 262 : cluster
> [ERR] deep-scrub 1.3da 1:5bc0e9dc:::10000260f4b.00000003:head on disk
> size (4187974) does not match object info size (4193094) adjusted for
> ondisk to (4193094)
>
> All three replica's have the same object size/md5sum:
>
> # ls -l 10000260f4b.00000003__head_3B9703DA__1
> -rw-r--r--. 1 ceph ceph 4187974 Mar 12 18:50
> 10000260f4b.00000003__head_3B9703DA__1
> # md5sum 10000260f4b.00000003__head_3B9703DA__1
> db1e1bab199b33fce3ad9195832626ef 10000260f4b.00000003__head_3B9703DA__1
>
> And indeed the object info does not agree with the files on disk:
>
> # ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json
> {
>     "oid": {
>         "oid": "10000260f4b.00000003",
>         "key": "",
>         "snapid": -2,
>         "hash": 999752666,
>         "max": 0,
>         "pool": 1,
>         "namespace": ""
>     },
>     "version": "5262'221037",
>     "prior_version": "5262'221031",
>     "last_reqid": "osd.67.0:1180241",
>     "user_version": 221031,

The only string I can see to tug on here is this: that the version and
user_version have clearly diverged, which (most?) commonly means some
kind of recovery or caching op was performed. You're not using cache
tiers, are you? Is it possible somebody ran recovery and overwrote
good objects with the bad one before you got to look at the raw files?
-Greg

>     "size": 4193094,
>     "mtime": "0.000000",
>     "local_mtime": "0.000000",
>     "lost": 0,
>     "flags": 52,
>     "snaps": [],
>     "truncate_seq": 80,
>     "truncate_size": 0,
>     "data_digest": 2779145704,
>     "omap_digest": 4294967295,
>     "watchers": {}
> }
>
>
> PG repair doesn't handle these kind of corruptions, but I found a
> recipe in an old thread to fix the object info with hexedit. Before
> doing this I wanted to see if we can understand exactly how this is
> possible.
>
> I managed to find the exact cephfs file, and asked the user how they
> created it. They said the file was the output of a make test on some
> program. The make test was taking awhile, so they left their laptop,
> and when they returned to the computer, the ssh connection to their
> cephfs workstation had broken. I assume this means that the process
> writing the file had been killed while writing to cephfs. But I don't
> understand how a killed client process could result in inconsistent
> object info.
>
> Is there anything else needed to help debug this inconsistency?
>
> Cheers, Dan
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com