Hi John, Last week we updated our prod CephFS cluster to 10.2.6 (clients and server side), and for the first time today we've got an object info size mismatch: I found this ticket you created in the tracker, which is why I've emailed you: http://tracker.ceph.com/issues/18240 Here's the detail of our error: 2017-03-13 07:17:49.989297 osd.67 <snip>:6819/3441125 262 : cluster [ERR] deep-scrub 1.3da 1:5bc0e9dc:::10000260f4b.00000003:head on disk size (4187974) does not match object info size (4193094) adjusted for ondisk to (4193094) All three replica's have the same object size/md5sum: # ls -l 10000260f4b.00000003__head_3B9703DA__1 -rw-r--r--. 1 ceph ceph 4187974 Mar 12 18:50 10000260f4b.00000003__head_3B9703DA__1 # md5sum 10000260f4b.00000003__head_3B9703DA__1 db1e1bab199b33fce3ad9195832626ef 10000260f4b.00000003__head_3B9703DA__1 And indeed the object info does not agree with the files on disk: # ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json { "oid": { "oid": "10000260f4b.00000003", "key": "", "snapid": -2, "hash": 999752666, "max": 0, "pool": 1, "namespace": "" }, "version": "5262'221037", "prior_version": "5262'221031", "last_reqid": "osd.67.0:1180241", "user_version": 221031, "size": 4193094, "mtime": "0.000000", "local_mtime": "0.000000", "lost": 0, "flags": 52, "snaps": [], "truncate_seq": 80, "truncate_size": 0, "data_digest": 2779145704, "omap_digest": 4294967295, "watchers": {} } PG repair doesn't handle these kind of corruptions, but I found a recipe in an old thread to fix the object info with hexedit. Before doing this I wanted to see if we can understand exactly how this is possible. I managed to find the exact cephfs file, and asked the user how they created it. They said the file was the output of a make test on some program. The make test was taking awhile, so they left their laptop, and when they returned to the computer, the ssh connection to their cephfs workstation had broken. I assume this means that the process writing the file had been killed while writing to cephfs. But I don't understand how a killed client process could result in inconsistent object info. Is there anything else needed to help debug this inconsistency? Cheers, Dan _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com