Fixing inconsistency

Межов Игорь Александрович <megov@xxxxxxxxxx> · Mon, 16 Nov 2015 15:14:46 +0000

Hi!

We had a hard crash on one node: it hung in an indeterminate state and responded neither
to network requests nor even to console commands. After the node was restarted, all OSDs successfully
mounted their filesystems (ext4) and rejoined the cluster. Some time later, the scrub process found two errors.

The first:

2015-11-14 07:23:12.451157 osd.93 192.168.36.21:6804/5358 110 : cluster [ERR] 6.d36 scrub stat mismatch, 
got 2592/2593 objects, 0/0 clones, 2592/2593 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 
3572748288/3576942592 bytes,0/0 hit_set_archive bytes.

2015-11-14 07:23:12.451165 osd.93 192.168.36.21:6804/5358 111 : cluster [ERR] 6.d36 scrub 1 errors

The second:

2015-11-14 10:55:56.559884 osd.45 192.168.36.48:6801/3022 136 : cluster [ERR] deep-scrub 6.5ed 
25ce05ed/rbd_data.15524c22ae8944a.00000000000226ff/head//6 on disk size (4194304) does not match
object info size (2777088) adjusted for ondisk to (2777088)

2015-11-14 10:56:59.649777 osd.45 192.168.36.48:6801/3022 137 : cluster [ERR] deep-scrub 6.5ed 
d6d595ed/rbd_data.15524c22ae8944a.0000000000018313/head//6 on disk size (4194304) does not match 
object info size (1921024) adjusted for ondisk to (1921024)

2015-11-14 10:57:50.248048 osd.45 192.168.36.48:6801/3022 138 : cluster [ERR] 6.5ed deep-scrub stat mismatch,
got 2614/2616 objects, 0/0 clones, 2614/2616 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 
3725238272/3733626880 bytes,0/0 hit_set_archive bytes.

2015-11-14 10:57:50.248056 osd.45 192.168.36.48:6801/3022 139 : cluster [ERR] 6.5ed deep-scrub 3 errors
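
As a rough check of the two "stat mismatch" lines above, the object and byte counters can be compared directly. The sketch below is only an illustration: the helper name stat_mismatch is hypothetical, and which side of each "a/b" pair is the scrubbed value and which is the recorded one is an assumption, so it reports only the difference between the two.

```python
import re

# Expected size of one RBD data object in this image, in bytes (4 MiB).
OBJECT_SIZE = 4194304


def stat_mismatch(line):
    """Return (object_diff, byte_diff) from a 'scrub stat mismatch' log line.

    Hypothetical helper: it only subtracts the two numbers of each 'a/b'
    pair and does not interpret which side is the authoritative one.
    """
    objs = re.search(r"got (\d+)/(\d+) objects", line)
    size = re.search(r"(\d+)/(\d+) bytes", line)
    if objs is None or size is None:
        raise ValueError("not a stat mismatch line")
    obj_diff = int(objs.group(2)) - int(objs.group(1))
    byte_diff = int(size.group(2)) - int(size.group(1))
    return obj_diff, byte_diff


# The two log lines from this message, joined onto one line each.
LINE_D36 = (
    "6.d36 scrub stat mismatch, got 2592/2593 objects, 0/0 clones, "
    "2592/2593 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
    "3572748288/3576942592 bytes,0/0 hit_set_archive bytes."
)
LINE_5ED = (
    "6.5ed deep-scrub stat mismatch, got 2614/2616 objects, 0/0 clones, "
    "2614/2616 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
    "3725238272/3733626880 bytes,0/0 hit_set_archive bytes."
)

if __name__ == "__main__":
    for name, line in (("6.d36", LINE_D36), ("6.5ed", LINE_5ED)):
        obj_diff, byte_diff = stat_mismatch(line)
        print(name, obj_diff, "object(s),", byte_diff, "bytes,",
              byte_diff / OBJECT_SIZE, "x object size")
```

In both PGs the byte difference is a whole multiple of 4194304: one object's worth for 6.d36 and two for 6.5ed, matching the difference in object counts.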

I think that the object metadata is corrupted in some way. The first error was successfully fixed by
issuing the 'ceph pg repair' command for PG 6.d36. The bad objects in the second error are parts of a
3 TB rbd image containing a VM (NTFS), so they have to be 4194304 bytes each. All three replicas are in
their places (OSDs) and have the same size and checksum (md5), so I suppose that the on-disk content of
these objects is not corrupted, and the only problem is with their metadata. Unfortunately, we have
neither the time nor the spare space to recreate this rbd image and copy the data to a new location.
I also do not want to truncate the objects to 2777088 and 1921024 bytes, as someone suggested on the
mailing list earlier, because that could damage the filesystem in the guest VM.
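
The replica comparison described above (same size, same md5 on all three OSDs) can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used here: the function names are hypothetical, and it assumes the three object files have already been located on the OSD filesystems or copied to one host.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hexdigest) for one replica file."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large object file is not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


def replicas_agree(paths):
    """True if every file in paths has the same size and the same md5."""
    return len({fingerprint(p) for p in paths}) == 1
```

Calling replicas_agree() with the three replica paths returns True only when all copies match in both size and checksum, which is the condition reported above.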

So, the questions are:
 - why does ceph show a different size for these objects?
 - what can I do to repair these objects?
 - is there any method to get the objects and their metadata in an editable form, change the size,
and put them back?

Thanks,
Megov Igor
CIO, Yuterra
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com