How to remove lost objects.

Andrey Stepachev <octo47@xxxxxxxxx> · Wed, 18 Jan 2012 19:37:17 +0400

Hi,

I've been testing ceph against a laggy network (0ms-400ms delays).
At some point I got many messages like:
2012-01-18 16:06:49.184776 7ff134119700 -- 84.201.161.73:6801/25424 send_message dropped message osd_op_reply(291 1000000101b.0000001e [write 66734080~374784] ondisk = 0) v1 because of no pipe on con 0x315e640
And ceph doesn't respond to ls on some of the subdirs (via hadoop fs -ls or
the kernel client).
My cluster was running without debug logging at that moment, so I can't find
out what was going on.

After a restart, ceph writes this to the log:
2012-01-18 16:10:39.985509 7f217989d780 osd.1 155 pg[0.155( v 136'373
(94'368,136'373]+backlog n=3 ec=1 les/c 150/145 146/151/58) [] r=0
lpr=0 (info mismatch, log(94'368,0'0]+backlog) (log bound mismatch,
actual=[8'124,94'369]) lcod 0'0 mlcod 0'0 inactive] read_log  got dup
94'369 (last was 94'369, dropping that one)

After these strange hangs I ran rm -rf on the filesystem
(mounted via the kernel client),
but the fs shows that 210 GB is still in use. Looking at /data/osd.x, I found
many objects inside.
So:
a) it looks like some errors leave orphaned objects in rados
b) I can't find a utility that can check for that orphaned data (and clean it up)

Question: how can I identify what these objects are, and how can I clean them up?
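
For the identification part, here is a minimal sketch of one possible starting
point. It assumes the file data objects are named "<inode in hex>.<object
index as 8 hex digits>", as in the 1000000101b.0000001e name from the log
above, and that a list of object names is available, for example from
"rados -p data ls" (the pool name "data" is an assumption). The function names
are hypothetical, and the script only groups names by inode; it does not decide
which inodes are orphaned and it deletes nothing.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "<inode in hex>.<object index as 8 hex digits>",
# e.g. "1000000101b.0000001e". Names that do not match are skipped.
_NAME_RE = re.compile(r"^([0-9a-f]+)\.([0-9a-f]{8})$")


def parse_object_name(name):
    """Return (inode, object_index) as ints, or None if the name does not match."""
    m = _NAME_RE.match(name.strip())
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def group_by_inode(names):
    """Map each inode number to the sorted list of object indexes seen for it."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_object_name(name)
        if parsed is not None:
            ino, idx = parsed
            groups[ino].append(idx)
    return {ino: sorted(idxs) for ino, idxs in groups.items()}
```

Passing the lines of an object listing to group_by_inode() gives one entry per
inode, which could then be compared by hand against the inodes that still exist
in the mounted filesystem.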

-- 
Andrey.