On 07/04/2014 12:29 AM, Jason Harley wrote:
> Hi list,
>
> I've got a small dev. cluster: 3 OSD nodes with 6 disks/OSDs each and a single monitor (this, it seems, was my mistake). The monitor node went down hard and it looks like the monitor's db is in a funny state. Running 'ceph-mon' manually with 'debug_mon 20' and 'debug_ms 20' gave the following:
>
>> /usr/bin/ceph-mon -i monhost --mon-data /var/lib/ceph/mon/ceph-monhost --debug_mon 20 --debug_ms 20 -d
>> 2014-07-03 23:20:55.800512 7f973918e7c0  0 ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73), process ceph-mon, pid 24930
>> Corruption: checksum mismatch
>> Corruption: checksum mismatch
>> 2014-07-03 23:20:56.455797 7f973918e7c0 -1 failed to create new leveldb store
>
> I attempted to make use of the leveldb Python library's 'RepairDB' function, but it just moves enough files into 'lost' that, when running the monitor again, I'm asked if I ran mkcephfs.
>
> Any insight into resolving these two checksum mismatches so I can access my OSD data would be greatly appreciated.
>
> Thanks,
> ./JRH
>
> p.s. I'm assuming that without the maps from the monitor, my OSD data is unrecoverable also.

Hello Jason,

We don't have a way to repair leveldb. Having multiple monitors usually helps with tricky situations like this.

According to the thread in [1], the Python bindings you're using may not be linked against snappy, which we were (mistakenly, until recently) using to compress data as it goes into leveldb. Not having those snappy bindings may be what's causing all those files to be moved to 'lost' instead of being repaired.

The suggestion that thread offers is to run the repair directly from the 'application' itself, i.e. from a binary linked against the same leveldb and snappy the monitor uses. We could do this by adding a repair option to ceph-kvstore-tool, which could help (the general idea is sketched below). I'll be happy to get that into ceph-kvstore-tool tomorrow and push a branch for you to compile and test.

  -Joao

[1] https://groups.google.com/forum/#!topic/leveldb/YvszWNio2-Q

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
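
For reference, here is a minimal sketch of that in-application repair, not the actual ceph-kvstore-tool change: a small standalone program built against the same leveldb and snappy the monitor links to, calling leveldb's own RepairDB(). The store path is only an example derived from the --mon-data directory above; adjust it to your monitor's store.db.

  // repair_mon_store.cc -- illustrative sketch, assumes leveldb built with snappy
  #include <iostream>
  #include <string>

  #include <leveldb/db.h>       // leveldb::DB, leveldb::RepairDB
  #include <leveldb/options.h>  // leveldb::Options, leveldb::kSnappyCompression

  int main(int argc, char** argv) {
    // Example path: the monitor keeps its leveldb under <mon-data>/store.db.
    std::string store = (argc > 1) ? argv[1]
                                   : "/var/lib/ceph/mon/ceph-monhost/store.db";

    leveldb::Options options;
    options.paranoid_checks = true;                     // verify checksums while repairing
    options.compression = leveldb::kSnappyCompression;  // match what the monitor wrote with

    // RepairDB salvages as much data as it can; files it cannot make sense of
    // are moved under the store's 'lost' directory.
    leveldb::Status s = leveldb::RepairDB(store, options);
    if (!s.ok()) {
      std::cerr << "repair failed: " << s.ToString() << std::endl;
      return 1;
    }

    // Sanity check: the repaired store should now open without corruption errors.
    leveldb::DB* db = NULL;
    s = leveldb::DB::Open(options, store, &db);
    if (!s.ok()) {
      std::cerr << "open after repair failed: " << s.ToString() << std::endl;
      return 1;
    }
    delete db;
    std::cout << "repair completed, store opens cleanly" << std::endl;
    return 0;
  }

The key difference from the Python attempt is simply that this binary is linked against snappy, so the repair can actually read the compressed blocks instead of discarding them into 'lost'. As always, work on a copy of the mon data directory rather than the original.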