Hi, I'm hoping desperately that someone can help. I have a critical issue with a tiny 'cluster'...
There was a power glitch earlier today (not an outage; it might have been a brownout, since some things went down and others didn't), and I came home to a CPU machine check exception on the single host on which I keep a trio of ceph monitors. There was no option but to hard reset. When the system came back up, the monitors didn't.
Each mon is reporting possible corruption of its leveldb store: files are missing, and one might surmise that an fsck decided to discard them. See the attached txt files for the ceph-mon output and the corresponding store.db directory listings.
Is there any way to recover the leveldb for the monitors? I am more than capable and willing to dig into the structure of these files - or any similar measures - if necessary. Perhaps correlate a complete picture from the data files that are available?
I do have a relevant backup of the monitor data but it is now three months old. I would prefer not to have to resort to this if there is any chance of recovering monitor operability by other means.
Also, what would be the consequences of restoring such a backup when the (12TB worth of) OSDs are perfectly fine and contain the latest up-to-date pg associations? Would there be a risk of data loss?
Unfortunately I don't have any backups of the actual user data (being poor and scraping along on a shoestring budget is not exactly conducive to anything approaching an ideal hardware setup), unless one counts a set of old disks from a previously failed cluster from six months ago.
My last recourse will likely be to try to scavenge and piece together my most important files from whatever I find on the OSDs. Far from an exciting prospect, but I am seriously desperate.
I would be terribly grateful for any input.
Mike
2015-01-29 19:49:30.590913 7fa66458d7c0 0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18788
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
2015-01-29 19:49:37.542790 7fa66458d7c0 -1 failed to create new leveldb store

2015-01-29 19:49:43.279940 7f03e8ec87c0 0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18846
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
2015-01-29 19:49:50.708742 7f03e8ec87c0 -1 failed to create new leveldb store

2015-01-29 19:49:47.866736 7fb6aeebe7c0 0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18869
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
2015-01-29 19:49:54.935436 7fb6aeebe7c0 -1 failed to create new leveldb store
mon/unimatrix-0/store.db/:
total 42160
-rw-r--r-- 1 root root 57 Aug 24 14:59 LOG
-rw-r--r-- 1 root root 0 Aug 24 14:59 LOCK
drwxr-xr-x 3 root root 80 Aug 24 14:59 ..
-rw-r--r-- 1 root root 16 Nov 2 18:24 CURRENT
-rw-r--r-- 1 root root 182248 Jan 29 05:13 1051297.ldb
-rw-r--r-- 1 root root 82124 Jan 29 13:53 1054697.ldb
-rw-r--r-- 1 root root 46609 Jan 29 14:00 1054744.ldb
-rw-r--r-- 1 root root 165708 Jan 29 14:07 1054790.ldb
-rw-r--r-- 1 root root 83304 Jan 29 14:16 1054851.ldb
-rw-r--r-- 1 root root 18620 Jan 29 14:16 1054858.ldb
-rw-r--r-- 1 root root 42568979 Jan 29 14:23 MANIFEST-399002
drwxr-xr-x 2 root root 240 Jan 29 14:23 .

mon/unimatrix-2/store.db/:
total 42180
-rw-r--r-- 1 root root 57 Aug 24 15:09 LOG
-rw-r--r-- 1 root root 0 Aug 24 15:09 LOCK
drwxr-xr-x 3 root root 80 Aug 24 15:09 ..
-rw-r--r-- 1 root root 16 Nov 2 18:24 CURRENT
-rw-r--r-- 1 root root 182248 Jan 29 05:13 1051311.ldb
-rw-r--r-- 1 root root 82124 Jan 29 13:53 1054711.ldb
-rw-r--r-- 1 root root 46609 Jan 29 14:00 1054758.ldb
-rw-r--r-- 1 root root 165708 Jan 29 14:07 1054804.ldb
-rw-r--r-- 1 root root 83304 Jan 29 14:16 1054865.ldb
-rw-r--r-- 1 root root 18620 Jan 29 14:16 1054872.ldb
-rw-r--r-- 1 root root 42589118 Jan 29 14:23 MANIFEST-399004
drwxr-xr-x 2 root root 240 Jan 29 14:23 .

mon/unimatrix-1/store.db/:
total 42180
-rw-r--r-- 1 root root 0 Aug 24 15:03 LOCK
drwxr-xr-x 3 root root 80 Aug 24 15:03 ..
-rw-r--r-- 1 root root 57 Aug 24 15:03 LOG
-rw-r--r-- 1 root root 16 Nov 2 18:24 CURRENT
-rw-r--r-- 1 root root 182248 Jan 29 05:13 1051308.ldb
-rw-r--r-- 1 root root 82124 Jan 29 13:53 1054708.ldb
-rw-r--r-- 1 root root 46609 Jan 29 14:00 1054755.ldb
-rw-r--r-- 1 root root 165708 Jan 29 14:07 1054801.ldb
-rw-r--r-- 1 root root 83304 Jan 29 14:16 1054862.ldb
-rw-r--r-- 1 root root 18620 Jan 29 14:16 1054869.ldb
-rw-r--r-- 1 root root 42588884 Jan 29 14:23 MANIFEST-399005
drwxr-xr-x 2 root root 240 Jan 29 14:23 .
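
As a first step toward correlating the surviving data files across the three stores, the following is a minimal sketch that groups the .ldb files by size and checks how their file numbers relate. The sizes and names are copied from the listings in this message; the names SURVIVING, correlate_by_size, file_number and number_offsets are hypothetical helpers made up for illustration, not part of ceph or leveldb. It assumes no two files within one store share a size, and a matching size is only a hint that two files hold the same content; comparing checksums of the actual files would be needed to confirm it.

```python
from collections import defaultdict

# Sizes in bytes of the surviving .ldb files, copied from the directory listings.
SURVIVING = {
    "unimatrix-0": {"1051297.ldb": 182248, "1054697.ldb": 82124,
                    "1054744.ldb": 46609, "1054790.ldb": 165708,
                    "1054851.ldb": 83304, "1054858.ldb": 18620},
    "unimatrix-1": {"1051308.ldb": 182248, "1054708.ldb": 82124,
                    "1054755.ldb": 46609, "1054801.ldb": 165708,
                    "1054862.ldb": 83304, "1054869.ldb": 18620},
    "unimatrix-2": {"1051311.ldb": 182248, "1054711.ldb": 82124,
                    "1054758.ldb": 46609, "1054804.ldb": 165708,
                    "1054865.ldb": 83304, "1054872.ldb": 18620},
}


def correlate_by_size(listings):
    """Map each file size to {monitor: filename} for the files of that size.

    Assumption: sizes are unique within one store; a duplicate size in the
    same store would overwrite the earlier entry.
    """
    groups = defaultdict(dict)
    for mon, files in listings.items():
        for name, size in files.items():
            groups[size][mon] = name
    return dict(groups)


def file_number(name):
    """Return the numeric part of a name such as '1054697.ldb'."""
    return int(name.split(".")[0])


def number_offsets(groups, base, other):
    """Set of file-number differences (other - base) over same-size files."""
    return {
        file_number(members[other]) - file_number(members[base])
        for members in groups.values()
        if base in members and other in members
    }


if __name__ == "__main__":
    groups = correlate_by_size(SURVIVING)
    for size, members in sorted(groups.items()):
        print(size, members)
    for other in ("unimatrix-1", "unimatrix-2"):
        print(other, "offsets vs unimatrix-0:",
              number_offsets(groups, "unimatrix-0", other))
```

If every size maps to one file per monitor and the offsets come out constant, the three stores would appear to have kept the same subset of files, in which case comparing file contents is the next thing to check before hoping one store can fill gaps in another.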
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com