mon leveldb loss

Hi, I'm hoping desperately that someone can help. I have a critical issue with a tiny 'cluster'...

There was a power glitch earlier today (not an outage; it might have been a brownout, since some things went down and others didn't), and I came home to a CPU machine check exception on the single host on which I keep a trio of ceph monitors. There was no option but to hard reset. When the system came back up, the monitors didn't.

Each mon is reporting possible corruption of its leveldb store. Files are missing; one might surmise that an fsck decided to discard them. See the attached txt files for the ceph-mon output and the corresponding store.db directory listings.

Is there any way to recover the leveldb for the monitors? I am more than capable and willing to dig into the structure of these files, or to take any similar measures, if necessary. Perhaps correlate a complete picture from the data files that are available?

I do have a relevant backup of the monitor data but it is now three months old. I would prefer not to have to resort to this if there is any chance of recovering monitor operability by other means.

Also, what would the consequences be of restoring such a backup when the OSDs (12TB worth) are perfectly fine and contain the latest up-to-date pg associations? Would there be a risk of data loss?

Unfortunately I don't have any backups of the actual user data (being poor and scraping along on a shoestring budget is not exactly conducive to anything approaching an ideal hardware setup), unless one counts a set of old disks from a previously failed cluster from six months ago.

My last recourse will likely be to try to scavenge and piece together my most important files from whatever I find on the OSDs. Far from an exciting prospect, but I am seriously desperate.

I would be terribly grateful for any input.

Mike

2015-01-29 19:49:30.590913 7fa66458d7c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18788
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
2015-01-29 19:49:37.542790 7fa66458d7c0 -1 failed to create new leveldb store
2015-01-29 19:49:43.279940 7f03e8ec87c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18846
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-1/store.db/1054939.ldb
2015-01-29 19:49:50.708742 7f03e8ec87c0 -1 failed to create new leveldb store
2015-01-29 19:49:47.866736 7fb6aeebe7c0  0 ceph version 0.74 (c165483bc72031ed9e5cca4e52fe3dd6142c8baa), process ceph-mon, pid 18869
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-2/store.db/1054942.ldb
2015-01-29 19:49:54.935436 7fb6aeebe7c0 -1 failed to create new leveldb store
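
For anyone triaging the output above, a minimal sketch of how the per-monitor corruption reports could be summarised. The function name summarize_corruption is hypothetical, and the sketch assumes the log lines keep the exact "Corruption: N missing files; e.g.: <path>" shape shown above and that the monitor name is the directory just above store.db.

```python
import re
from pathlib import PurePosixPath

# Matches lines such as:
# Corruption: 10 missing files; e.g.: /var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb
CORRUPTION_RE = re.compile(r"Corruption: (\d+) missing files; e\.g\.: (\S+)")


def summarize_corruption(log_text):
    """Return {monitor name: (missing file count, example missing file)}.

    Assumes paths of the form .../mon/<name>/store.db/<file>, as in the
    ceph-mon output above. Repeated lines for one monitor collapse into
    a single entry.
    """
    summary = {}
    for count, example in CORRUPTION_RE.findall(log_text):
        path = PurePosixPath(example)
        mon_name = path.parent.parent.name  # directory above store.db
        summary[mon_name] = (int(count), path.name)
    return summary


if __name__ == "__main__":
    sample = (
        "Corruption: 10 missing files; e.g.: "
        "/var/lib/ceph/mon/unimatrix-0/store.db/1054928.ldb\n"
    )
    print(summarize_corruption(sample))
```

This only restates what the monitors already report; it does not inspect the store itself.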
mon/unimatrix-0/store.db/:
total 42160
-rw-r--r-- 1 root root       57 Aug 24 14:59 LOG
-rw-r--r-- 1 root root        0 Aug 24 14:59 LOCK
drwxr-xr-x 3 root root       80 Aug 24 14:59 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051297.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054697.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054744.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054790.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054851.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054858.ldb
-rw-r--r-- 1 root root 42568979 Jan 29 14:23 MANIFEST-399002
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-2/store.db/:
total 42180
-rw-r--r-- 1 root root       57 Aug 24 15:09 LOG
-rw-r--r-- 1 root root        0 Aug 24 15:09 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:09 ..
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051311.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054711.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054758.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054804.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054865.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054872.ldb
-rw-r--r-- 1 root root 42589118 Jan 29 14:23 MANIFEST-399004
drwxr-xr-x 2 root root      240 Jan 29 14:23 .

mon/unimatrix-1/store.db/:
total 42180
-rw-r--r-- 1 root root        0 Aug 24 15:03 LOCK
drwxr-xr-x 3 root root       80 Aug 24 15:03 ..
-rw-r--r-- 1 root root       57 Aug 24 15:03 LOG
-rw-r--r-- 1 root root       16 Nov  2 18:24 CURRENT
-rw-r--r-- 1 root root   182248 Jan 29 05:13 1051308.ldb
-rw-r--r-- 1 root root    82124 Jan 29 13:53 1054708.ldb
-rw-r--r-- 1 root root    46609 Jan 29 14:00 1054755.ldb
-rw-r--r-- 1 root root   165708 Jan 29 14:07 1054801.ldb
-rw-r--r-- 1 root root    83304 Jan 29 14:16 1054862.ldb
-rw-r--r-- 1 root root    18620 Jan 29 14:16 1054869.ldb
-rw-r--r-- 1 root root 42588884 Jan 29 14:23 MANIFEST-399005
drwxr-xr-x 2 root root      240 Jan 29 14:23 .
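
The surviving .ldb files in the three listings above have matching sizes but different file numbers, which suggests (but does not prove) that the monitors hold the same table contents under different names. A minimal sketch, assuming read access to copies of the three store.db directories, that groups the surviving tables by size and SHA-256 so identical contents can be correlated across monitors. The function names are hypothetical, and this does not attempt any leveldb repair.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical_tables(store_dirs):
    """Map (size, sha256) to the list of .ldb paths with that content."""
    groups = defaultdict(list)
    for store in store_dirs:
        for path in sorted(Path(store).glob("*.ldb")):
            key = (path.stat().st_size, file_digest(path))
            groups[key].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Paths follow the ceph-mon output above; point these at copies,
    # not at the live stores.
    stores = [
        "/var/lib/ceph/mon/unimatrix-0/store.db",
        "/var/lib/ceph/mon/unimatrix-1/store.db",
        "/var/lib/ceph/mon/unimatrix-2/store.db",
    ]
    for (size, sha), paths in group_identical_tables(stores).items():
        names = ", ".join(str(p) for p in paths)
        print(f"{size:>10}  {sha[:12]}  {names}")
```

If every group contains one file from each monitor, all three stores hold the same surviving tables and no monitor can supply a table that the others lack.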
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
