On 4/12/16 12:01 AM, Gregory Farnum wrote:
On Mon, Apr 11, 2016 at 3:45 PM, Eric Hall <eric.hall@xxxxxxxxxxxxxx> wrote:

Power failure in data center has left 3 mons unable to start with

mon/OSDMonitor.cc: 125: FAILED assert(version >= osdmap.epoch)

Have found a similar problem discussed at http://irclogs.ceph.widodh.nl/index.php?date=2015-05-29, but am unsure how to proceed. If I read

ceph-kvstore-tool /var/lib/ceph/mon/ceph-cephsecurestore1/store.db list

correctly, they believe osdmap is 1, but they also have osdmap:full_38456 and osdmap:38630 in the store.

Exactly what are you reading that's giving you those values? The "real" OSDMap epoch is going to be at least 38630...if you're very lucky it will be exactly 38630. But since it reset itself to 1 in the monitor's store, I doubt you'll be lucky.
I'm getting this from ceph-kvstore-tool list.
So in order to get your cluster back up, you need to find the largest osdmap version in your cluster. You can do that, very tediously, by looking at the OSDMap stores. Or you may have debug logs indicating it more easily on the monitors.
I don't see info like this in any logs. How/where do I inspect this?

Thank you,
-- Eric
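As a rough illustration of the "find the largest osdmap version" step for one monitor store, the sketch below scans the text produced by the ceph-kvstore-tool list command quoted above and picks out the highest numeric osdmap key. It assumes each relevant line looks like osdmap:38630 or osdmap:full_38456, as described in this thread (a tab or space separator is also accepted, since the exact output format may differ between versions). The function name highest_osdmap_epoch is made up for this example and is not part of Ceph. The result is only the highest epoch present in that one store, so it is a lower bound; the OSDs may hold newer maps.

```python
import re

# Assumed line shape, taken from the thread and not verified against any
# particular Ceph release: "osdmap:38630" or "osdmap:full_38456".
# A tab or space is accepted in place of the colon as a precaution.
_OSDMAP_KEY = re.compile(r"^osdmap[:\t ]+(?:full_)?(\d+)\s*$")


def highest_osdmap_epoch(lines):
    """Return the largest numeric osdmap epoch in the listing, or None.

    Non-numeric keys such as "osdmap:last_committed" and keys under
    other prefixes are ignored.
    """
    epochs = []
    for line in lines:
        match = _OSDMAP_KEY.match(line.strip())
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None


# Usage with a hand-written sample resembling the keys mentioned above.
sample = [
    "osdmap:1",
    "osdmap:full_38456",
    "osdmap:38630",
    "osdmap:last_committed",
]
print(highest_osdmap_epoch(sample))
```

To try it on a real store, the output of the list command could be saved to a file and its lines passed to the function; comparing the result across all three monitors, and then against what the OSDs hold, would narrow down the true latest epoch.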
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com