Re: mons die with mon/OSDMonitor.cc: 125: FAILED assert(version >= osdmap.epoch)...

On Mon, Apr 11, 2016 at 3:45 PM, Eric Hall <eric.hall@xxxxxxxxxxxxxx> wrote:
> Power failure in data center has left 3 mons unable to start with
> mon/OSDMonitor.cc: 125: FAILED assert(version >= osdmap.epoch)
>
> Have found a similar problem discussed at
> http://irclogs.ceph.widodh.nl/index.php?date=2015-05-29, but am unsure how
> to proceed.
>
> If I read
> ceph-kvstore-tool /var/lib/ceph/mon/ceph-cephsecurestore1/store.db list
> correctly, the mons believe the osdmap epoch is 1, but they also have
> osdmap:full_38456 and osdmap:38630 in the store.
>
> Working from the irclogs info above, something like
>   ceph-kvstore-tool /var/lib/ceph/mon/ceph-foo/store.db set osdmap NNNNN in /tmp/osdmap
> might help, but I am unsure of the value for NNNNN.  Seems like too
> delicate an operation for experimentation.

What exactly are you reading that gives you those values?
The "real" OSDMap epoch is going to be at least 38630...if you're very
lucky it will be exactly 38630. But since it reset itself to 1 in the
monitor's store, I doubt you'll be lucky.
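For reference, something like this (untested against the Firefly build of
the tool, and assuming the usual osdmap key names in the mon store) will
show what the store itself thinks it holds:

  # every osdmap-related key; the highest full_NNNNN is the newest
  # complete map still in the mon store
  ceph-kvstore-tool /var/lib/ceph/mon/ceph-foo/store.db list | grep osdmap

  # the version markers the failed assert is comparing against
  ceph-kvstore-tool /var/lib/ceph/mon/ceph-foo/store.db get osdmap first_committed
  ceph-kvstore-tool /var/lib/ceph/mon/ceph-foo/store.db get osdmap last_committed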

So in order to get your cluster back up, you need to find the largest
osdmap epoch anywhere in your cluster. You can do that, very tediously, by
looking at the maps stored on each OSD. Or you may have debug logs on the
monitors indicating it more easily.
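If you want to script the tedious version, something along these lines, run
on each OSD host, should work (a rough sketch, assuming FileStore OSDs under
the default /var/lib/ceph/osd paths, where the full maps are stored as
osdmap.<epoch>__... objects in the meta collection):

  # print, per OSD on this host, the newest osdmap epoch it has on disk
  for osd in /var/lib/ceph/osd/ceph-*; do
    newest=$(find "$osd/current/meta" -name 'osdmap*' 2>/dev/null \
             | grep -o 'osdmap\.[0-9]*' | cut -d. -f2 | sort -n | tail -1)
    echo "$osd: ${newest:-no maps found}"
  done

The largest epoch you see across all the OSDs is the floor for the value
you're looking for.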

But your most important task is to find out why your monitors went
back in time -- if the software and hardware underneath Ceph are
behaving, that should be impossible. The usual scenario is that you
have caches enabled which aren't power-safe (e.g., inside the drives),
or have disabled write barriers, or something along those lines.
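A couple of quick (and by no means exhaustive) things to look at on the
mon hosts:

  # is the drive's volatile write cache turned on?
  hdparm -W /dev/sda

  # were any filesystems mounted with barriers disabled?
  grep -E 'nobarrier|barrier=0' /proc/mounts

If there's a RAID controller in the path, its cache settings (and whether
its battery is healthy) are worth checking too.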
-Greg

>
>
> OS: Ubuntu 14.04.4
> kernel: 3.13.0-83-generic
> ceph: Firefly 0.80.11-1trusty
>
> Any assistance appreciated,
> --
> Eric Hall
> Institute for Software Integrated Systems
> Vanderbilt University
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



