Re: mons die with mon/OSDMonitor.cc: 125: FAILED assert(version >= osdmap.epoch)...

On 04/12/2016 04:27 PM, Eric Hall wrote:
On 4/12/16 9:53 AM, Joao Eduardo Luis wrote:

So this looks like the monitors didn't remove version 1, but this may
just be a red herring.

What matters, really, is the values in 'first_committed' and
'last_committed'. If either first or last_committed happens to be '1',
then there may be a bug somewhere in the code, but I doubt that. This
seems to be just an artefact.

So, it would be nice if you could provide the value of both
'osdmap:first_committed' and 'osdmap:last_committed'.

mon1:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00                         : ........
(osdmap, first_committed) does not exist

mon2:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00                         : ........
(osdmap, first_committed) does not exist

mon3:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00                         : ........
(osdmap, first_committed)
0000 : b8 94 00 00 00 00 00 00
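Both keys hold an 8-byte little-endian integer, so the dumps above can be decoded directly. A minimal sketch, assuming the bytes are pasted as space-separated hex exactly as printed; `decode_version` is a hypothetical helper, not part of any Ceph tool:

```python
import struct

def decode_version(hex_dump: str) -> int:
    """Decode an 8-byte little-endian unsigned integer from a hex dump
    such as the ones printed above (hypothetical helper)."""
    raw = bytes.fromhex(hex_dump)  # whitespace between byte pairs is ignored
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    # "<Q" = little-endian unsigned 64-bit integer
    return struct.unpack("<Q", raw)[0]

last_committed = decode_version("01 00 00 00 00 00 00 00")   # all three mons
first_committed = decode_version("b8 94 00 00 00 00 00 00")  # mon3 only

print(last_committed, first_committed)    # 1 38072
print(first_committed <= last_committed)  # False: an impossible interval
```

So mon3 claims a first_committed of 38072 but a last_committed of 1, which is the inconsistency discussed below.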

Wow! This is unexpected, but fits the assertion just fine.

The solution, I think, will be rewriting first_committed and last_committed on all monitors - except on mon1.

As per your previous email, in which you listed the osdmap version intervals for each monitor, it seems like mon1 contains incremental versions [38072..38630] and full versions [38072..38456] - i.e., there's a bunch of full versions missing from 38456 to 38630.
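The size of that gap can be checked from the reported intervals. A small sketch, assuming both intervals are inclusive as written above; `missing_full_versions` is a hypothetical helper:

```python
def missing_full_versions(inc_range, full_range):
    """Return the epochs that have an incremental map but no full map.
    Both arguments are inclusive (first, last) tuples."""
    inc = set(range(inc_range[0], inc_range[1] + 1))
    full = set(range(full_range[0], full_range[1] + 1))
    return sorted(inc - full)

# mon1: incrementals 38072..38630, full maps 38072..38456
gap = missing_full_versions((38072, 38630), (38072, 38456))
print(gap[0], gap[-1], len(gap))  # 38457 38630 174
```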

The other two monitors do not seem afflicted by this gap.

This will not necessarily be a problem as long as osdmap:full_latest contains the version of the latest full map in the monitor's store. If osdmap:full_latest happens to contain a version lower than the lowest full map available, or greater than the highest full map available, then problems will ensue.
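The condition above amounts to a simple bounds check on full_latest. A sketch that encodes only that stated rule (it does not check the stricter "equals the latest full map" requirement); the function name is hypothetical:

```python
def full_latest_is_valid(full_latest, full_versions):
    """True if full_latest lies within the range of full maps actually
    present in the store, per the rule described above."""
    if not full_versions:
        return False
    return min(full_versions) <= full_latest <= max(full_versions)

mon1_full = range(38072, 38457)  # full maps 38072..38456 on mon1
print(full_latest_is_valid(38456, mon1_full))  # True
print(full_latest_is_valid(38630, mon1_full))  # False: no full map 38630 on mon1
```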

I would advise performing the following on a copy of your monitors (injecting a custom monmap so that the copy runs solo[1]), so that any still-running OSDs are not affected by any possible side effects. Once you are sure no assertions have been hit and the monitor is running fine, feel free to apply these changes to your monitors.

1. set osdmap:first_committed to 38072
2. set osdmap:last_committed to 38630
3. set osdmap:full_latest to whatever is the latest full_XXXXX version on the monitor.
  3.1. this means 38630 on mon2 and mon3 - but 38456 on mon1
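Before writing anything, the target values can be sanity-checked. A sketch, assuming the invariants first_committed <= last_committed and first_committed <= full_latest <= last_committed; the second is a plausible assumption, not something stated in this thread:

```python
# Target values from steps 1-3 above.
PLAN = {
    "mon1": {"first_committed": 38072, "last_committed": 38630, "full_latest": 38456},
    "mon2": {"first_committed": 38072, "last_committed": 38630, "full_latest": 38630},
    "mon3": {"first_committed": 38072, "last_committed": 38630, "full_latest": 38630},
}

def check_plan(values):
    """Return a list of problems with one monitor's planned values
    (assumed invariants, see above)."""
    problems = []
    if values["first_committed"] > values["last_committed"]:
        problems.append("first_committed > last_committed")
    if not values["first_committed"] <= values["full_latest"] <= values["last_committed"]:
        problems.append("full_latest outside [first_committed, last_committed]")
    return problems

for mon, values in PLAN.items():
    print(mon, check_plan(values) or "ok")
```

Run against mon3's current state (first_committed 38072, last_committed 1), the first check fails.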

Setting versions should be as simple as

ceph-kvstore-tool ${MONDATA}/store.db set osdmap ${KEY} ver ${VER}

with ${KEY} being either first_committed, last_committed or full_latest

and ${VER} being the appropriate value.
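To avoid typos when repeating this for each key, the command lines can be generated rather than typed. A sketch that only builds the strings in the exact form shown above and does not run them; the data directory path is a hypothetical placeholder, and the argument order of ceph-kvstore-tool may differ between Ceph releases, so check its usage output first:

```python
import shlex

def kvstore_commands(mondata, values):
    """Build (but do not run) one ceph-kvstore-tool invocation per
    osdmap key, using the syntax shown above."""
    store = f"{mondata}/store.db"
    commands = []
    for key, ver in values.items():
        args = ["ceph-kvstore-tool", store, "set", "osdmap", key, "ver", str(ver)]
        # Quote each argument in case the path contains spaces.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands

# Hypothetical data directory; values for mon1 from steps 1-3 above.
mon1 = {"first_committed": 38072, "last_committed": 38630, "full_latest": 38456}
for cmd in kvstore_commands("/var/lib/ceph/mon/ceph-mon1", mon1):
    print(cmd)
```

Apply the printed commands to the monitor copy with the monitor stopped, as advised above.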


Hope this helps.

  -Joao

[1] This assert is only triggered once a quorum is formed, which means
you'll either have to have all the monitors running, or force the
quorum to consist of a single monitor.


Furthermore, the code is asserting on a basic check on
OSDMonitor::update_from_paxos(), which is definitely unexpected to fail.
It would also be nice if you could point us to a mon log with
'--debug-mon 20' from start to hitting the assertion. Feel free to send
it directly to me if you don't want it sitting on the internet.

Here is from mon2 (cephsecurestore2 IRL), which starts and dies with the
assert:
http://www.isis.vanderbilt.edu/mon2.log

Here is from mon3 (cephsecurestore3 IRL), which starts and runs, but
can't form quorum and never gives up on mon1 and mon2.  Removing mon1
and mon2 from mon3's monmap via monmap extract/rm/inject results in the same
FAILED assert as the others:
http://www.isis.vanderbilt.edu/mon3.log


My thought was that if I could resolve the last_committed problem on
mon3, then it might have a chance sans mon1 and mon2.

Thank you,
--
Eric

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
