On 4/12/16 9:53 AM, Joao Eduardo Luis wrote:
> So this looks like the monitors didn't remove version 1, but this may
> just be a red herring.
>
> What matters, really, is the values in 'first_committed' and
> 'last_committed'. If either first or last_committed happens to be '1',
> then there may be a bug somewhere in the code, but I doubt that. This
> seems just an artefact.
>
> So, it would be nice if you could provide the value of both
> 'osdmap:first_committed' and 'osdmap:last_committed'.

mon1:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00 : ........
(osdmap, first_committed) does not exist
mon2:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00 : ........
(osdmap, first_committed) does not exist
mon3:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00 : ........
(osdmap, first_committed)
0000 : b8 94 00 00 00 00 00 00
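
For readability, here is a minimal sketch that decodes the dumps above. It assumes the stored values are unsigned 64-bit little-endian integers, which is my understanding of how these version numbers are encoded and is not confirmed anywhere in this thread. The helper name decode_le64 is made up for illustration.

```python
def decode_le64(hex_dump: str) -> int:
    """Decode 8 space-separated hex bytes as an unsigned little-endian integer.

    Assumption: the store keeps these counters as 64-bit little-endian values.
    """
    raw = bytes.fromhex(hex_dump)  # whitespace between byte pairs is ignored
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return int.from_bytes(raw, byteorder="little")


# Values copied from the dumps above.
last_committed = decode_le64("01 00 00 00 00 00 00 00")
mon3_first_committed = decode_le64("b8 94 00 00 00 00 00 00")

print(last_committed)        # 1
print(mon3_first_committed)  # 38072
```

If that reading is right, mon3 has a first_committed (38072) that is larger than its last_committed (1).
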

> Furthermore, the code is asserting on a basic check in
> OSDMonitor::update_from_paxos(), which is definitely unexpected to fail.
>
> It would also be nice if you could point us to a mon log with
> '--debug-mon 20' from start to hitting the assertion. Feel free to send
> it directly to me if you don't want it sitting on the internet.

Here is from mon2 (cephsecurestore2 IRL), which starts and dies with the
assert:
http://www.isis.vanderbilt.edu/mon2.log
Here is from mon3 (cephsecurestore3 IRL), which starts and runs, but
can't form quorum and never gives up on mon1 and mon2. Removing mon1
and mon2 from mon3's monmap via monmap extract/rm/inject results in the
same FAILED assert as the others:
http://www.isis.vanderbilt.edu/mon3.log
My thought was that if I could resolve the last_committed problem on
mon3, then it might have a chance sans mon1 and mon2.
Thank you,
--
Eric