Re: mons die with mon/OSDMonitor.cc: 125: FAILED assert(version >= osdmap.epoch)...


 



On 04/12/2016 07:16 PM, Eric Hall wrote:
Removed mon on mon1, added mon on mon1 via ceph-deploy.  mons now have
quorum.

I am left with:
    cluster 5ee52b50-838e-44c4-be3c-fc596dc46f4e
      health HEALTH_WARN 1086 pgs peering; 1086 pgs stuck inactive; 1086
pgs stuck unclean; pool vms has too few pgs
      monmap e5: 3 mons at
{cephsecurestore1=172.16.250.7:6789/0,cephsecurestore2=172.16.250.8:6789/0,cephsecurestore3=172.16.250.9:6789/0},
election epoch 28, quorum 0,1,2
cephsecurestore1,cephsecurestore2,cephsecurestore3
      mdsmap e2: 0/0/1 up
      osdmap e38769: 67 osds: 67 up, 66 in
       pgmap v33886066: 7688 pgs, 24 pools, 4326 GB data, 892 kobjects
             11620 GB used, 8873 GB / 20493 GB avail
                    3 active+clean+scrubbing+deep
                 1086 peering
                 6599 active+clean

All OSDs are up/in as reported, but I see no recovery I/O for the PGs
that are inactive/peering/unclean.

Someone else will probably be able to chime in with more authority than me, but I would first try restarting the OSDs to which those stuck PGs are being mapped.
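A minimal sketch of that first step; the PG id and OSD id below are placeholders, and the sysvinit-style restart command is an assumption for a deployment of this era:

```shell
# Find the stuck PGs and the OSDs they map to, then restart those OSDs.
ceph pg dump_stuck inactive     # list PGs stuck inactive
ceph pg map 4.1f                # placeholder PG id: shows its OSD set, e.g. [12,3,7]
# Then, on the host carrying e.g. osd.12 (init syntax varies by distro):
sudo /etc/init.d/ceph restart osd.12
```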

  -Joao


Thanks,
--
Eric

On 4/12/16 1:14 PM, Joao Eduardo Luis wrote:
On 04/12/2016 06:38 PM, Eric Hall wrote:
Ok, mon2 and mon3 are happy together, but mon1 dies with
mon/MonitorDBStore.h: 287: FAILED assert(0 == "failed to write to db")

I take this to mean mon1:store.db is corrupt as I see no permission
issues.

So... remove mon1 and add a mon?

Nothing special to worry about re-adding a mon on mon1, other than rm/mv
the current store.db path, correct?

You'll actually need to recreate the mon with 'ceph-mon --mkfs' for that
to work, and that will likely require you to rm/mv the mon data
directory.

You *could* copy the mon dir from one of the other monitors and use that
instead. But given you have a functioning quorum, I don't think there's
any reason to resort to that.

Follow the docs on removing monitors[1] and recreate the monitor from
scratch, adding it to the cluster. It will sync up from scratch from the
other monitors. That'll make them happy.

   -Joao

[1]
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors
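The remove/recreate cycle described above can be sketched roughly as follows; the mon id "mon1", the data path, and the service command are assumptions to adjust for your deployment:

```shell
# Sketch: remove mon1, then recreate it so it syncs fresh from the quorum.
ceph mon remove mon1                          # drop mon1 from the monmap
mv /var/lib/ceph/mon/ceph-mon1 /var/lib/ceph/mon/ceph-mon1.bak   # keep the old store, just in case
ceph auth get mon. -o /tmp/mon.keyring        # mon. key from the running quorum
ceph mon getmap -o /tmp/monmap                # current monmap
ceph-mon --mkfs -i mon1 --monmap /tmp/monmap --keyring /tmp/mon.keyring
service ceph start mon.mon1                   # it will sync up from the peers
```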




Thanks again,
--
Eric

On 4/12/16 11:18 AM, Joao Eduardo Luis wrote:
On 04/12/2016 05:06 PM, Joao Eduardo Luis wrote:
On 04/12/2016 04:27 PM, Eric Hall wrote:
On 4/12/16 9:53 AM, Joao Eduardo Luis wrote:

So this looks like the monitors didn't remove version 1, but this may
just be a red herring.

What matters, really, are the values in 'first_committed' and
'last_committed'. If either first or last_committed happens to be '1',
then there may be a bug somewhere in the code, but I doubt that. This
seems just an artefact.

So, it would be nice if you could provide the value of both
'osdmap:first_committed' and 'osdmap:last_committed'.

mon1:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00                         : ........
(osdmap, first_committed) does not exist

mon2:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00                         : ........
(osdmap, first_committed) does not exist

mon3:
(osdmap, last_committed)
0000 : 01 00 00 00 00 00 00 00                         : ........
(osdmap, first_committed)
0000 : b8 94 00 00 00 00 00 00
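For reference, those dumps are little-endian uint64 values; reversing the byte order gives the actual numbers (a quick shell sketch):

```shell
# Decode the little-endian byte dumps above: reverse the bytes, read as hex.
# last_committed = 01 00 00 00 00 00 00 00  ->  0x0000000000000001
echo $(( 0x01 ))     # prints 1
# mon3 first_committed = b8 94 00 00 00 00 00 00  ->  0x94b8
echo $(( 0x94b8 ))   # prints 38072
```

So osdmap:last_committed reads back as 1 on all three monitors while the cluster is at osdmap e38769, which is consistent with the failed assert in the subject line.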

Wow! This is unexpected, but fits the assertion just fine.

The solution, I think, will be rewriting first_committed and
last_committed on all monitors - except on mon1.

Let me clarify this a bit: the easy way out for mon1 would be to fix
the other two monitors and recreate mon1.

If you prefer to also fix mon1, you can simply follow the same steps
from the previous email for all the monitors, but ensure
osdmap:full_latest on mon1 reflects the last available full_XXXX
version in its store.
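If you do end up editing a store directly, a hypothetical repair with ceph-kvstore-tool might look like the sketch below. The 'set ... ver' form, the store path, and the epoch value (taken from the 'osdmap e38769' status earlier in the thread) are all assumptions to verify against your own store, and the monitor must be stopped first:

```shell
# Hypothetical sketch only: back up store.db before touching anything.
STORE=/var/lib/ceph/mon/ceph-mon1/store.db    # path is an assumption
service ceph stop mon.mon1
cp -a $STORE $STORE.bak
ceph-kvstore-tool $STORE list osdmap | grep full_   # find the last full_XXXX present
# Point full_latest (and last_committed) at epochs that actually exist in the store:
ceph-kvstore-tool $STORE set osdmap full_latest ver 38769
ceph-kvstore-tool $STORE set osdmap last_committed ver 38769
service ceph start mon.mon1
```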

   -Joao


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


