Re: Update crushmap when monitors are down

Pardhiv Karri <meher4india@xxxxxxxxx> · Mon, 1 Apr 2019 18:27:03 -0700

Hi Huang,

We are on Ceph Luminous 12.2.11.

The primary is sh1ora1300, but it is not coming up at all. sh1ora1301 and sh1ora1302 do come up and, per the log, are in quorum, but we still cannot run any ceph commands. Part of the log is below.

2019-04-02 00:48:51.644339 mon.sh1ora1302 mon.2 10.15.29.21:6789/0 105 : cluster [INF] mon.sh1ora1302 calling monitor election
2019-04-02 00:51:41.706135 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 292 : cluster [WRN] overall HEALTH_WARN crush map has legacy tunables (require bobtail, min is firefly); 399 osds down; 14 hosts (17 osds) down; 785718/146017356 objects misplaced (0.538%); 10/48672452 objects unfound (0.000%); Reduced data availability: 11606 pgs inactive, 86 pgs down, 779 pgs peering, 3081 pgs stale; Degraded data redundancy: 59329035/146017356 objects degraded (40.631%), 16508 pgs degraded, 19795 pgs undersized; 1/3 mons down, quorum sh1ora1301,sh1ora1302
2019-04-02 00:52:15.583292 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 293 : cluster [INF] mon.sh1ora1301 calling monitor election
2019-04-02 00:52:31.224838 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 294 : cluster [INF] mon.sh1ora1301 calling monitor election
2019-04-02 00:52:31.256251 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 295 : cluster [INF] mon.sh1ora1301 calling monitor election
2019-04-02 00:52:39.810572 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 296 : cluster [INF] mon.sh1ora1301 is new leader, mons sh1ora1301,sh1ora1302 in quorum (ranks 1,2)
2019-04-02 00:48:06.751139 mon.sh1ora1302 mon.2 10.15.29.21:6789/0 104 : cluster [INF] mon.sh1ora1302 calling monitor election
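To make the election sequence in the log excerpt above easier to read, here is a minimal Python sketch that puts the entries in time order, drops any line that appears more than once, and counts election calls and leader changes. The regular expression is only an assumption based on the line layout visible in this excerpt, not an official description of the Ceph cluster log format, and the function name `summarize` is made up for illustration.

```python
import re

# Assumed layout, inferred from the excerpt only:
# <date> <time> <mon name> <mon rank> <addr> <seq> : cluster [<LEVEL>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<mon>mon\.\S+) mon\.\d+ \S+ (?P<seq>\d+) : "
    r"cluster \[(?P<level>\w+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Return (events, election_calls, leaders) for the given log lines."""
    seen = set()
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like a cluster log line
        key = (match.group("mon"), match.group("seq"))
        if key in seen:
            continue  # same monitor and sequence number: a repeated paste
        seen.add(key)
        events.append(match.groupdict())
    # Timestamps share one fixed-width format, so string order is time order.
    events.sort(key=lambda e: e["ts"])
    election_calls = sum("calling monitor election" in e["msg"] for e in events)
    leaders = [e["mon"] for e in events if "is new leader" in e["msg"]]
    return events, election_calls, leaders


# Lines copied from the excerpt; the first one is repeated on purpose
# to exercise the de-duplication.
SAMPLE = [
    "2019-04-02 00:48:51.644339 mon.sh1ora1302 mon.2 10.15.29.21:6789/0 105 : cluster [INF] mon.sh1ora1302 calling monitor election",
    "2019-04-02 00:52:15.583292 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 293 : cluster [INF] mon.sh1ora1301 calling monitor election",
    "2019-04-02 00:52:39.810572 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 296 : cluster [INF] mon.sh1ora1301 is new leader, mons sh1ora1301,sh1ora1302 in quorum (ranks 1,2)",
    "2019-04-02 00:48:06.751139 mon.sh1ora1302 mon.2 10.15.29.21:6789/0 104 : cluster [INF] mon.sh1ora1302 calling monitor election",
    "2019-04-02 00:48:51.644339 mon.sh1ora1302 mon.2 10.15.29.21:6789/0 105 : cluster [INF] mon.sh1ora1302 calling monitor election",
]

events, elections, leaders = summarize(SAMPLE)
for event in events:
    print(event["ts"], event["mon"], event["level"], event["msg"])
print("election calls:", elections, "| leaders:", leaders)
```

This only summarizes what the log already says. It does not explain why the ceph commands fail even though a quorum is reported.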

Thanks,
Pardhiv Karri

On Mon, Apr 1, 2019 at 6:16 PM huang jun <hjwsm1989@xxxxxxxxx> wrote:
Can you provide detailed error logs from when the mon crashes?

Pardhiv Karri <meher4india@xxxxxxxxx> wrote on Tue, Apr 2, 2019 at 9:02 AM:
>
> Hi,
>
> Our Ceph production cluster went down while we were updating the crushmap. Now we can't get our monitors to come online, and when they do come online for a fraction of a second we see crush map errors in the logs. How can we update the crushmap when the monitors are down and none of the ceph commands work?
>
> Thanks,
> Pardhiv Karri
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 

Thank you!

HuangJun

-- 
Pardhiv Karri
"Rise and Rise again until LAMBS become LIONS" 
