Hi,

today we had an issue with our 6-node Ceph cluster. We had to shut down one node (Ceph-03) to replace a disk (because we did not know which slot it was in). We set the noout flag and did a graceful shutdown. All was OK. After the disk was replaced, the node came back up and our VMs saw very high I/O latency. We have never seen this in the past with the same procedure ...
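For completeness, the maintenance sequence we used was roughly the usual one (sketched here from memory as an outline, not the exact commands we typed):

  # on an admin/mon node: keep the cluster from marking the node's OSDs out
  ceph osd set noout

  # on qh-a07-ceph-osd-03: stop the Ceph daemons, then power the node off
  systemctl stop ceph.target
  shutdown -h now

  # ... replace the disk, power the node back on; the OSDs start again via systemd ...

  # once all OSDs have rejoined and the cluster is healthy again
  ceph -s
  ceph osd unset noout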
* From our logs on Ceph-01:

2018-10-18 15:53:45.455743 mon.qh-a07-ceph-osd-03 mon.2 10.3.0.3:6789/0 1 : cluster [INF] mon.qh-a07-ceph-osd-03 calling monitor election
...
2018-10-18 15:53:45.503818 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663050 : cluster [INF] mon.qh-a07-ceph-osd-01 is new leader, mons qh-a07-ceph-osd-01,qh-a07-ceph-osd-02,qh-a07-ceph-osd-03,qh-a07-ceph-osd-04,qh-a07-ceph-osd-05,qh

* First OSD comes up:

2018-10-18 15:53:55.207742 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663063 : cluster [WRN] Health check update: 10 osds down (OSD_DOWN)
2018-10-18 15:53:55.207768 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663064 : cluster [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (11 osds) down)
2018-10-18 15:53:55.240079 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663065 : cluster [INF] osd.43 10.3.0.3:6812/7554 boot

* All OSDs were up:

2018-10-18 15:54:25.331692 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663134 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-10-18 15:54:25.360151 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663135 : cluster [INF] osd.12 10.3.0.3:6820/8537 boot

* The OSDs implicated here are a mix of HDDs and SSDs on different nodes:

2018-10-18 15:54:27.073266 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663138 : cluster [WRN] Health check update: Degraded data redundancy: 84012/4293867 objects degraded (1.957%), 1316 pgs degraded, 487 pgs undersized (PG_DEGRADED)
2018-10-18 15:54:32.073644 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663142 : cluster [WRN] Health check update: Degraded data redundancy: 4611/4293867 objects degraded (0.107%), 1219 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:36.841189 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663144 : cluster [WRN] Health check failed: 1 slow requests are blocked > 32 sec. Implicated osds 16 (REQUEST_SLOW)
2018-10-18 15:54:37.074098 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663145 : cluster [WRN] Health check update: Degraded data redundancy: 4541/4293867 objects degraded (0.106%), 1216 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:42.074510 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663149 : cluster [WRN] Health check update: Degraded data redundancy: 4364/4293867 objects degraded (0.102%), 1176 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:42.074561 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663150 : cluster [WRN] Health check update: 5 slow requests are blocked > 32 sec. Implicated osds 15,25,30,34 (REQUEST_SLOW)
2018-10-18 15:54:47.074886 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663152 : cluster [WRN] Health check update: Degraded data redundancy: 4193/4293867 objects degraded (0.098%), 1140 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:47.074934 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663153 : cluster [WRN] Health check update: 5 slow requests are blocked > 32 sec. Implicated osds 9,15,23,30 (REQUEST_SLOW)
2018-10-18 15:54:52.075274 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663156 : cluster [WRN] Health check update: Degraded data redundancy: 4087/4293867 objects degraded (0.095%), 1120 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:52.075313 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663157 : cluster [WRN] Health check update: 14 slow requests are blocked > 32 sec. Implicated osds 2,13,14,15,16,23 (REQUEST_SLOW)
2018-10-18 15:54:57.075635 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663158 : cluster [WRN] Health check update: Degraded data redundancy: 3932/4293867 objects degraded (0.092%), 1074 pgs degraded (PG_DEGRADED)
2018-10-18 15:54:57.075683 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663159 : cluster [WRN] Health check update: 6 slow requests are blocked > 32 sec. Implicated osds 14,15 (REQUEST_SLOW)
2018-10-18 15:55:02.076071 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663163 : cluster [WRN] Health check update: Degraded data redundancy: 3805/4293867 objects degraded (0.089%), 1036 pgs degraded (PG_DEGRADED)
2018-10-18 15:55:02.076138 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663164 : cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec. Implicated osds 1,15,19 (REQUEST_SLOW)
2018-10-18 15:55:07.076562 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663168 : cluster [WRN] Health check update: Degraded data redundancy: 3656/4293867 objects degraded (0.085%), 988 pgs degraded (PG_DEGRADED)
2018-10-18 15:55:07.076633 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663169 : cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec. Implicated osds 15,22 (REQUEST_SLOW)
2018-10-18 15:55:12.077091 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663174 : cluster [WRN] Health check update: Degraded data redundancy: 3509/4293867 objects degraded (0.082%), 934 pgs degraded (PG_DEGRADED)
2018-10-18 15:55:12.077130 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663175 : cluster [WRN] Health check update: 1 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2018-10-18 15:55:13.317758 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663178 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 1 slow requests are blocked > 32 sec. Implicated osds 15)
....
2018-10-18 15:55:17.077685 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663180 : cluster [WRN] Health check update: Degraded data redundancy: 3351/4293867 objects degraded (0.078%), 886 pgs degraded, 1 pg undersized (PG_DEGRADED)
2018-10-18 15:55:22.078077 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663184 : cluster [WRN] Health check update: Degraded data redundancy: 3202/4293867 objects degraded (0.075%), 831 pgs degraded, 3 pgs undersized (PG_DEGRADED)

* And then everything went back to normal:

2018-10-18 16:00:22.876849 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663322 : cluster [WRN] Health check failed: Reduced data availability: 4 pgs inactive, 6 pgs peering (PG_AVAILABILITY)
2018-10-18 16:00:24.898816 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663326 : cluster [WRN] Health check failed: 102307/4289049 objects misplaced (2.385%) (OBJECT_MISPLACED)
2018-10-18 16:00:28.995821 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663327 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 3 pgs peering)
2018-10-18 16:00:31.026616 mon.qh-a07-ceph-osd-01 mon.0 10.3.0.1:6789/0 1663330 : cluster [WRN] Health check update: 156337/4294191 objects misplaced (3.641%) (OBJECT_MISPLACED)

A few minutes later we added the replaced disks again (on ceph-osd-06 and on ceph-osd-03) without any issues. Why the I/O stalls started the moment the node was back in the cluster (15:53:45 ...) we have no clue; this has never happened in the past.

Per OSD node we have:

* 2 x Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
* 64GB DDR4 ECC
* HDD: WD Red 1TB 2.5" with Intel NVMe SSDPEDMD400G4 as journal device
* SSD: Samsung 950 EVO 512GB
* 2 x Symbios Logic SAS3224 PCI-Express Fusion-MPT SAS-3 (one for the SSDs, one for the HDDs)
* Mellanox 100Gb Ceph network (CX-4)
* 6 mons (on the Ceph nodes)

ceph.conf:

[global]
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 20
osd recovery max active = 8

No special CRUSH rule, just the HDD / SSD split.

* ceph osd pool get ssd-pool pg_num
pg_num: 2048
* root@qh-a07-ceph-osd-01:[~]: ceph osd pool get sata-pool pg_num
pg_num: 1024
* root@qh-a07-ceph-osd-01:[~]: ceph osd pool get db-pool pg_num
pg_num: 1024

* ceph -s
  data:
    pools:   3 pools, 4096 pgs
    objects: 1.43M objects, 5.39TiB
    usage:   15.7TiB used, 33.4TiB / 49.1TiB avail
    pgs:     4096 active+clean

Any suggestions on how to avoid this stall next time?

cu denny
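PS: For reference, this is roughly how we would check (and, if needed, temporarily change) the recovery throttles from the ceph.conf above at runtime; a sketch only, with example values and the default admin socket:

  # on the node hosting osd.0: show the values the daemon is actually running with
  ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active'

  # inject different values into all OSDs without a restart (example values)
  ceph tell osd.* injectargs '--osd_max_backfills 2 --osd_recovery_max_active 2'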