Hello,
your log extract shows the following:
2019-02-15 21:40:08 OSD.29 DOWN
2019-02-15 21:40:09 PG_AVAILABILITY warning start
2019-02-15 21:40:15 PG_AVAILABILITY warning cleared
2019-02-15 21:44:06 OSD.29 UP
2019-02-15 21:44:08 PG_AVAILABILITY warning start
2019-02-15 21:44:15 PG_AVAILABILITY warning cleared
What you saw is the natural consequence of an OSD state change. The two short periods of limited PG availability (roughly 6 seconds each) correspond to the peering that happens shortly after an OSD goes down or comes back up.
Basically, the placement groups stored on that OSD have to re-peer before client I/O is redirected to the other (surviving) OSDs. And yes, during those few seconds the affected data is not accessible.
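If you want to confirm how long those windows lasted, here is a minimal sketch that measures them from the monitor cluster log itself. It only assumes the log-line format you pasted (timestamp at the start, "Health check failed ... (PG_AVAILABILITY)" to open a window, "Health check cleared: PG_AVAILABILITY" to close it) and that the lines are unwrapped, as they are in the actual log file; the script name is just my own:

#!/usr/bin/env python3
# pg_avail_windows.py - print the duration of each PG_AVAILABILITY window
# found in a Ceph monitor cluster log fed on stdin.
# usage: python3 pg_avail_windows.py < ceph.log
import re
import sys
from datetime import datetime

# Timestamp at the start of each cluster-log line, e.g. "2019-02-15 21:40:09.489494"
TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)')

start = None
for line in sys.stdin:
    m = TS_RE.match(line)
    if not m:
        continue
    ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
    if 'Health check failed' in line and '(PG_AVAILABILITY)' in line:
        start = ts                       # window opens: some PGs are peering
    elif 'Health check cleared: PG_AVAILABILITY' in line and start is not None:
        dur = (ts - start).total_seconds()
        print(f'PG_AVAILABILITY window: {start} -> {ts} ({dur:.1f} s)')
        start = None                     # window closed: PGs finished peering

Run against the extract below it reports two windows of about 5.6 s and 6.1 s, matching the peering periods around osd.29 going down and coming back up.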
Kind regards,
Maks
On Sat, 16 Feb 2019 at 07:25, <jesper@xxxxxxxx> wrote:
Yesterday I saw this, and it puzzles me:
2019-02-15 21:00:00.000126 mon.torsk1 mon.0 10.194.132.88:6789/0 604164 :
cluster [INF] overall HEALTH_OK
2019-02-15 21:39:55.793934 mon.torsk1 mon.0 10.194.132.88:6789/0 604304 :
cluster [WRN] Health check failed: 2 slow requests are blocked > 32 sec.
Implicated osds 58 (REQUEST_SLOW)
2019-02-15 21:40:00.887766 mon.torsk1 mon.0 10.194.132.88:6789/0 604305 :
cluster [WRN] Health check update: 6 slow requests are blocked > 32 sec.
Implicated osds 9,19,52,58,68 (REQUEST_SLOW)
2019-02-15 21:40:06.973901 mon.torsk1 mon.0 10.194.132.88:6789/0 604306 :
cluster [WRN] Health check update: 14 slow requests are blocked > 32 sec.
Implicated osds 3,9,19,29,32,52,55,58,68,69 (REQUEST_SLOW)
2019-02-15 21:40:08.466266 mon.torsk1 mon.0 10.194.132.88:6789/0 604307 :
cluster [INF] osd.29 failed (root=default,host=bison) (6 reporters from
different host after 33.862482 >= grace 29.247323)
2019-02-15 21:40:08.473703 mon.torsk1 mon.0 10.194.132.88:6789/0 604308 :
cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-02-15 21:40:09.489494 mon.torsk1 mon.0 10.194.132.88:6789/0 604310 :
cluster [WRN] Health check failed: Reduced data availability: 6 pgs
peering (PG_AVAILABILITY)
2019-02-15 21:40:11.008906 mon.torsk1 mon.0 10.194.132.88:6789/0 604312 :
cluster [WRN] Health check failed: Degraded data redundancy:
3828291/700353996 objects degraded (0.547%), 77 pgs degraded (PG_DEGRADED)
2019-02-15 21:40:13.474777 mon.torsk1 mon.0 10.194.132.88:6789/0 604313 :
cluster [WRN] Health check update: 9 slow requests are blocked > 32 sec.
Implicated osds 3,9,32,55,58,69 (REQUEST_SLOW)
2019-02-15 21:40:15.060165 mon.torsk1 mon.0 10.194.132.88:6789/0 604314 :
cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
availability: 17 pgs peering)
2019-02-15 21:40:17.128185 mon.torsk1 mon.0 10.194.132.88:6789/0 604315 :
cluster [WRN] Health check update: Degraded data redundancy:
9897139/700354131 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
2019-02-15 21:40:17.128219 mon.torsk1 mon.0 10.194.132.88:6789/0 604316 :
cluster [INF] Health check cleared: REQUEST_SLOW (was: 2 slow requests are
blocked > 32 sec. Implicated osds 32,55)
2019-02-15 21:40:22.137090 mon.torsk1 mon.0 10.194.132.88:6789/0 604317 :
cluster [WRN] Health check update: Degraded data redundancy:
9897140/700354194 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
2019-02-15 21:40:27.249354 mon.torsk1 mon.0 10.194.132.88:6789/0 604318 :
cluster [WRN] Health check update: Degraded data redundancy:
9897142/700354287 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
2019-02-15 21:40:33.335147 mon.torsk1 mon.0 10.194.132.88:6789/0 604322 :
cluster [WRN] Health check update: Degraded data redundancy:
9897143/700354356 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
....... shortened ......
2019-02-15 21:43:48.496536 mon.torsk1 mon.0 10.194.132.88:6789/0 604366 :
cluster [WRN] Health check update: Degraded data redundancy:
9897168/700356693 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:43:53.496924 mon.torsk1 mon.0 10.194.132.88:6789/0 604367 :
cluster [WRN] Health check update: Degraded data redundancy:
9897170/700356804 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:43:58.497313 mon.torsk1 mon.0 10.194.132.88:6789/0 604368 :
cluster [WRN] Health check update: Degraded data redundancy:
9897172/700356879 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:44:03.497696 mon.torsk1 mon.0 10.194.132.88:6789/0 604369 :
cluster [WRN] Health check update: Degraded data redundancy:
9897174/700356996 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:44:06.939331 mon.torsk1 mon.0 10.194.132.88:6789/0 604372 :
cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-02-15 21:44:06.965401 mon.torsk1 mon.0 10.194.132.88:6789/0 604373 :
cluster [INF] osd.29 10.194.133.58:6844/305358 boot
2019-02-15 21:44:08.498060 mon.torsk1 mon.0 10.194.132.88:6789/0 604376 :
cluster [WRN] Health check update: Degraded data redundancy:
9897174/700357056 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:44:08.996099 mon.torsk1 mon.0 10.194.132.88:6789/0 604377 :
cluster [WRN] Health check failed: Reduced data availability: 12 pgs
peering (PG_AVAILABILITY)
2019-02-15 21:44:13.498472 mon.torsk1 mon.0 10.194.132.88:6789/0 604378 :
cluster [WRN] Health check update: Degraded data redundancy: 55/700357161
objects degraded (0.000%), 33 pgs degraded (PG_DEGRADED)
2019-02-15 21:44:15.081437 mon.torsk1 mon.0 10.194.132.88:6789/0 604379 :
cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
availability: 12 pgs peering)
2019-02-15 21:44:18.498808 mon.torsk1 mon.0 10.194.132.88:6789/0 604380 :
cluster [WRN] Health check update: Degraded data redundancy: 14/700357230
objects degraded (0.000%), 9 pgs degraded (PG_DEGRADED)
2019-02-15 21:44:19.132797 mon.torsk1 mon.0 10.194.132.88:6789/0 604381 :
cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data
redundancy: 14/700357230 objects degraded (0.000%), 9 pgs degraded)
2019-02-15 21:44:19.132824 mon.torsk1 mon.0 10.194.132.88:6789/0 604382 :
cluster [INF] Cluster is now healthy
2019-02-15 22:00:00.000117 mon.torsk1 mon.0 10.194.132.88:6789/0 604402 :
cluster [INF] overall HEALTH_OK
Why do I end up with a PG_AVAILABILITY warning with just one OSD down? We
have 3x replicated pools and 4+2 EC pools in the system. Or am I just
mis-reading what PG_AVAILABILITY means in the docs... I came to the
conclusion that it means "some data is inaccessible"?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com