Hello,
your log extract shows the following:
2019-02-15 21:40:08 OSD.29 DOWN
2019-02-15 21:40:09 PG_AVAILABILITY warning start
2019-02-15 21:40:15 PG_AVAILABILITY warning cleared
2019-02-15 21:44:06 OSD.29 UP
2019-02-15 21:44:08 PG_AVAILABILITY warning start
2019-02-15 21:44:15 PG_AVAILABILITY warning cleared
What you saw is the natural consequence of an OSD state change. The two short periods of limited PG availability (roughly 6 seconds each) correspond to the peering that happens shortly after an OSD goes down or comes back up.
Basically, the placement groups stored on that OSD have to re-peer before client I/O is redirected to the other (surviving) OSDs. And yes, during those few seconds the affected data is not accessible.
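If you want to confirm how long those windows lasted, here is a minimal sketch that measures them from the monitor cluster log itself. It only assumes the log-line format you pasted (timestamp at the start, "Health check failed ... (PG_AVAILABILITY)" to open a window, "Health check cleared: PG_AVAILABILITY" to close it) and that the lines are unwrapped, as they are in the actual log file; the script name is just my own:

#!/usr/bin/env python3
# pg_avail_windows.py - print the duration of each PG_AVAILABILITY window
# found in a Ceph monitor cluster log fed on stdin.
# usage: python3 pg_avail_windows.py < ceph.log
import re
import sys
from datetime import datetime

# Timestamp at the start of each cluster-log line, e.g. "2019-02-15 21:40:09.489494"
TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)')

start = None
for line in sys.stdin:
    m = TS_RE.match(line)
    if not m:
        continue
    ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
    if 'Health check failed' in line and '(PG_AVAILABILITY)' in line:
        start = ts                       # window opens: some PGs are peering
    elif 'Health check cleared: PG_AVAILABILITY' in line and start is not None:
        dur = (ts - start).total_seconds()
        print(f'PG_AVAILABILITY window: {start} -> {ts} ({dur:.1f} s)')
        start = None                     # window closed: PGs finished peering

Run against the extract below it reports two windows of about 5.6 s and 6.1 s, matching the peering periods around osd.29 going down and coming back up.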
Kind regards,
Maks
On Sat, 16 Feb 2019 at 07:25, <jesper@xxxxxxxx> wrote:
Yesterday I saw this, and it puzzles me:
2019-02-15 21:00:00.000126 mon.torsk1 mon.0 10.194.132.88:6789/0 604164 :
cluster [INF] overall HEALTH_OK
2019-02-15 21:39:55.793934 mon.torsk1 mon.0 10.194.132.88:6789/0 604304 :
cluster [WRN] Health check failed: 2 slow requests are blocked > 32 sec.
Implicated osds 58 (REQUEST_SLOW)
2019-02-15 21:40:00.887766 mon.torsk1 mon.0 10.194.132.88:6789/0 604305 :
cluster [WRN] Health check update: 6 slow requests are blocked > 32 sec.
Implicated osds 9,19,52,58,68 (REQUEST_SLOW)
2019-02-15 21:40:06.973901 mon.torsk1 mon.0 10.194.132.88:6789/0 604306 :
cluster [WRN] Health check update: 14 slow requests are blocked > 32 sec.
Implicated osds 3,9,19,29,32,52,55,58,68,69 (REQUEST_SLOW)
2019-02-15 21:40:08.466266 mon.torsk1 mon.0 10.194.132.88:6789/0 604307 :
cluster [INF] osd.29 failed (root=default,host=bison) (6 reporters from
different host after 33.862482 >= grace 29.247323)
2019-02-15 21:40:08.473703 mon.torsk1 mon.0 10.194.132.88:6789/0 604308 :
cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-02-15 21:40:09.489494 mon.torsk1 mon.0 10.194.132.88:6789/0 604310 :
cluster [WRN] Health check failed: Reduced data availability: 6 pgs
peering (PG_AVAILABILITY)
2019-02-15 21:40:11.008906 mon.torsk1 mon.0 10.194.132.88:6789/0 604312 :
cluster [WRN] Health check failed: Degraded data redundancy:
3828291/700353996 objects degraded (0.547%), 77 pgs degraded (PG_DEGRADED)
2019-02-15 21:40:13.474777 mon.torsk1 mon.0 10.194.132.88:6789/0 604313 :
cluster [WRN] Health check update: 9 slow requests are blocked > 32 sec.
Implicated osds 3,9,32,55,58,69 (REQUEST_SLOW)
2019-02-15 21:40:15.060165 mon.torsk1 mon.0 10.194.132.88:6789/0 604314 :
cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
availability: 17 pgs peering)
2019-02-15 21:40:17.128185 mon.torsk1 mon.0 10.194.132.88:6789/0 604315 :
cluster [WRN] Health check update: Degraded data redundancy:
9897139/700354131 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
2019-02-15 21:40:17.128219 mon.torsk1 mon.0 10.194.132.88:6789/0 604316 :
cluster [INF] Health check cleared: REQUEST_SLOW (was: 2 slow requests are
blocked > 32 sec. Implicated osds 32,55)
2019-02-15 21:40:22.137090 mon.torsk1 mon.0 10.194.132.88:6789/0 604317 :
cluster [WRN] Health check update: Degraded data redundancy:
9897140/700354194 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
2019-02-15 21:40:27.249354 mon.torsk1 mon.0 10.194.132.88:6789/0 604318 :
cluster [WRN] Health check update: Degraded data redundancy:
9897142/700354287 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
2019-02-15 21:40:33.335147 mon.torsk1 mon.0 10.194.132.88:6789/0 604322 :
cluster [WRN] Health check update: Degraded data redundancy:
9897143/700354356 objects degraded (1.413%), 200 pgs degraded
(PG_DEGRADED)
....... shortened ......
2019-02-15 21:43:48.496536 mon.torsk1 mon.0 10.194.132.88:6789/0 604366 :
cluster [WRN] Health check update: Degraded data redundancy:
9897168/700356693 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:43:53.496924 mon.torsk1 mon.0 10.194.132.88:6789/0 604367 :
cluster [WRN] Health check update: Degraded data redundancy:
9897170/700356804 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:43:58.497313 mon.torsk1 mon.0 10.194.132.88:6789/0 604368 :
cluster [WRN] Health check update: Degraded data redundancy:
9897172/700356879 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:44:03.497696 mon.torsk1 mon.0 10.194.132.88:6789/0 604369 :
cluster [WRN] Health check update: Degraded data redundancy:
9897174/700356996 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:44:06.939331 mon.torsk1 mon.0 10.194.132.88:6789/0 604372 :
cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-02-15 21:44:06.965401 mon.torsk1 mon.0 10.194.132.88:6789/0 604373 :
cluster [INF] osd.29 10.194.133.58:6844/305358 boot
2019-02-15 21:44:08.498060 mon.torsk1 mon.0 10.194.132.88:6789/0 604376 :
cluster [WRN] Health check update: Degraded data redundancy:
9897174/700357056 objects degraded (1.413%), 200 pgs degraded, 201 pgs
undersized (PG_DEGRADED)
2019-02-15 21:44:08.996099 mon.torsk1 mon.0 10.194.132.88:6789/0 604377 :
cluster [WRN] Health check failed: Reduced data availability: 12 pgs
peering (PG_AVAILABILITY)
2019-02-15 21:44:13.498472 mon.torsk1 mon.0 10.194.132.88:6789/0 604378 :
cluster [WRN] Health check update: Degraded data redundancy: 55/700357161
objects degraded (0.000%), 33 pgs degraded (PG_DEGRADED)
2019-02-15 21:44:15.081437 mon.torsk1 mon.0 10.194.132.88:6789/0 604379 :
cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
availability: 12 pgs peering)
2019-02-15 21:44:18.498808 mon.torsk1 mon.0 10.194.132.88:6789/0 604380 :
cluster [WRN] Health check update: Degraded data redundancy: 14/700357230
objects degraded (0.000%), 9 pgs degraded (PG_DEGRADED)
2019-02-15 21:44:19.132797 mon.torsk1 mon.0 10.194.132.88:6789/0 604381 :
cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data
redundancy: 14/700357230 objects degraded (0.000%), 9 pgs degraded)
2019-02-15 21:44:19.132824 mon.torsk1 mon.0 10.194.132.88:6789/0 604382 :
cluster [INF] Cluster is now healthy
2019-02-15 22:00:00.000117 mon.torsk1 mon.0 10.194.132.88:6789/0 604402 :
cluster [INF] overall HEALTH_OK
Why do I end up with a PG_AVAILABILITY warning with just one OSD down? We
have 3x replicated pools and 4+2 EC pools in the system. Or am I just
mis-reading what PG_AVAILABILITY means in the docs... I came to the
conclusion that it means "some data is inaccessible"?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com