Re: OSD Down but not marked down by cluster


 



The CRUSH map (ceph osd tree) does show them as down, however the status output does not.

The status reports "16/330 in osds are down" when in reality 56 OSDs are down.

I am also seeing client I/O deadlock until a full rebuild finishes or the host comes back up. I have the recovery priorities set, but I believe it is still trying to write to the down OSDs.
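
For reference, the priority and throttle settings I'm referring to are roughly the following (example values, injected at runtime on all OSDs):

ceph tell osd.* injectargs '--osd-client-op-priority 63 --osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

and I've been checking which PGs are holding up client I/O with something like:

ceph health detail | grep blocked
ceph pg dump_stuck inactive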

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



Tyler.Bishop@xxxxxxxxxxxxxxxxx 


If you are not the intended recipient of this transmission you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.

----- Original Message -----
From: "Wido den Hollander" <wido@xxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxx>, "ceph new" <ceph-users@xxxxxxxxxxxxxx>, "Tyler Bishop" <tyler.bishop@xxxxxxxxxxxxxxxxx>
Sent: Thursday, September 29, 2016 3:35:14 AM
Subject: Re:  OSD Down but not marked down by cluster

> Op 29 september 2016 om 1:57 schreef Tyler Bishop <tyler.bishop@xxxxxxxxxxxxxxxxx>:
> 
> 
> S1148 is down but the cluster does not mark it as such. 
> 

A host will never be marked as down, but the output does show that all the OSDs on that host are marked as down.
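
If I read the pasted output right, the counts do reconcile: the osdmap line says "370 osds: 314 up, 330 in", so 370 - 314 = 56 OSDs are down in total, but 40 of those already have a reweight of 0 (they are marked out), which is why the status line only reports 16/330 of the "in" OSDs as down. A rough way to double-check that (assuming the full tree output is to hand):

ceph osd stat
ceph osd tree | grep -c down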

Wido

> cluster 3aac8ab8-1011-43d6-b281-d16e7a61b2bd 
> health HEALTH_WARN 
> 3888 pgs backfill 
> 196 pgs backfilling 
> 6418 pgs degraded 
> 52 pgs down 
> 52 pgs peering 
> 1 pgs recovery_wait 
> 3653 pgs stuck degraded 
> 52 pgs stuck inactive 
> 6088 pgs stuck unclean 
> 3653 pgs stuck undersized 
> 6417 pgs undersized 
> 186 requests are blocked > 32 sec 
> recovery 42096983/185765821 objects degraded (22.661%) 
> recovery 49940341/185765821 objects misplaced (26.883%) 
> 16/330 in osds are down 
> monmap e1: 3 mons at {ceph0-mon0=10.1.8.40:6789/0,ceph0-mon1=10.1.8.41:6789/0,ceph0-mon2=10.1.8.42:6789/0} 
> election epoch 13550, quorum 0,1,2 ceph0-mon0,ceph0-mon1,ceph0-mon2 
> osdmap e236889: 370 osds: 314 up, 330 in; 4096 remapped pgs 
> pgmap v47890297: 20920 pgs, 19 pools, 316 TB data, 85208 kobjects 
> 530 TB used, 594 TB / 1125 TB avail 
> 42096983/185765821 objects degraded (22.661%) 
> 49940341/185765821 objects misplaced (26.883%) 
> 14390 active+clean 
> 3846 active+undersized+degraded+remapped+wait_backfill 
> 2375 active+undersized+degraded 
> 196 active+undersized+degraded+remapped+backfilling 
> 52 down+peering 
> 42 active+remapped+wait_backfill 
> 11 active+remapped 
> 7 active+clean+scrubbing+deep 
> 1 active+recovery_wait+degraded+remapped 
> recovery io 2408 MB/s, 623 objects/s 
> 
> 
> -43 304.63928 host ceph0-s1148 
> 303 5.43999 osd.303 down 0 1.00000 
> 304 5.43999 osd.304 down 0 1.00000 
> 305 5.43999 osd.305 down 0 1.00000 
> 306 5.43999 osd.306 down 0 1.00000 
> 307 5.43999 osd.307 down 0 1.00000 
> 308 5.43999 osd.308 down 0 1.00000 
> 309 5.43999 osd.309 down 0 1.00000 
> 310 5.43999 osd.310 down 0 1.00000 
> 311 5.43999 osd.311 down 0 1.00000 
> 312 5.43999 osd.312 down 0 1.00000 
> 313 5.43999 osd.313 down 0 1.00000 
> 314 5.43999 osd.314 down 0 1.00000 
> 315 5.43999 osd.315 down 0 1.00000 
> 316 5.43999 osd.316 down 0 1.00000 
> 317 5.43999 osd.317 down 0 1.00000 
> 318 5.43999 osd.318 down 0 1.00000 
> 319 5.43999 osd.319 down 0 1.00000 
> 320 5.43999 osd.320 down 0 1.00000 
> 321 5.43999 osd.321 down 0 1.00000 
> 322 5.43999 osd.322 down 0 1.00000 
> 323 5.43999 osd.323 down 0 1.00000 
> 324 5.43999 osd.324 down 0 1.00000 
> 325 5.43999 osd.325 down 0 1.00000 
> 326 5.43999 osd.326 down 0 1.00000 
> 327 5.43999 osd.327 down 0 1.00000 
> 328 5.43999 osd.328 down 0 1.00000 
> 329 5.43999 osd.329 down 0 1.00000 
> 330 5.43999 osd.330 down 0 1.00000 
> 331 5.43999 osd.331 down 0 1.00000 
> 332 5.43999 osd.332 down 1.00000 1.00000 
> 333 5.43999 osd.333 down 1.00000 1.00000 
> 334 5.43999 osd.334 down 1.00000 1.00000 
> 335 5.43999 osd.335 down 0 1.00000 
> 337 5.43999 osd.337 down 1.00000 1.00000 
> 338 5.43999 osd.338 down 0 1.00000 
> 339 5.43999 osd.339 down 1.00000 1.00000 
> 340 5.43999 osd.340 down 0 1.00000 
> 341 5.43999 osd.341 down 0 1.00000 
> 342 5.43999 osd.342 down 0 1.00000 
> 343 5.43999 osd.343 down 0 1.00000 
> 344 5.43999 osd.344 down 0 1.00000 
> 345 5.43999 osd.345 down 0 1.00000 
> 346 5.43999 osd.346 down 0 1.00000 
> 347 5.43999 osd.347 down 1.00000 1.00000 
> 348 5.43999 osd.348 down 1.00000 1.00000 
> 349 5.43999 osd.349 down 0 1.00000 
> 350 5.43999 osd.350 down 1.00000 1.00000 
> 351 5.43999 osd.351 down 1.00000 1.00000 
> 352 5.43999 osd.352 down 1.00000 1.00000 
> 353 5.43999 osd.353 down 1.00000 1.00000 
> 354 5.43999 osd.354 down 1.00000 1.00000 
> 355 5.43999 osd.355 down 1.00000 1.00000 
> 356 5.43999 osd.356 down 1.00000 1.00000 
> 357 5.43999 osd.357 down 1.00000 1.00000 
> 358 5.43999 osd.358 down 0 1.00000 
> 369 5.43999 osd.369 down 1.00000 1.00000 
> 
> 
> 
> 
> 
> 	
> 
> Tyler Bishop 
> Chief Technical Officer 
> 513-299-7108 x10 
> 
> 
> 
> Tyler.Bishop@xxxxxxxxxxxxxxxxx 
> 
> 
> If you are not the intended recipient of this transmission you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. 
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


