Re: PGs degraded on disk failure not remapped

Hello,

There are a number of reasons I can think of why this would happen.
You say "default behaviour", but looking at your map it's clear that you
probably don't have a default cluster and CRUSH map.
Posting your ceph.conf may help, too.
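
For example, decompiling the CRUSH map and checking which rule and size
each pool actually uses should make that clear (commands from memory, so
double-check the exact syntax on your version):

  ceph osd getcrushmap -o /tmp/crushmap
  crushtool -d /tmp/crushmap -o /tmp/crushmap.txt   # human-readable CRUSH map
  ceph osd crush rule dump                          # what each rule selects from
  ceph osd dump | grep ^pool                        # size, min_size, crush_ruleset per pool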

Regards, 

Christian
On Tue, 4 Aug 2015 13:05:54 +1000 Daniel Manzau wrote:

> Hi Cephers,
> 
> We've been testing drive failures and we're just trying to see if the
> behaviour of our cluster is normal, or if we've set up something wrong.
> 
> In summary: the OSD is down and out, but the PGs are showing as degraded
> and don't seem to want to remap. We'd have assumed that once the OSD was
> marked out, a remap would have happened and we'd see misplaced rather
> than degraded PGs.
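> 
> (Commands like these should show which PGs are stuck and which OSDs they
> map to; the PG id in the last one is only a placeholder:
> 
>   ceph health detail
>   ceph pg dump_stuck undersized
>   ceph pg 3.7f query        # check the "up" and "acting" sets
> )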
> 
>   cluster bfb7e824-f37d-45c0-a4fc-a98182fed985
>      health HEALTH_WARN
>             43 pgs degraded
>             43 pgs stuck degraded
>             44 pgs stuck unclean
>             43 pgs stuck undersized
>             43 pgs undersized
>             recovery 36899/6822836 objects degraded (0.541%)
>             recovery 813/6822836 objects misplaced (0.012%)
>      monmap e3: 3 mons at {ceph-admin1=10.66.8.1:6789/0,ceph-store1=10.66.8.2:6789/0,ceph-store2=10.66.8.3:6789/0}
>             election epoch 950, quorum 0,1,2 ceph-admin1,ceph-store1,ceph-store2
>      osdmap e6342: 36 osds: 35 up, 35 in; 1 remapped pgs
>       pgmap v11805515: 1700 pgs, 3 pools, 13165 GB data, 3331 kobjects
>             25941 GB used, 30044 GB / 55986 GB avail
>             36899/6822836 objects degraded (0.541%)
>             813/6822836 objects misplaced (0.012%)
>                 1656 active+clean
>                   43 active+undersized+degraded
>                    1 active+remapped
>   client io 491 kB/s rd, 3998 kB/s wr, 480 op/s
> 
> 
> # id	weight	type name	up/down	reweight
> -6	43.56	root hdd
> -2	21.78		host ceph-store1-hdd
> 0	3.63			osd.0	up	1
> 2	3.63			osd.2	up	1
> 4	3.63			osd.4	up	1
> 6	3.63			osd.6	up	1
> 8	3.63			osd.8	up	1
> 10	3.63			osd.10	up	1
> -3	21.78		host ceph-store2-hdd
> 1	3.63			osd.1	up	1
> 3	3.63			osd.3	up	1
> 5	3.63			osd.5	up	1
> 7	3.63			osd.7	up	1
> 9	3.63			osd.9	up	1
> 11	3.63			osd.11	up	1
> -1	11.48	root ssd
> -4	5.74		host ceph-store1-ssd
> 12	0.43			osd.12	up	1
> 13	0.43			osd.13	up	1
> 14	0.43			osd.14	up	1
> 16	0.43			osd.16	up	1
> 18	0.43			osd.18	down	0
> 19	0.43			osd.19	up	1
> 20	0.43			osd.20	up	1
> 21	0.43			osd.21	up	1
> 32	0.72			osd.32	up	1
> 33	0.72			osd.33	up	1
> 17	0.43			osd.17	up	1
> 15	0.43			osd.15	up	1
> -5	5.74		host ceph-store2-ssd
> 22	0.43			osd.22	up	1
> 23	0.43			osd.23	up	1
> 24	0.43			osd.24	up	1
> 25	0.43			osd.25	up	1
> 26	0.43			osd.26	up	1
> 27	0.43			osd.27	up	1
> 28	0.43			osd.28	up	1
> 29	0.43			osd.29	up	1
> 30	0.43			osd.30	up	1
> 31	0.43			osd.31	up	1
> 34	0.72			osd.34	up	1
> 35	0.72			osd.35	up	1
> 
> Are we misunderstanding the default behaviour? Any help you can provide
> will be very much appreciated.
> 
> Regards,
> Daniel
> 
> W: www.3ca.com.au
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


