Hello,

On Tue, 4 Aug 2015 20:33:58 +1000 Daniel Manzau wrote:

> Hi Christian,
>
> True it's not exactly out of the box. Here is the ceph.conf.
>
Your CRUSH rule file and a description would help, too (are those 4 hosts,
or are the HDDs and SSDs shared on the same hardware, as your pool size
suggests?), etc.
My guess is you're following Sebastien's blog entry on how to mix things
on the same host.

> Could it be the "osd crush update on start = false" stopping the
> remapping of a disk on failure?
>
Doubt it, that would be a pretty significant bug.

OTOH, is your "osd_pool_default_size = 2" matched by an
"osd pool default min size = 1"?
As in, is your cluster (or at least the pool using SSDs) totally stuck at
this point?

Christian
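A quick sketch of how to pull that information straight from the cluster;
the pool name "rbd-ssd" below is only a placeholder, substitute whichever
pool actually sits on the SSD root:

    # effective replication settings for the pool in question
    ceph osd pool get rbd-ssd size
    ceph osd pool get rbd-ssd min_size

    # the CRUSH rules, plus the full decompiled map for the list
    ceph osd crush rule dump
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt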
>
> [global]
> fsid = bfb7e666-f66d-45c0-b4fc-b98182fed666
> mon_initial_members = ceph-store1, ceph-store2, ceph-admin1
> mon_host = 10.66.8.2,10.66.8.3,10.66.8.1
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
> public_network = 10.66.8.0/23
> cluster network = 10.66.16.0/23
>
> [osd]
> osd crush update on start = false
> osd_max_backfills = 2
> osd_recovery_op_priority = 2
> osd_recovery_max_active = 2
> osd_recovery_max_chunk = 4194304
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
> admin socket = /var/run/ceph/rbd-client-$pid.asok
>
>
> Regards,
> Daniel
>
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Tuesday, 4 August 2015 3:47 PM
> To: Daniel Manzau
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: PG's Degraded on disk failure not remapped.
>
>
> Hello,
>
> There are a number of reasons I can think of why this would happen.
> You say "default behaviour", but looking at your map it's obvious that
> you probably don't have a default cluster and crush map.
> Your ceph.conf may help, too.
>
> Regards,
>
> Christian
>
> On Tue, 4 Aug 2015 13:05:54 +1000 Daniel Manzau wrote:
>
> > Hi Cephers,
> >
> > We've been testing drive failures and we're just trying to see if the
> > behaviour of our cluster is normal, or if we've set something up wrong.
> >
> > In summary: the OSD is down and out, but the PGs are showing as
> > degraded and don't seem to want to remap. We'd have assumed that once
> > the OSD was marked out, a remap would have happened and we'd see
> > misplaced rather than degraded PGs.
> >
> >     cluster bfb7e824-f37d-45c0-a4fc-a98182fed985
> >      health HEALTH_WARN
> >             43 pgs degraded
> >             43 pgs stuck degraded
> >             44 pgs stuck unclean
> >             43 pgs stuck undersized
> >             43 pgs undersized
> >             recovery 36899/6822836 objects degraded (0.541%)
> >             recovery 813/6822836 objects misplaced (0.012%)
> >      monmap e3: 3 mons at
> > {ceph-admin1=10.66.8.1:6789/0,ceph-store1=10.66.8.2:6789/0,ceph-store2=10.66.8.3:6789/0}
> >             election epoch 950, quorum 0,1,2
> >             ceph-admin1,ceph-store1,ceph-store2
> >      osdmap e6342: 36 osds: 35 up, 35 in; 1 remapped pgs
> >       pgmap v11805515: 1700 pgs, 3 pools, 13165 GB data, 3331 kobjects
> >             25941 GB used, 30044 GB / 55986 GB avail
> >             36899/6822836 objects degraded (0.541%)
> >             813/6822836 objects misplaced (0.012%)
> >                 1656 active+clean
> >                   43 active+undersized+degraded
> >                    1 active+remapped
> >   client io 491 kB/s rd, 3998 kB/s wr, 480 op/s
> >
> >
> > # id  weight  type name                       up/down reweight
> > -6    43.56   root hdd
> > -2    21.78           host ceph-store1-hdd
> > 0     3.63                    osd.0           up      1
> > 2     3.63                    osd.2           up      1
> > 4     3.63                    osd.4           up      1
> > 6     3.63                    osd.6           up      1
> > 8     3.63                    osd.8           up      1
> > 10    3.63                    osd.10          up      1
> > -3    21.78           host ceph-store2-hdd
> > 1     3.63                    osd.1           up      1
> > 3     3.63                    osd.3           up      1
> > 5     3.63                    osd.5           up      1
> > 7     3.63                    osd.7           up      1
> > 9     3.63                    osd.9           up      1
> > 11    3.63                    osd.11          up      1
> > -1    11.48   root ssd
> > -4    5.74            host ceph-store1-ssd
> > 12    0.43                    osd.12          up      1
> > 13    0.43                    osd.13          up      1
> > 14    0.43                    osd.14          up      1
> > 16    0.43                    osd.16          up      1
> > 18    0.43                    osd.18          down    0
> > 19    0.43                    osd.19          up      1
> > 20    0.43                    osd.20          up      1
> > 21    0.43                    osd.21          up      1
> > 32    0.72                    osd.32          up      1
> > 33    0.72                    osd.33          up      1
> > 17    0.43                    osd.17          up      1
> > 15    0.43                    osd.15          up      1
> > -5    5.74            host ceph-store2-ssd
> > 22    0.43                    osd.22          up      1
> > 23    0.43                    osd.23          up      1
> > 24    0.43                    osd.24          up      1
> > 25    0.43                    osd.25          up      1
> > 26    0.43                    osd.26          up      1
> > 27    0.43                    osd.27          up      1
> > 28    0.43                    osd.28          up      1
> > 29    0.43                    osd.29          up      1
> > 30    0.43                    osd.30          up      1
> > 31    0.43                    osd.31          up      1
> > 34    0.72                    osd.34          up      1
> > 35    0.72                    osd.35          up      1
> >
> > Are we misunderstanding the default behaviour? Any help you can
> > provide will be very much appreciated.
> >
> > Regards,
> > Daniel
> >
> > W: www.3ca.com.au
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
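A rough sketch of the commands typically used to see why undersized PGs
stay degraded instead of remapping; the PG id "3.1f" and rule number "1"
below are only placeholders, to be taken from "ceph health detail" and
"ceph osd crush rule dump" respectively:

    # list the stuck PGs and pick one to inspect
    ceph health detail
    ceph pg dump_stuck unclean

    # show the up/acting sets and recovery state for one degraded PG
    ceph pg 3.1f query

    # check whether the SSD rule can actually place two replicas
    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 1 --num-rep 2 --show-mappings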