Hi Cephers,

I am running a very small cluster of 3 storage nodes and 2 monitor nodes.

After I kill one OSD daemon, the cluster never fully recovers: 9 PGs remain undersized for a reason I cannot identify. After I restart that OSD daemon, the cluster recovers almost immediately.

All pools have size 3 and min_size 2.

Can anybody please help?

Output of "ceph -s":

    cluster fac04d85-db48-4564-b821-deebda046261
     health HEALTH_WARN
            9 pgs degraded
            9 pgs stuck degraded
            9 pgs stuck unclean
            9 pgs stuck undersized
            9 pgs undersized
            recovery 3327/195138 objects degraded (1.705%)
            pool .users pg_num 512 > pgp_num 8
     monmap e2: 2 mons at {dssmon2=10.140.13.13:6789/0,dssmonleader1=10.140.13.11:6789/0}
            election epoch 1038, quorum 0,1 dssmonleader1,dssmon2
     osdmap e857: 69 osds: 68 up, 68 in
      pgmap v106601: 896 pgs, 9 pools, 435 MB data, 65047 objects
            279 GB used, 247 TB / 247 TB avail
            3327/195138 objects degraded (1.705%)
                 887 active+clean
                   9 active+undersized+degraded
      client io 395 B/s rd, 0 B/s wr, 0 op/s

Output of "ceph health detail":

HEALTH_WARN 9 pgs degraded; 9 pgs stuck degraded; 9 pgs stuck unclean; 9 pgs stuck undersized; 9 pgs undersized; recovery 3327/195138 objects degraded (1.705%); pool .users pg_num 512 > pgp_num 8
pg 7.a is stuck unclean for 322742.938959, current state active+undersized+degraded, last acting [38,2]
pg 5.27 is stuck unclean for 322754.823455, current state active+undersized+degraded, last acting [26,19]
pg 5.32 is stuck unclean for 322750.685684, current state active+undersized+degraded, last acting [39,19]
pg 6.13 is stuck unclean for 322732.665345, current state active+undersized+degraded, last acting [30,16]
pg 5.4e is stuck unclean for 331869.103538, current state active+undersized+degraded, last acting [16,38]
pg 5.72 is stuck unclean for 331871.208948, current state active+undersized+degraded, last acting [16,49]
pg 4.17 is stuck unclean for 331822.771240, current state active+undersized+degraded, last acting [47,20]
pg 5.2c is stuck unclean for 323021.274535, current state active+undersized+degraded, last acting [47,18]
pg 5.37 is stuck unclean for 323007.574395, current state active+undersized+degraded, last acting [43,1]
pg 7.a is stuck undersized for 322487.284302, current state active+undersized+degraded, last acting [38,2]
pg 5.27 is stuck undersized for 322487.287164, current state active+undersized+degraded, last acting [26,19]
pg 5.32 is stuck undersized for 322487.285566, current state active+undersized+degraded, last acting [39,19]
pg 6.13 is stuck undersized for 322487.287168, current state active+undersized+degraded, last acting [30,16]
pg 5.4e is stuck undersized for 331351.476170, current state active+undersized+degraded, last acting [16,38]
pg 5.72 is stuck undersized for 331351.475707, current state active+undersized+degraded, last acting [16,49]
pg 4.17 is stuck undersized for 322487.280309, current state active+undersized+degraded, last acting [47,20]
pg 5.2c is stuck undersized for 322487.286347, current state active+undersized+degraded, last acting [47,18]
pg 5.37 is stuck undersized for 322487.280027, current state active+undersized+degraded, last acting [43,1]
pg 7.a is stuck degraded for 322487.284340, current state active+undersized+degraded, last acting [38,2]
pg 5.27 is stuck degraded for 322487.287202, current state active+undersized+degraded, last acting [26,19]
pg 5.32 is stuck degraded for 322487.285604, current state active+undersized+degraded, last acting [39,19]
pg 6.13 is stuck degraded for 322487.287207, current state active+undersized+degraded, last acting [30,16]
pg 5.4e is stuck degraded for 331351.476209, current state active+undersized+degraded, last acting [16,38]
pg 5.72 is stuck degraded for 331351.475746, current state active+undersized+degraded, last acting [16,49]
pg 4.17 is stuck degraded for 322487.280348, current state active+undersized+degraded, last acting [47,20]
pg 5.2c is stuck degraded for 322487.286386, current state active+undersized+degraded, last acting [47,18]
pg 5.37 is stuck degraded for 322487.280066, current state active+undersized+degraded, last acting [43,1]
pg 5.72 is active+undersized+degraded, acting [16,49]
pg 5.4e is active+undersized+degraded, acting [16,38]
pg 5.32 is active+undersized+degraded, acting [39,19]
pg 5.37 is active+undersized+degraded, acting [43,1]
pg 5.2c is active+undersized+degraded, acting [47,18]
pg 5.27 is active+undersized+degraded, acting [26,19]
pg 6.13 is active+undersized+degraded, acting [30,16]
pg 4.17 is active+undersized+degraded, acting [47,20]
pg 7.a is active+undersized+degraded, acting [38,2]
recovery 3327/195138 objects degraded (1.705%)
pool .users pg_num 512 > pgp_num 8

My crush map is the default one.

My ceph.conf is:

[osd]
osd mkfs type=xfs
osd recovery threads=2
osd disk thread ioprio class=idle
osd disk thread ioprio priority=7
osd journal=/var/lib/ceph/osd/ceph-$id/journal
filestore flusher=False
osd op num shards=3
debug osd=5
osd disk threads=2
osd data=/var/lib/ceph/osd/ceph-$id
osd op num threads per shard=5
osd op threads=4
keyring=/var/lib/ceph/osd/ceph-$id/keyring
osd journal size=4096

[global]
filestore max sync interval=10
auth cluster required=cephx
osd pool default min size=3
osd pool default size=3
public network=10.140.13.0/26
objecter inflight op_bytes=1073741824
auth service required=cephx
filestore min sync interval=1
fsid=fac04d85-db48-4564-b821-deebda046261
keyring=/etc/ceph/keyring
cluster network=10.140.13.0/26
auth client required=cephx
filestore xattr use omap=True
max open files=65536
objecter inflight ops=2048
osd pool default pg num=512
log to syslog = true
#err to syslog = true

Thanks & Regards
Gaurav Bafna
9540631400
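
For readers working through the "ceph health detail" output above, the following is a minimal sketch (not part of the original post) of how the repeated per-PG entries could be collapsed into one record per PG. The names `parse_pgs`, `pgs_per_pool`, and `osd_frequency` are hypothetical helpers, the regular expression is an assumption based only on the entry formats shown in this post, and `SAMPLE` is a small excerpt copied from it. It only summarizes the text; it does not talk to a cluster.

```python
import re
from collections import Counter

# A few entries copied from the "ceph health detail" output in this post.
SAMPLE = """
pg 7.a is stuck unclean for 322742.938959, current state active+undersized+degraded, last acting [38,2]
pg 5.4e is stuck unclean for 331869.103538, current state active+undersized+degraded, last acting [16,38]
pg 5.72 is stuck undersized for 331351.475707, current state active+undersized+degraded, last acting [16,49]
pg 5.72 is active+undersized+degraded, acting [16,49]
"""

# Assumed entry shapes (taken from the post, not from any Ceph specification):
#   pg <id> is stuck <kind> for <seconds>, current state <state>, last acting [a,b]
#   pg <id> is <state>, acting [a,b]
PG_RE = re.compile(
    r"\bpg\s+(?P<pgid>\d+\.[0-9a-f]+)\s+is\s+"
    r"(?:stuck\s+(?P<stuck>\w+)\s+for\s+(?P<secs>[\d.]+),\s+current\s+state\s+)?"
    r"(?P<state>[a-z+]+),\s+(?:last\s+)?acting\s+\[(?P<acting>[\d,]+)\]"
)


def parse_pgs(text):
    """Return {pgid: {"state", "acting", "stuck"}} merged across all entries."""
    pgs = {}
    for m in PG_RE.finditer(text):
        entry = pgs.setdefault(m["pgid"], {"state": "", "acting": [], "stuck": {}})
        entry["state"] = m["state"]
        entry["acting"] = [int(osd) for osd in m["acting"].split(",")]
        if m["stuck"]:
            # Record how long the PG has been stuck in this condition (seconds).
            entry["stuck"][m["stuck"]] = float(m["secs"])
    return pgs


def pgs_per_pool(pgs):
    """Count affected PGs per pool id (the part of the PG id before the dot)."""
    return Counter(pgid.split(".")[0] for pgid in pgs)


def osd_frequency(pgs):
    """Count how often each OSD id appears in the acting sets."""
    return Counter(osd for pg in pgs.values() for osd in pg["acting"])


if __name__ == "__main__":
    pgs = parse_pgs(SAMPLE)
    for pgid, info in sorted(pgs.items()):
        print(pgid, info["state"], info["acting"], info["stuck"])
    print("per pool:", dict(pgs_per_pool(pgs)))
    print("per OSD:", dict(osd_frequency(pgs)))
```

Run against the full output, a summary like this makes it easier to see that every listed PG has only two OSDs in its acting set while the pools have size 3, which is what "undersized" reports, and to check whether the affected PGs cluster on particular pools or OSDs.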