Hi Ceph experts,
I recently ran a test on my Ceph cluster with the following steps (the rough commands are sketched after the list):
1. At the beginning, all PGs were active+clean.
2. I stopped one OSD and observed that a lot of PGs became degraded.
3. I marked that OSD out with ceph osd out.
4. I then observed Ceph running its recovery process.
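For reference, the sequence was roughly the following; osd.12 is just a placeholder for the OSD I actually stopped, and the stop command depends on the init system (systemd shown here; on sysvinit hosts it would be "service ceph stop osd.12"):

# stop the OSD daemon (systemd host assumed; osd.12 is an example id)
systemctl stop ceph-osd@12
# mark it out so CRUSH re-maps its PGs to the remaining OSDs
ceph osd out 12
# watch the recovery progress
ceph -w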
My question: I expected that, by the end, all PGs would return to active+clean, since the OSD is out of the cluster. Why are some PGs still in a degraded state, and why does the recovery process appear to have stopped for good?
Here is the ceph -s output:
ceph -s
cluster fdecc391-0c75-417d-a980-57ef52bdc1cd
health HEALTH_ERR 98 pgs degraded; 16 pgs inconsistent; 104 pgs stuck unclean; recovery 58170/14559528 objects degraded (0.400%); 18 scrub errors; clock skew detected on mon.ceph10
monmap e7: 3 mons at {ceph01=10.195.158.199:6789/0,ceph06=10.195.158.204:6789/0,ceph10=10.195.158.208:6789/0}, election epoch 236, quorum 0,1,2 ceph01,ceph06,ceph10
osdmap e14456: 81 osds: 80 up, 80 in
pgmap v1589483: 8320 pgs, 3 pools, 32375 GB data, 4739 kobjects
83877 GB used, 122 TB / 214 TB avail
58170/14559528 objects degraded (0.400%)
8200 active+clean
16 active+clean+inconsistent
98 active+degraded
6 active+remapped
client io 63614 kB/s rd, 41458 kB/s wr, 2375 op/s
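In case it helps, I was going to dig into the stuck PGs with something like the commands below; the PG id 2.1a7 is just a made-up example, and I would substitute one of the PGs reported as active+degraded:

# list the unhealthy PGs with their ids
ceph health detail
# show which PGs are stuck unclean and which OSDs they map to
ceph pg dump_stuck unclean
# detailed state of a single degraded PG (2.1a7 is a hypothetical id)
ceph pg 2.1a7 query

Please let me know if the output of any of these would help.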