Re: why are there "degraded" PGs when adding OSDs?

Samuel Just <sjust@xxxxxxxxxx> · Mon, 27 Jul 2015 16:06:47 -0400 (EDT)

Hmm, that's odd.  Can you attach the osdmap and ceph pg dump prior to the addition (with all pgs active+clean), then the osdmap and ceph pg dump afterwards?
-Sam

----- Original Message -----
From: "Chad William Seys" <cwseys@xxxxxxxxxxxxxxxx>
To: "Samuel Just" <sjust@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxx>
Sent: Monday, July 27, 2015 12:57:23 PM
Subject: Re:  why are there "degraded" PGs when adding OSDs?

Hi Sam,

> The pg might also be degraded right after a map change which changes the
> up/acting sets since the few objects updated right before the map change
> might be new on some replicas and old on the other replicas.  While in that
> state, those specific objects are degraded, and the pg would report
> degraded until they are recovered (which would happen asap, prior to
> backfilling the new replica). -Sam

That sounds like only a few PGs should be degraded.  Instead, about 45% of
objects are reported degraded right now (and the figure was higher earlier).

# ceph -s
    cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6
     health HEALTH_WARN
            2081 pgs backfill
            6745 pgs degraded
            17 pgs recovering
            6728 pgs recovery_wait
            6745 pgs stuck degraded
            8826 pgs stuck unclean
            recovery 2530124/5557452 objects degraded (45.527%)
            recovery 33594/5557452 objects misplaced (0.604%)
     monmap e5: 3 mons at {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=10.128.198.51:6789/0}
            election epoch 16458, quorum 0,1,2 mon03,mon01,mon02
     mdsmap e3032: 1/1/1 up {0=mds01.hep.wisc.edu=up:active}
     osdmap e149761: 27 osds: 27 up, 27 in; 2083 remapped pgs
      pgmap v13464928: 18432 pgs, 9 pools, 5401 GB data, 1364 kobjects
            11122 GB used, 11786 GB / 22908 GB avail
            2530124/5557452 objects degraded (45.527%)
            33594/5557452 objects misplaced (0.604%)
                9606 active+clean
                6726 active+recovery_wait+degraded
                2081 active+remapped+wait_backfill
                  17 active+recovering+degraded
                   2 active+recovery_wait+degraded+remapped
recovery io 24861 kB/s, 6 objects/s
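
As a sanity check on the numbers above, here is a minimal Python sketch that tallies the per-state PG counts from the pgmap section and compares them with the health summary lines. The helper names (parse_pg_states, pgs_with_flag) are made up for illustration and are not part of Ceph; the counts are simply copied from the output above.

```python
import re

# State counts copied from the pgmap section of the `ceph -s` output above.
PG_STATES = """\
9606 active+clean
6726 active+recovery_wait+degraded
2081 active+remapped+wait_backfill
17 active+recovering+degraded
2 active+recovery_wait+degraded+remapped
"""


def parse_pg_states(text):
    """Return {state_string: pg_count} from 'COUNT STATE' lines (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S+)\s*$", line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def pgs_with_flag(counts, flag):
    """Sum the PGs whose '+'-separated state string contains the given flag."""
    return sum(n for state, n in counts.items() if flag in state.split("+"))


counts = parse_pg_states(PG_STATES)
total = sum(counts.values())
degraded = pgs_with_flag(counts, "degraded")
remapped = pgs_with_flag(counts, "remapped")
unclean = total - counts.get("active+clean", 0)

# Object-level ratio, using the "objects degraded" figures from the same output.
degraded_objects_pct = 100 * 2530124 / 5557452

print("total PGs:", total)
print("degraded PGs:", degraded)
print("remapped PGs:", remapped)
print("not active+clean:", unclean)
print(f"degraded PGs: {100 * degraded / total:.1f}% of all PGs")
print(f"degraded objects: {degraded_objects_pct:.3f}%")
```

The tallies agree with the summary lines (6745 degraded, 2083 remapped, 8826 not clean out of 18432 PGs), so the degraded state covers roughly a third of all PGs rather than a handful.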

Chad.

> 
> ----- Original Message -----
> From: "Chad William Seys" <cwseys@xxxxxxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxx>
> Sent: Monday, July 27, 2015 12:27:26 PM
> Subject:  why are there "degraded" PGs when adding OSDs?
> 
> Hi All,
> 
> I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that
> 'ceph -s' reported both misplaced AND degraded PGs.
> 
> Why should any PGs become degraded?  Seems as though Ceph should only be
> reporting misplaced PGs?
> 
> From the Giant release notes:
> Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related
> commands now make a distinction between data that is degraded (there are
> fewer than the desired number of copies) and data that is misplaced (stored
> in the wrong location in the cluster). The distinction is important because
> the latter does not compromise data safety.
> 
> Does Ceph delete some replicas of the PGs (leading to degradation) before
> re-replicating on the new OSD?
> 
> This does not seem to be the safest algorithm.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
