On Wed, Jun 21, 2017 at 6:57 AM Andras Pataki <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
Hi cephers,
I noticed something I don't understand about Ceph's behavior when adding an OSD. When I start with a clean cluster (all PGs active+clean) and add an OSD (via ceph-deploy, for example), the CRUSH map gets updated, PGs get reassigned to different OSDs, and the new OSD starts getting filled with data. As the new OSD gets filled, I start seeing PGs in degraded states. Here is an example:
pgmap v52068792: 42496 pgs, 6 pools, 1305 TB data, 390 Mobjects
3164 TB used, 781 TB / 3946 TB avail
8017/994261437 objects degraded (0.001%)
2220581/994261437 objects misplaced (0.223%)
42393 active+clean
91 active+remapped+wait_backfill
9 active+clean+scrubbing+deep
1 active+recovery_wait+degraded
1 active+clean+scrubbing
1 active+remapped+backfilling
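(In case it's useful, this is roughly how I pick out the PGs that aren't clean - just a sketch of what I run by hand, the grep patterns are only illustrative:)

  # list PGs that are not plain active+clean, with their up/acting sets
  ceph pg dump pgs_brief 2>/dev/null | grep -v ' active+clean '
  # or have ceph summarize the degraded/backfilling PGs directly
  ceph health detail | grep -Ei 'degraded|backfill'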
Any ideas why there would be persistent degradation in the cluster while the newly added drive is being filled? It takes perhaps a day or two to fill the drive, and during all that time the cluster seems to be running degraded. As data is written to the cluster, the number of degraded objects increases over time. Once the newly added OSD is filled, the cluster comes back to clean again.
Here is the PG that is degraded in this picture:
7.87c 1 0 2 0 0 4194304 7 7 active+recovery_wait+degraded 2017-06-20 14:12:44.119921 344610'7 583572:2797 [402,521] 402 [402,521] 402 344610'7 2017-06-16 06:04:55.822503 344610'7 2017-06-16 06:04:55.822503
The newly added OSD here is 521. Before it was added, this PG had two clean replicas, but one seems to have been forgotten somehow?
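I can also send the full query output for that PG if it helps, i.e. something along the lines of:

  # detailed peering/recovery state of the degraded PG
  ceph pg 7.87c query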
This sounds a bit concerning at first glance. Can you provide the exact commands you're invoking, and the "ceph -s" output as it changes in response?
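Even something very simple captured while the backfill runs would do; a rough sketch (the interval and log file name are arbitrary):

  # snapshot the cluster status once a minute while the new OSD backfills
  while true; do date; ceph -s; sleep 60; done >> ceph-status.log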
I really don't see how adding a new OSD can result in it "forgetting" about existing valid copies — it's definitely not supposed to — so I wonder if there's a collision in how it's deciding to remove old locations.
Are you running with only two copies of your data? It shouldn't matter, but there could also be bugs that result in different behavior between two and three copies.
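You can confirm the replica counts with something like the following (replace <pool> with your pool names):

  # replication factor of one pool
  ceph osd pool get <pool> size
  # or list it for all pools at once
  ceph osd dump | grep 'replicated size'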
-Greg
Other remapped PGs have 521 in their "up" set but still have the two existing copies in their "acting" set - and no degradation is shown. Examples:
2.f24 14282 0 16 28564 0 51014850801 3102 3102 active+remapped+wait_backfill 2017-06-20 14:12:42.650308 583553'2033479 583573:2033266 [467,521] 467 [467,499] 467 582430'2033337 2017-06-16 09:08:51.055131 582036'2030837 2017-05-31 20:37:54.831178
6.2b7d 10499 0 140 20998 0 37242874687 3673 3673 active+remapped+wait_backfill 2017-06-20 14:12:42.070019 583569'165163 583572:342128 [541,37,521] 541 [541,37,532] 541 582430'161890 2017-06-18 09:42:49.148402 582430'161890 2017-06-18 09:42:49.148402
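(For reference, I'm picking these out with something along these lines - just a sketch, the exact filtering doesn't matter much:)

  # show remapped PGs together with their up and acting sets
  ceph pg dump pgs_brief 2>/dev/null | grep remapped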
We are running the latest Jewel patch level everywhere (10.2.7). Any insights would be appreciated.
Andras