Hello,

I'm seeing what looks like odd behavior when reweighting OSDs. I've just upgraded to 12.2.5 and am adding a new OSD server to the cluster. I gradually weight the 10 TB OSDs in by bumping their CRUSH weight by +1, letting things backfill for a while, then bumping by +1 again until I reach the desired weight (roughly the loop sketched in the P.S. below). This hasn't been a problem in the past: a proportionate number of PGs would get remapped, peer, and activate across the cluster. Now, on 12.2.5, when I do this almost all PGs re-peer and re-activate. Sometimes the cluster recovers within a minute; other times it takes longer, and this last time some OSDs on the new node actually crashed, which stretched out the peering/activating phase even further. Regardless of how quickly it recovers, this is a fairly violent reaction to a reweight. Has anyone else seen behavior like this, or have any ideas what's going on?

For example:

[root@sephmon1 ~]# ceph -s
  cluster:
    id:     bc2a1488-74f8-4d87-b2f6-615ae26bf7c9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum sephmon1,sephmon2,sephmon3
    mgr: sephmon2(active), standbys: sephmon1, sephmon3
    mds: cephfs1-1/1/1 up {0=sephmon1=up:active}, 1 up:standby
    osd: 789 osds: 789 up, 789 in

  data:
    pools:   7 pools, 39168 pgs
    objects: 74046k objects, 2989 TB
    usage:   3756 TB used, 1517 TB / 5273 TB avail
    pgs:     39137 active+clean
             26    active+clean+scrubbing+deep
             5     active+clean+scrubbing

  io:
    client:   3522 MB/s rd, 118 MB/s wr, 1295 op/s rd, 833 op/s wr

[root@sephmon1 ~]# for i in {771..779}; do ceph osd crush reweight osd.${i} 6.5; done
reweighted item id 771 name 'osd.771' to 6.5 in crush map
reweighted item id 772 name 'osd.772' to 6.5 in crush map
reweighted item id 773 name 'osd.773' to 6.5 in crush map
reweighted item id 774 name 'osd.774' to 6.5 in crush map
reweighted item id 775 name 'osd.775' to 6.5 in crush map
reweighted item id 776 name 'osd.776' to 6.5 in crush map
reweighted item id 777 name 'osd.777' to 6.5 in crush map
reweighted item id 778 name 'osd.778' to 6.5 in crush map
reweighted item id 779 name 'osd.779' to 6.5 in crush map

[root@sephmon1 ~]# ceph -s
  cluster:
    id:     bc2a1488-74f8-4d87-b2f6-615ae26bf7c9
    health: HEALTH_WARN
            2 osds down
            78219/355096089 objects misplaced (0.022%)
            Reduced data availability: 668 pgs inactive, 1920 pgs down, 551 pgs peering, 29 pgs incomplete
            Degraded data redundancy: 803425/355096089 objects degraded (0.226%), 204 pgs degraded
            3 slow requests are blocked > 32 sec

  services:
    mon: 3 daemons, quorum sephmon1,sephmon2,sephmon3
    mgr: sephmon2(active), standbys: sephmon1, sephmon3
    mds: cephfs1-1/1/1 up {0=sephmon1=up:active}, 1 up:standby
    osd: 789 osds: 787 up, 789 in; 257 remapped pgs

  data:
    pools:   7 pools, 39168 pgs
    objects: 73964k objects, 2985 TB
    usage:   3756 TB used, 1517 TB / 5273 TB avail
    pgs:     0.028% pgs unknown
             94.904% pgs not active
             803425/355096089 objects degraded (0.226%)
             78219/355096089 objects misplaced (0.022%)
             20215 peering
             14335 activating
             1882  active+clean
             1788  down
             205   remapped+peering
             167   stale+peering
             142   activating+undersized+degraded
             127   activating+undersized
             126   stale+down
             57    active+undersized+degraded
             39    stale+active+clean
             27    incomplete
             17    activating+remapped
             11    unknown
             7     stale+activating
             6     down+remapped
             3     stale+activating+undersized
             2     stale+incomplete
             2     active+undersized
             2     stale+activating+undersized+degraded
             1     activating+undersized+degraded+remapped
             1     stale+active+undersized+degraded
             1     remapped
             1     active+clean+scrubbing
             1     active+undersized+degraded+remapped+backfill_wait
             1     stale+remapped+peering
             1     active+clean+remapped
             1     active+remapped+backfilling

  io:
    client:   3896 GB/s rd, 339 GB/s wr, 8004 kop/s rd, 320 kop/s wr
    recovery: 726 GB/s, 11172 objects/s

Thanks,
Kevin
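
P.S. For reference, the incremental reweight procedure is essentially the sketch below. The step values, the target weight of 6.5, and the wait-for-HEALTH_OK check are illustrative assumptions, not an exact copy of what I run:

#!/bin/bash
# Bring osd.771-osd.779 up to CRUSH weight 6.5 in roughly +1 steps,
# letting the cluster settle between bumps.
# (Step sequence and the HEALTH_OK wait are illustrative.)
for w in 1 2 3 4 5 6 6.5; do
    for i in {771..779}; do
        ceph osd crush reweight osd.${i} ${w}
    done
    # wait for backfill to finish before the next increment
    until ceph health | grep -q HEALTH_OK; do
        sleep 60
    done
done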