Re: Help Ceph Cluster Down

Hi Chris, 

Indeed, that's what happened. I didn't set the noout flag either, and I zapped the disks on the new server every time. In my cluster status, fre201 is the only new server.
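
(Noting for next time: the usual way to keep the cluster from rebalancing while OSDs are taken down is to set the noout flag first and clear it afterwards. A minimal sketch of that workflow, not what I actually ran:

ceph osd set noout
# ...take the OSDs down / do the maintenance...
ceph osd unset noout
)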

Current status after enabling the 3 OSDs on host fre201:
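
(For reference, assuming systemd-managed OSDs on this install, the three daemons on fre201 can be started via their units, e.g.:

systemctl start ceph-osd@36
systemctl start ceph-osd@37
systemctl start ceph-osd@38
)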

[root@fre201 ~]# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       70.92137 root default
 -2        5.45549     host fre101
  0   hdd  1.81850         osd.0       up  1.00000 1.00000
  1   hdd  1.81850         osd.1       up  1.00000 1.00000
  2   hdd  1.81850         osd.2       up  1.00000 1.00000
 -9        5.45549     host fre103
  3   hdd  1.81850         osd.3       up  1.00000 1.00000
  4   hdd  1.81850         osd.4       up  1.00000 1.00000
  5   hdd  1.81850         osd.5       up  1.00000 1.00000
 -3        5.45549     host fre105
  6   hdd  1.81850         osd.6       up  1.00000 1.00000
  7   hdd  1.81850         osd.7       up  1.00000 1.00000
  8   hdd  1.81850         osd.8       up  1.00000 1.00000
 -4        5.45549     host fre107
  9   hdd  1.81850         osd.9       up  1.00000 1.00000
 10   hdd  1.81850         osd.10      up  1.00000 1.00000
 11   hdd  1.81850         osd.11      up  1.00000 1.00000
 -5        5.45549     host fre109
 12   hdd  1.81850         osd.12      up  1.00000 1.00000
 13   hdd  1.81850         osd.13      up  1.00000 1.00000
 14   hdd  1.81850         osd.14      up  1.00000 1.00000
 -6        5.45549     host fre111
 15   hdd  1.81850         osd.15      up  1.00000 1.00000
 16   hdd  1.81850         osd.16      up  1.00000 1.00000
 17   hdd  1.81850         osd.17      up  0.79999 1.00000
 -7        5.45549     host fre113
 18   hdd  1.81850         osd.18      up  1.00000 1.00000
 19   hdd  1.81850         osd.19      up  1.00000 1.00000
 20   hdd  1.81850         osd.20      up  1.00000 1.00000
 -8        5.45549     host fre115
 21   hdd  1.81850         osd.21      up  1.00000 1.00000
 22   hdd  1.81850         osd.22      up  1.00000 1.00000
 23   hdd  1.81850         osd.23      up  1.00000 1.00000
-10        5.45549     host fre117
 24   hdd  1.81850         osd.24      up  1.00000 1.00000
 25   hdd  1.81850         osd.25      up  1.00000 1.00000
 26   hdd  1.81850         osd.26      up  1.00000 1.00000
-11        5.45549     host fre119
 27   hdd  1.81850         osd.27      up  1.00000 1.00000
 28   hdd  1.81850         osd.28      up  1.00000 1.00000
 29   hdd  1.81850         osd.29      up  1.00000 1.00000
-12        5.45549     host fre121
 30   hdd  1.81850         osd.30      up  1.00000 1.00000
 31   hdd  1.81850         osd.31      up  1.00000 1.00000
 32   hdd  1.81850         osd.32      up  1.00000 1.00000
-13        5.45549     host fre123
 33   hdd  1.81850         osd.33      up  1.00000 1.00000
 34   hdd  1.81850         osd.34      up  1.00000 1.00000
 35   hdd  1.81850         osd.35      up  1.00000 1.00000
-27        5.45549     host fre201
 36   hdd  1.81850         osd.36      up  1.00000 1.00000
 37   hdd  1.81850         osd.37      up  1.00000 1.00000
 38   hdd  1.81850         osd.38      up  1.00000 1.00000
[root@fre201 ~]#
[root@fre201 ~]# ceph -s
  cluster:
    id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
    health: HEALTH_ERR
            3 pools have many more objects per pg than average
            585791/12391450 objects misplaced (4.727%)
            2 scrub errors
            2374 PGs pending on creation
            Reduced data availability: 6578 pgs inactive, 2025 pgs down, 74 pgs peering, 1234 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 64969/12391450 objects degraded (0.524%), 616 pgs degraded, 20 pgs undersized
            96242 slow requests are blocked > 32 sec
            228 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2768 > max 200)

  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
    osd: 39 osds: 39 up, 39 in; 96 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   18 pools, 54656 pgs
    objects: 6050k objects, 10942 GB
    usage:   21900 GB used, 50721 GB / 72622 GB avail
    pgs:     0.002% pgs unknown
             12.050% pgs not active
             64969/12391450 objects degraded (0.524%)
             585791/12391450 objects misplaced (4.727%)
             47489 active+clean
             3670  activating
             1098  stale+down
             923   down
             575   activating+degraded
             563   stale+active+clean
             105   stale+activating
             78    activating+remapped
             72    peering
             25    stale+activating+degraded
             23    stale+activating+remapped
             9     stale+active+undersized
             6     stale+activating+undersized+degraded+remapped
             5     stale+active+undersized+degraded
             4     down+remapped
             4     activating+degraded+remapped
             2     active+clean+inconsistent
             1     stale+activating+degraded+remapped
             1     stale+active+clean+remapped
             1     stale+remapped+peering
             1     remapped+peering
             1     unknown

  io:
    client:   0 B/s rd, 208 kB/s wr, 22 op/s rd, 22 op/s wr
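
(To narrow down which PGs are stuck, I believe the standard queries are roughly the following; the PG ID in the last command is a placeholder to be filled in from the health detail output:

ceph health detail
ceph pg dump_stuck inactive
ceph pg dump_stuck stale
ceph pg <pgid> query
)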



Thanks
Arun


On Thu, Jan 3, 2019 at 7:19 PM Chris <bitskrieg@xxxxxxxxxxxxx> wrote:
If you added OSDs and then deleted them repeatedly without waiting for replication to finish as the cluster attempted to rebalance across them, it's highly likely that you are permanently missing PGs (especially if the disks were zapped each time).
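
For future attempts, the rough safe-removal sequence (a sketch assuming Luminous or later; osd.36 is only an example ID) is to mark the OSD out, wait for recovery to finish, and only then destroy it:

ceph osd out osd.36
ceph -s                                      # wait until recovery finishes / PGs are active+clean
ceph osd safe-to-destroy osd.36
ceph osd purge osd.36 --yes-i-really-mean-it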

If those 3 down OSDs can be revived, there is a (small) chance that you can right the ship, but 1400 PGs/OSD is pretty extreme. I'm surprised the cluster even let you do that; this sounds like a data-loss event.

Bring the 3 OSDs back and see what those 2 inconsistent PGs look like with ceph pg query.
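
Something along these lines; the PG IDs are placeholders, take the real ones from the health output:

ceph health detail | grep inconsistent
ceph pg <pgid> query
rados list-inconsistent-obj <pgid> --format=json-pretty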

On January 3, 2019 21:59:38 Arun POONIA <arun.poonia@xxxxxxxxxxxxxxxxx> wrote:

Hi, 

Recently I tried adding a new node (OSD host) to the Ceph cluster using the ceph-deploy tool. I was experimenting with the tool and ended up deleting the OSDs on the new server a couple of times.

Now that the Ceph OSDs are running on the new server, 10-15% of the cluster's PGs seem to be inactive and they are not recovering or rebalancing. I'm not sure what to do. I tried shutting down the OSDs on the new server.
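
For context, the workflow I was following was roughly the one below (assuming ceph-deploy 2.x syntax; the device path is illustrative, not necessarily the exact one I used):

ceph-deploy disk zap fre201 /dev/sdb
ceph-deploy osd create --data /dev/sdb fre201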

Status: 
[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2) No such file or directory
  cluster:
    id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
    health: HEALTH_ERR
            3 pools have many more objects per pg than average
            373907/12391198 objects misplaced (3.018%)
            2 scrub errors
            9677 PGs pending on creation
            Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 2717 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 pgs degraded, 1297 pgs undersized
            52486 slow requests are blocked > 32 sec
            9287 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2968 > max 200)

  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
    osd: 39 osds: 36 up, 36 in; 51 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   18 pools, 54656 pgs
    objects: 6050k objects, 10941 GB
    usage:   21727 GB used, 45308 GB / 67035 GB avail
    pgs:     13.073% pgs not active
             178350/12391198 objects degraded (1.439%)
             373907/12391198 objects misplaced (3.018%)
             46177 active+clean
             5054  down
             1173  stale+down
             1084  stale+active+undersized
             547   activating
             201   stale+active+undersized+degraded
             158   stale+activating
             96    activating+degraded
             46    stale+active+clean
             42    activating+remapped
             34    stale+activating+degraded
             23    stale+activating+remapped
             6     stale+activating+undersized+degraded+remapped
             6     activating+undersized+degraded+remapped
             2     activating+degraded+remapped
             2     active+clean+inconsistent
             1     stale+activating+degraded+remapped
             1     stale+active+clean+remapped
             1     stale+remapped
             1     down+remapped
             1     remapped+peering

  io:
    client:   0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr

Thanks
--
Arun Poonia



--
Arun Poonia

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
