Cluster in ERR status when rebalancing

Simone Lazzaris <simone.lazzaris@xxxxxxx> · Mon, 09 Dec 2019 11:37:39 +0100

Hi all,
Long story short, I have a cluster of 26 OSDs on 3 nodes (8+9+9). One of the disks is showing some read errors, so I've added an OSD to the faulty node (OSD.26) and set the (re)weight of the faulty OSD (OSD.12) to zero.

The cluster is now rebalancing, which is fine, but I now have 2 PGs in the "backfill_toofull" state, so the cluster health is "ERR":

  cluster:
    id:     9ec27b0f-acfd-40a3-b35d-db301ac5ce8c
    health: HEALTH_ERR
            Degraded data redundancy (low space): 2 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum s1,s2,s3 (age 7d)
    mgr: s1(active, since 7d), standbys: s2, s3
    osd: 27 osds: 27 up (since 2h), 26 in (since 2h); 262 remapped pgs
    rgw: 3 daemons active (s1, s2, s3)

  data:
    pools:   10 pools, 1200 pgs
    objects: 11.72M objects, 37 TiB
    usage:   57 TiB used, 42 TiB / 98 TiB avail
    pgs:     2618510/35167194 objects misplaced (7.446%)
             938 active+clean
             216 active+remapped+backfill_wait
             44  active+remapped+backfilling
             2   active+remapped+backfill_wait+backfill_toofull

  io:
    recovery: 163 MiB/s, 50 objects/s

  progress:
    Rebalancing after osd.12 marked out
      [=====.........................]

As you can see, there is plenty of space and none of my OSDs is in a full or near-full state:

+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0  |  s1  | 2415G | 1310G |    0   |     0   |    0   |     0   | exists,up |
| 1  |  s2  | 2009G | 1716G |    0   |     0   |    0   |     0   | exists,up |
| 2  |  s3  | 2183G | 1542G |    0   |     0   |    0   |     0   | exists,up |
| 3  |  s1  | 2680G | 1045G |    0   |     0   |    0   |     0   | exists,up |
| 4  |  s2  | 2063G | 1662G |    0   |     0   |    0   |     0   | exists,up |
| 5  |  s3  | 2269G | 1456G |    0   |     0   |    0   |     0   | exists,up |
| 6  |  s1  | 2523G | 1202G |    0   |     0   |    0   |     0   | exists,up |
| 7  |  s2  | 1973G | 1752G |    0   |     0   |    0   |     0   | exists,up |
| 8  |  s3  | 2007G | 1718G |    0   |     0   |    1   |     0   | exists,up |
| 9  |  s1  | 2485G | 1240G |    0   |     0   |    0   |     0   | exists,up |
| 10 |  s2  | 2385G | 1340G |    0   |     0   |    0   |     0   | exists,up |
| 11 |  s3  | 2079G | 1646G |    0   |     0   |    0   |     0   | exists,up |
| 12 |  s1  | 2272G | 1453G |    0   |     0   |    0   |     0   | exists,up |
| 13 |  s2  | 2381G | 1344G |    0   |     0   |    0   |     0   | exists,up |
| 14 |  s3  | 1923G | 1802G |    0   |     0   |    0   |     0   | exists,up |
| 15 |  s1  | 2617G | 1108G |    0   |     0   |    0   |     0   | exists,up |
| 16 |  s2  | 2099G | 1626G |    0   |     0   |    0   |     0   | exists,up |
| 17 |  s3  | 2336G | 1389G |    0   |     0   |    0   |     0   | exists,up |
| 18 |  s1  | 2435G | 1290G |    0   |     0   |    0   |     0   | exists,up |
| 19 |  s2  | 2198G | 1527G |    0   |     0   |    0   |     0   | exists,up |
| 20 |  s3  | 2159G | 1566G |    0   |     0   |    0   |     0   | exists,up |
| 21 |  s1  | 2128G | 1597G |    0   |     0   |    0   |     0   | exists,up |
| 22 |  s3  | 2064G | 1661G |    0   |     0   |    0   |     0   | exists,up |
| 23 |  s2  | 1943G | 1782G |    0   |     0   |    0   |     0   | exists,up |
| 24 |  s3  | 2168G | 1557G |    0   |     0   |    0   |     0   | exists,up |
| 25 |  s2  | 2113G | 1612G |    0   |     0   |    0   |     0   | exists,up |
| 26 |  s1  | 68.9G | 3657G |    0   |     0   |    0   |     0   | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
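As a quick sanity check on the table above, here is a minimal Python sketch that computes the utilization of each OSD and compares it with the fullness thresholds. The threshold values are an assumption: they are the stock Ceph defaults (nearfull 0.85, backfillfull 0.90, full 0.95), not values read from this cluster; the actual ones are listed by `ceph osd dump | grep ratio`. The names `OSD_USAGE`, `utilization` and `fullest` are made up for illustration.

```python
# (osd id, used GiB, avail GiB), copied from the status table above.
OSD_USAGE = [
    (0, 2415, 1310), (1, 2009, 1716), (2, 2183, 1542), (3, 2680, 1045),
    (4, 2063, 1662), (5, 2269, 1456), (6, 2523, 1202), (7, 1973, 1752),
    (8, 2007, 1718), (9, 2485, 1240), (10, 2385, 1340), (11, 2079, 1646),
    (12, 2272, 1453), (13, 2381, 1344), (14, 1923, 1802), (15, 2617, 1108),
    (16, 2099, 1626), (17, 2336, 1389), (18, 2435, 1290), (19, 2198, 1527),
    (20, 2159, 1566), (21, 2128, 1597), (22, 2064, 1661), (23, 1943, 1782),
    (24, 2168, 1557), (25, 2113, 1612), (26, 68.9, 3657),
]

# ASSUMPTION: stock defaults, not values read from this cluster.
NEARFULL_RATIO = 0.85
BACKFILLFULL_RATIO = 0.90
FULL_RATIO = 0.95


def utilization(used, avail):
    """Fraction of the OSD's capacity that is currently used."""
    return used / (used + avail)


def fullest(usage):
    """Return the (id, used, avail) row with the highest utilization."""
    return max(usage, key=lambda row: utilization(row[1], row[2]))


osd_id, used, avail = fullest(OSD_USAGE)
ratio = utilization(used, avail)
over_nearfull = [
    i for i, u, a in OSD_USAGE if utilization(u, a) >= NEARFULL_RATIO
]

print(f"fullest OSD: osd.{osd_id} at {ratio:.1%}")
print(f"OSDs at or above nearfull ({NEARFULL_RATIO:.0%}): {over_nearfull}")
```

With the figures from the table, the fullest OSD is osd.3 at about 71.9%, and no OSD reaches the assumed nearfull threshold. Note that this only looks at current usage; it does not account for data that is still due to arrive on an OSD through pending backfills.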

Why is this happening? I thought that maybe the 2 PGs marked as toofull involved either OSD.12 (which is emptying) or OSD.26 (the new one), but it seems that this is not the case:

root@s1:~# ceph pg dump|egrep 'toofull|PG_STAT'
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES       OMAP_BYTES* OMAP_KEYS* LOG  DISK_LOG STATE                                          STATE_STAMP                VERSION       REPORTED       UP         UP_PRIMARY ACTING     ACTING_PRIMARY LAST_SCRUB    SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           SNAPTRIMQ_LEN 
6.212     11110                  0        0     22220       0 38145321727           0          0 3023     3023 active+remapped+backfill_wait+backfill_toofull 2019-12-09 11:11:39.093042  13598'212053  13713:1179718  [6,19,24]          6  [13,0,24]             13  13549'211985 2019-12-08 19:46:10.461113    11644'211779 2019-12-06 07:37:42.864325             0 
6.bc      11057                  0        0     22114       0 37733931136           0          0 3032     3032 active+remapped+backfill_wait+backfill_toofull 2019-12-09 10:42:25.534277  13549'212110  13713:1229839 [15,25,17]         15 [19,18,17]             19  13549'211983 2019-12-08 11:02:45.846031    11644'211854 2019-12-06 06:22:43.565313             0 
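To make the UP and ACTING columns of the output above easier to read, here is a small Python sketch that lists, for each of the two PGs, the OSDs that appear in the UP set but not in the ACTING set. Treating those as the backfill targets is a simplifying assumption; the helper names `parse_set` and `backfill_targets` are hypothetical.

```python
def parse_set(text):
    """Turn a pg dump OSD set such as '[6,19,24]' into a list of ints."""
    return [int(x) for x in text.strip("[]").split(",") if x]


def backfill_targets(up, acting):
    """OSDs in the UP set that are not yet in the ACTING set.

    ASSUMPTION: these are taken to be the OSDs that backfill will write to.
    """
    return [osd for osd in up if osd not in acting]


# PG id -> (UP set, ACTING set), copied from the pg dump output above.
PGS = {
    "6.212": ("[6,19,24]", "[13,0,24]"),
    "6.bc": ("[15,25,17]", "[19,18,17]"),
}

targets = {
    pg: backfill_targets(parse_set(up), parse_set(acting))
    for pg, (up, acting) in PGS.items()
}

for pg, osds in targets.items():
    print(f"{pg}: backfill targets {osds}")

# Check whether the drained OSD (12) or the new OSD (26) shows up anywhere.
involved = {
    osd
    for up, acting in PGS.values()
    for osd in parse_set(up) + parse_set(acting)
}
print("osd.12 or osd.26 involved:", bool(involved & {12, 26}))
```

Under that assumption the targets are osd.6 and osd.19 for PG 6.212, and osd.15 and osd.25 for PG 6.bc; neither osd.12 nor osd.26 appears in any of the four sets.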

Any hints? I'm not worried, because I think the cluster will heal itself, but this behaviour is neither clear nor logical.

-- 
Simone Lazzaris
Staff R&D 

Qcom S.p.A.
Via Roggia Vignola, 9 | 24047 Treviglio (BG)
T +39 0363 47905 | D +39 0363 1970352
simone.lazzaris@xxxxxxx | www.qcom.it
