Re: Degraded data redundancy (low space): 1 pg backfill_toofull

I set up my test cluster many years ago with only 3 OSDs and never increased the PGs :-) I plan on doing so after it's healthy again... it's long overdue... maybe 512 :-)
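
For reference, the rule of thumb I keep seeing is roughly 100 PGs per OSD divided by the replica count, rounded to a power of two. A quick sketch of that arithmetic (24 OSDs taken from the ceph osd df output below; replica size 3 and the main pool holding ~75% of the data are my assumptions):

# Rough PG-count sanity check using the common ~100-PGs-per-OSD rule of thumb.
# 24 OSDs are visible in the "ceph osd df" output below; replica size 3 and a
# main pool holding ~75% of the data are assumptions on my side.
import math

osds = 24
target_pgs_per_osd = 100
size = 3

total_target = osds * target_pgs_per_osd / size          # ~800 PGs across all pools
main_pool = 2 ** round(math.log2(total_target * 0.75))   # rounded to a power of two

print(int(total_target), main_pool)                      # 800 512 -> 512 looks sane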

And yes, that's what I thought too... it should have more than enough space to move the data... hmm...

I wouldn't be surprised if it fixes itself after recovery... but it would still be nice to know what's going on.

And the initial degraded state still confuses me...

By the way, I'm on Mimic :-) latest version as of today, 13.2.1.

Sebastian


On Sat, Jul 28, 2018 at 12:03 PM Sinan Polat <sinan@xxxxxxxx> wrote:
Ceph has tried to (re)balance your data; backfill_toofull means there is no space available on the backfill target to move the data to, yet you have plenty of space.
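
As far as I know, backfill_toofull is set when the backfill target OSD expects to go over the backfillfull_ratio (default 0.90), so it is worth comparing the ratios against the actual utilization. A rough sketch; the JSON field names are from memory, so treat them as assumptions:

# Sketch: flag OSDs whose utilization is at or above the backfillfull_ratio,
# by parsing "ceph osd df -f json" and "ceph osd dump -f json".
# JSON field names here are from memory and may need adjusting.
import json
import subprocess

osd_df = json.loads(subprocess.check_output(["ceph", "osd", "df", "-f", "json"]))
osd_map = json.loads(subprocess.check_output(["ceph", "osd", "dump", "-f", "json"]))

backfillfull = osd_map.get("backfillfull_ratio", 0.90)   # Mimic default is 0.90

for node in osd_df.get("nodes", []):
    util = node["utilization"] / 100.0                   # reported as a percentage
    if util >= backfillfull:
        print("osd.%s: %.2f >= backfillfull_ratio %.2f"
              % (node["id"], util, backfillfull))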

Why do you have so few PGs? I would increase the number of PGs, but before doing so, let's see what others say.

Sinan

On 28 Jul 2018, at 11:50, Sebastian Igerl <igerlster@xxxxxxxxx> wrote:

Hi,

I added 4 more OSDs to my 4-node test cluster and now it's in HEALTH_ERR state. Right now it's still recovering, but still, should this happen? None of my OSDs are full. Maybe I need more PGs? But since my %USE is < 40%, shouldn't it be able to recover without going HEALTH_ERR?

  data:
    pools:   7 pools, 484 pgs
    objects: 2.70 M objects, 10 TiB
    usage:   31 TiB used, 114 TiB / 146 TiB avail
    pgs:     2422839/8095065 objects misplaced (29.930%)
             343 active+clean
             101 active+remapped+backfill_wait
             39  active+remapped+backfilling
             1   active+remapped+backfill_wait+backfill_toofull

  io:
    recovery: 315 MiB/s, 78 objects/s





ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 2.72890  1.00000 2.7 TiB 975 GiB 1.8 TiB 34.89 1.62  31
 1   hdd 2.72899  1.00000 2.7 TiB 643 GiB 2.1 TiB 23.00 1.07  36
 8   hdd 7.27739  1.00000 7.3 TiB 1.7 TiB 5.5 TiB 23.85 1.11  83
12   hdd 7.27730  1.00000 7.3 TiB 1.1 TiB 6.2 TiB 14.85 0.69  81
16   hdd 7.27730  1.00000 7.3 TiB 2.0 TiB 5.3 TiB 27.68 1.29  74
20   hdd 9.09569  1.00000 9.1 TiB 108 GiB 9.0 TiB  1.16 0.05  43
 2   hdd 2.72899  1.00000 2.7 TiB 878 GiB 1.9 TiB 31.40 1.46  36
 3   hdd 2.72899  1.00000 2.7 TiB 783 GiB 2.0 TiB 28.02 1.30  39
 9   hdd 7.27739  1.00000 7.3 TiB 2.0 TiB 5.3 TiB 27.58 1.28  85
13   hdd 7.27730  1.00000 7.3 TiB 2.2 TiB 5.1 TiB 30.10 1.40  78
17   hdd 7.27730  1.00000 7.3 TiB 2.1 TiB 5.2 TiB 28.23 1.31  84
21   hdd 9.09569  1.00000 9.1 TiB 192 GiB 8.9 TiB  2.06 0.10  41
 4   hdd 2.72899  1.00000 2.7 TiB 927 GiB 1.8 TiB 33.18 1.54  34
 5   hdd 2.72899  1.00000 2.7 TiB 1.0 TiB 1.7 TiB 37.57 1.75  28
10   hdd 7.27739  1.00000 7.3 TiB 2.2 TiB 5.0 TiB 30.66 1.43  87
14   hdd 7.27730  1.00000 7.3 TiB 1.8 TiB 5.5 TiB 24.23 1.13  89
18   hdd 7.27730  1.00000 7.3 TiB 2.5 TiB 4.8 TiB 33.83 1.57  93
22   hdd 9.09569  1.00000 9.1 TiB 210 GiB 8.9 TiB  2.26 0.10  44
 6   hdd 2.72899  1.00000 2.7 TiB 350 GiB 2.4 TiB 12.51 0.58  21
 7   hdd 2.72899  1.00000 2.7 TiB 980 GiB 1.8 TiB 35.07 1.63  35
11   hdd 7.27739  1.00000 7.3 TiB 2.8 TiB 4.4 TiB 39.14 1.82  99
15   hdd 7.27730  1.00000 7.3 TiB 1.6 TiB 5.6 TiB 22.49 1.05  82
19   hdd 7.27730  1.00000 7.3 TiB 2.1 TiB 5.2 TiB 28.49 1.32  77
23   hdd 9.09569  1.00000 9.1 TiB 285 GiB 8.8 TiB  3.06 0.14  52
                    TOTAL 146 TiB  31 TiB 114 TiB 21.51
MIN/MAX VAR: 0.05/1.82  STDDEV: 11.78




Right after adding the OSDs it showed degraded for a few minutes. Since all my pools have a replication factor of 3 and I'm only adding OSDs, I'm a bit confused why this happens. I get why objects are misplaced, but why undersized and degraded?

pgs:     4611/8095032 objects degraded (0.057%)
             2626460/8095032 objects misplaced (32.445%)
             215 active+clean
             192 active+remapped+backfill_wait
             26  active+recovering+undersized+remapped
             17  active+recovery_wait+undersized+degraded+remapped
             16  active+recovering
             11  active+recovery_wait+degraded
             6   active+remapped+backfilling
             1   active+remapped+backfill_toofull
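
For reference, something like this should list the affected PGs from the output above together with their up/acting OSD sets (the JSON key layout is an assumption; it may be nested differently on Mimic):

# Sketch: list the degraded / undersized PGs together with their up and acting
# OSD sets, parsing "ceph pg dump -f json". The top-level key layout is an
# assumption and may sit under "pg_map" depending on the release.
import json
import subprocess

dump = json.loads(subprocess.check_output(["ceph", "pg", "dump", "-f", "json"]))
pg_stats = dump.get("pg_stats") or dump.get("pg_map", {}).get("pg_stats", [])

for pg in pg_stats:
    if "degraded" in pg["state"] or "undersized" in pg["state"]:
        print(pg["pgid"], pg["state"], "up:", pg["up"], "acting:", pg["acting"])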


Maybe someone can give me some pointers on what I'm missing to understand what's happening here?

Thanks!

Sebastian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
