Ceph Nautilus - can't balance due to degraded state

Hi,

 

We appear to be stuck in a proverbial chicken and egg situation. Degraded placement groups won’t backfill as OSDs are near full and we can’t run the balancer as some placement groups are degraded.

 

We upgraded Ceph from Luminous 12.2.12 to Nautilus 14.2.1 on a cluster used for backup services. We are in the process of migrating data (nearly complete), after which we’ll be able to repurpose the old systems as additional Ceph OSD nodes. The cluster was at about 75% utilisation and the balancer module together with upmap did a great job. We’ve historically been very conservative with placement group counts, considering that smaller drives generally get replaced with much larger ones and the PGs per OSD subsequently grow to problematic levels.

 

The upgrade process was so painless that we also enabled the pg_autoscaler module, which subsequently marked 75% of the data as misplaced and also degraded various placement groups. The result is that we now have many OSDs marked as nearfull and placement groups flagged as backfill_toofull, but we can’t run the balancer because some placement groups are in a degraded state.

 

 

Is there a way we can override the degraded check and force the balancer to redistribute PGs; or could we manually adjust OSDs to have the same effect?

 

Alternatively, is there a way to get Ceph to first heal the degraded PGs and only then work on the misplaced ones?

 

 

There are only 3 RBD images in this cluster: an 80GB operating system image in a replicated SSD pool, a 150TB erasure coded image, and a relatively tiny replicated SSD caching tier for the EC pool.

 

[admin@kvm7e ~]# ceph osd lspools

1 rbd_ssd

5 cephfs_data

6 cephfs_metadata

7 rbd_hdd

8 ec_hdd

9 rbd_hdd_cache

10 ec_hdd_cache

 

[admin@kvm7e ~]# for f in `ceph osd lspools | cut -d\  -f2`; do ceph osd pool set $f pg_autoscale_mode on; done;

set pool 1 pg_autoscale_mode to on

set pool 5 pg_autoscale_mode to on

set pool 6 pg_autoscale_mode to on

set pool 7 pg_autoscale_mode to on

set pool 8 pg_autoscale_mode to on

set pool 9 pg_autoscale_mode to on

set pool 10 pg_autoscale_mode to on
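
The loop above could also be written so that it prints the commands before anything is run, which makes it easier to review them or to switch every pool to a different mode. This is only a sketch: the function name autoscale_cmds is made up for illustration, and it assumes `ceph osd lspools` prints one "<id> <name>" pair per line as shown above. Besides on and off, the autoscaler accepts the mode warn, which reports suggested PG changes without applying them.

```shell
# Print (not run) one "ceph osd pool set" command per pool.
# Reads `ceph osd lspools` output on stdin; $1 is the mode (on, off or warn).
autoscale_cmds() {
    mode="$1"
    awk -v mode="$mode" '$1 ~ /^[0-9]+$/ && NF >= 2 {
        printf "ceph osd pool set %s pg_autoscale_mode %s\n", $2, mode
    }'
}

# Example (review the output first, then pipe it to sh to apply):
#   ceph osd lspools | autoscale_cmds warn
```

Matching on a numeric first column skips the blank lines in the listing, which the `cut`-based loop would pass through as empty pool names.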

 

 

 

What is concerning is that Ceph marked OSDs as near full, although by default this should only happen when an OSD reaches 85% utilisation. I presume Ceph projects the resulting storage utilisation based on the weighting set by the balancer?

 

[admin@kvm7e ~]# ceph health detail

HEALTH_ERR noout flag(s) set; 6 nearfull osd(s); 4 pool(s) nearfull; Reduced data availability: 2 pgs inactive; Degraded data redundancy (low space): 4 pgs backfill_toofull

OSDMAP_FLAGS noout flag(s) set

OSD_NEARFULL 6 nearfull osd(s)

    osd.100 is near full

    osd.101 is near full

    osd.102 is near full

    osd.103 is near full

    osd.104 is near full

    osd.105 is near full

POOL_NEARFULL 4 pool(s) nearfull

    pool 'cephfs_data' is nearfull

    pool 'cephfs_metadata' is nearfull

    pool 'rbd_hdd' is nearfull

    pool 'ec_hdd' is nearfull

PG_AVAILABILITY Reduced data availability: 2 pgs inactive

    pg 7.1e is stuck inactive for 437.102346, current state clean+premerge+peered, last acting [303,104,405]

    pg 7.3e is stuck inactive for 436.965670, current state remapped+premerge+backfill_wait+peered, last acting [405,104,301]

PG_DEGRADED_FULL Degraded data redundancy (low space): 4 pgs backfill_toofull

    pg 8.c8 is active+remapped+backfill_wait+backfill_toofull, acting [305,104,404,504,203]

    pg 8.1bb is active+remapped+backfill_wait+backfill_toofull, acting [505,204,102,304,404]

    pg 8.326 is active+remapped+backfill_wait+backfill_toofull, acting [302,504,402,103,202]

    pg 8.3e0 is active+remapped+backfill_wait+backfill_toofull, acting [202,402,103,305,505]

 

[admin@kvm7e ~]# ceph osd df

ID    CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS

  100   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 9.2 MiB  2.5 GiB 490 GiB 73.71 1.01  59     up

  101   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 4.2 MiB  2.5 GiB 489 GiB 73.77 1.01  58     up

  102   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  21 MiB  7.4 GiB 1.4 TiB 73.63 1.01 175     up

  103   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  20 MiB  7.4 GiB 1.4 TiB 73.49 1.00 176     up

  104   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  28 MiB   13 GiB 2.2 TiB 75.68 1.03 304     up

  105   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  23 MiB   13 GiB 2.2 TiB 75.63 1.03 301     up

  200   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 4.1 MiB  2.5 GiB 492 GiB 73.61 1.01  59     up

  201   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 4.2 MiB  2.5 GiB 491 GiB 73.65 1.01  58     up

  202   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  21 MiB  7.4 GiB 1.4 TiB 73.60 1.01 175     up

  203   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  18 MiB  7.4 GiB 1.4 TiB 73.51 1.00 175     up

  204   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  40 MiB   13 GiB 2.2 TiB 75.65 1.03 301     up

  205   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  41 MiB   13 GiB 2.2 TiB 75.71 1.03 302     up

  300   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 6.1 MiB  2.5 GiB 490 GiB 73.68 1.01  58     up

  301   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB  13 MiB  2.5 GiB 490 GiB 73.72 1.01  59     up

  302   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  13 MiB  7.4 GiB 1.4 TiB 73.57 1.01 174     up

  303   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  22 MiB  7.4 GiB 1.4 TiB 73.56 1.01 177     up

  304   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  42 MiB   13 GiB 2.2 TiB 75.68 1.03 302     up

  305   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  40 MiB   13 GiB 2.2 TiB 75.67 1.03 304     up

  400   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 8.6 MiB  2.5 GiB 489 GiB 73.75 1.01  59     up

  401   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 8.7 MiB  2.5 GiB 493 GiB 73.56 1.01  59     up

  402   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  25 MiB  7.4 GiB 1.4 TiB 73.51 1.00 176     up

  403   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  18 MiB  7.4 GiB 1.4 TiB 73.59 1.01 175     up

  404   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  40 MiB   13 GiB 2.2 TiB 75.68 1.03 298     up

  405   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  40 MiB   13 GiB 2.2 TiB 75.69 1.03 301     up

  500   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 8.8 MiB  2.5 GiB 491 GiB 73.66 1.01  59     up

  501   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB  10 MiB  2.5 GiB 491 GiB 73.63 1.01  58     up

  502   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  22 MiB  7.4 GiB 1.4 TiB 73.62 1.01 174     up

  503   hdd 5.45789  1.00000 5.5 TiB 4.0 TiB 4.0 TiB  21 MiB  7.4 GiB 1.4 TiB 73.54 1.01 177     up

  504   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  38 MiB   13 GiB 2.2 TiB 75.67 1.03 301     up

  505   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  42 MiB   13 GiB 2.2 TiB 75.67 1.03 300     up

10100   ssd 0.40630  1.00000 416 GiB  40 GiB  39 GiB  28 MiB  996 MiB 376 GiB  9.60 0.13  35     up

10101   ssd 0.40630  1.00000 416 GiB  44 GiB  43 GiB  20 MiB 1004 MiB 372 GiB 10.57 0.14  37     up

10200   ssd 0.40630  1.00000 416 GiB  32 GiB  31 GiB  15 MiB 1009 MiB 384 GiB  7.80 0.11  36     up

10201   ssd 0.40630  1.00000 416 GiB  32 GiB  31 GiB  16 MiB 1008 MiB 384 GiB  7.67 0.10  37     up

10300   ssd 0.40630  1.00000 416 GiB  40 GiB  39 GiB  17 MiB 1007 MiB 376 GiB  9.71 0.13  36     up

10301   ssd 0.40630  1.00000 416 GiB  43 GiB  42 GiB  18 MiB 1006 MiB 373 GiB 10.31 0.14  37     up

10400   ssd 0.40630  1.00000 416 GiB  34 GiB  33 GiB  17 MiB 1007 MiB 382 GiB  8.14 0.11  35     up

10401   ssd 0.40630  1.00000 416 GiB  40 GiB  39 GiB  19 MiB 1005 MiB 376 GiB  9.64 0.13  36     up

10500   ssd 0.40630  1.00000 416 GiB  38 GiB  37 GiB  18 MiB 1006 MiB 378 GiB  9.11 0.12  37     up

10501   ssd 0.40630  1.00000 416 GiB  46 GiB  45 GiB  18 MiB 1006 MiB 370 GiB 10.98 0.15  37     up

                       TOTAL 168 TiB 123 TiB 123 TiB 839 MiB  234 GiB  45 TiB 73.16
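
The table above can be checked against a utilisation threshold with a small filter. This is a sketch under assumptions: the function name osds_at_or_above is made up for illustration, and it assumes the Nautilus column layout shown above, where %USE is the fourth column from the right.

```shell
# Print the ID and %USE of every OSD at or above a utilisation threshold.
# Reads `ceph osd df` output on stdin; $1 is the threshold in percent.
osds_at_or_above() {
    awk -v limit="$1" '
        # Data rows start with a numeric OSD ID; this skips the header and TOTAL rows.
        $1 ~ /^[0-9]+$/ && NF >= 4 {
            use = $(NF - 3)          # columns from the right: STATUS, PGS, VAR, %USE
            if (use + 0 >= limit + 0) print $1, use
        }'
}

# Example: ceph osd df | osds_at_or_above 85
```

Run against the table above with a threshold of 85, this prints nothing, since the highest %USE listed is 75.71. That matches the observation that no OSD has reached the default nearfull ratio by current usage.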

 

 

 

[admin@kvm7e ~]# for f in /var/run/ceph/ceph-osd.*.asok; do ceph --admin-daemon $f config show; done | grep 'full'

    "mon_cache_target_full_warn_ratio": "0.660000",

    "mon_osd_backfillfull_ratio": "0.900000",

    "mon_osd_full_ratio": "0.950000",

    "mon_osd_nearfull_ratio": "0.850000",

    "mon_osdmap_full_prune_enabled": "true",

    "mon_osdmap_full_prune_interval": "10",

    "mon_osdmap_full_prune_min": "10000",

    "mon_osdmap_full_prune_txsize": "100",

    "osd_debug_skip_full_check_in_backfill_reservation": "false",

    "osd_debug_skip_full_check_in_recovery": "false",

    "osd_failsafe_full_ratio": "0.970000",

    "osd_pool_default_cache_target_full_ratio": "0.800000",

    "paxos_stash_full_interval": "25",

<snip>
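
One thing that may be worth checking alongside the daemon config: since Luminous, the full, backfillfull and nearfull ratios that are enforced are stored in the OSD map, and the mon_osd_*_ratio options are documented as applying only at cluster creation. A small sketch for pulling the effective values out of `ceph osd dump` follows. The function name osdmap_ratios is made up for illustration, and it assumes the dump prints lines of the form "nearfull_ratio 0.85".

```shell
# Print the full/backfillfull/nearfull ratios recorded in the OSD map.
# Reads `ceph osd dump` output on stdin.
osdmap_ratios() {
    awk '$1 ~ /^(full|backfillfull|nearfull)_ratio$/ { print $1 "=" $2 }'
}

# Example: ceph osd dump | osdmap_ratios
```

If the values reported there differ from the daemon config, they can be changed with `ceph osd set-nearfull-ratio`, `ceph osd set-backfillfull-ratio` and `ceph osd set-full-ratio`.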

 

 

 

 

An hour later:

[admin@kvm7e ~]# ceph health detail

HEALTH_ERR noout flag(s) set; 6 nearfull osd(s); 4 pool(s) nearfull; Degraded data redundancy: 2891706/93291241 objects degraded (3.100%), 165 pgs degraded, 165 pgs undersized; Degraded data redundancy (low space): 647 pgs backfill_toofull

OSDMAP_FLAGS noout flag(s) set

OSD_NEARFULL 6 nearfull osd(s)

    osd.100 is near full

    osd.101 is near full

    osd.102 is near full

    osd.103 is near full

    osd.104 is near full

    osd.105 is near full

POOL_NEARFULL 4 pool(s) nearfull

    pool 'cephfs_data' is nearfull

    pool 'cephfs_metadata' is nearfull

    pool 'rbd_hdd' is nearfull

    pool 'ec_hdd' is nearfull

PG_DEGRADED Degraded data redundancy: 2891706/93291241 objects degraded (3.100%), 165 pgs degraded, 165 pgs undersized

    pg 8.304 is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,503,304,402,203]

    pg 8.307 is stuck undersized for 7968.693589, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,205,503,2147483647,401]

    pg 8.30c is stuck undersized for 7968.739427, current state active+undersized+degraded+remapped+backfill_toofull, last acting [501,2147483647,202,304,402]

    pg 8.311 is stuck undersized for 7968.732720, current state active+undersized+degraded+remapped+backfill_toofull, last acting [405,2147483647,502,204,305]

    pg 8.314 is stuck undersized for 7968.732950, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,302,503,205,405]

    pg 8.319 is stuck undersized for 7968.716745, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,403,204,505,305]

    pg 8.31c is stuck undersized for 7968.713635, current state active+undersized+degraded+remapped+backfill_toofull, last acting [500,2147483647,402,305,204]

    pg 8.327 is stuck undersized for 7968.664546, current state active+undersized+degraded+remapped+backfill_toofull, last acting [202,305,504,2147483647,405]

    pg 8.32f is stuck undersized for 7968.682409, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,402,204,504,302]

    pg 8.332 is stuck undersized for 7968.732504, current state active+undersized+degraded+remapped+backfill_toofull, last acting [302,405,2147483647,205,502]

    pg 8.334 is stuck undersized for 7968.694182, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,205,404,302,503]

    pg 8.33c is stuck undersized for 7968.734577, current state active+undersized+degraded+remapped+backfill_toofull, last acting [302,505,405,2147483647,201]

    pg 8.33f is stuck undersized for 7968.552298, current state active+undersized+degraded+remapped+backfill_toofull, last acting [400,504,2147483647,304,204]

    pg 8.348 is stuck undersized for 7968.696137, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,2147483647,404,504,203]

    pg 8.34c is stuck undersized for 7968.768111, current state active+undersized+degraded+remapped+backfilling, last acting [504,204,2147483647,303,403]

    pg 8.350 is stuck undersized for 7968.734046, current state active+undersized+degraded+remapped+backfill_toofull, last acting [405,502,305,2147483647,205]

    pg 8.35a is stuck undersized for 7968.685123, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,201,404,504,304]

    pg 8.35c is stuck undersized for 7968.700947, current state active+undersized+degraded+remapped+backfill_toofull, last acting [404,2147483647,303,505,205]

    pg 8.35e is stuck undersized for 7968.683728, current state active+undersized+degraded+remapped+backfill_wait, last acting [402,2147483647,304,500,203]

    pg 8.361 is stuck undersized for 7968.798644, current state active+undersized+degraded+remapped+backfilling, last acting [505,404,304,2147483647,204]

    pg 8.365 is stuck undersized for 7968.731458, current state active+undersized+degraded+remapped+backfill_toofull, last acting [405,303,2147483647,205,504]

    pg 8.368 is stuck undersized for 7968.799312, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,304,404,502,205]

    pg 8.36b is stuck undersized for 7968.736514, current state active+undersized+degraded+remapped+backfill_wait, last acting [300,403,204,2147483647,505]

    pg 8.36f is stuck undersized for 7968.695546, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,503,2147483647,405,205]

    pg 8.373 is stuck undersized for 7968.717140, current state active+undersized+degraded+remapped+backfill_toofull, last acting [403,303,2147483647,202,502]

    pg 8.379 is stuck undersized for 7968.732125, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,405,205,302,504]

    pg 8.37c is stuck undersized for 7968.712063, current state active+undersized+degraded+remapped+backfill_toofull, last acting [401,500,205,2147483647,302]

    pg 8.37d is stuck undersized for 7968.740233, current state active+undersized+degraded+remapped+backfill_toofull, last acting [501,302,2147483647,205,404]

    pg 8.384 is stuck undersized for 7968.796821, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,503,304,402,203]

    pg 8.387 is stuck undersized for 7968.639604, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,205,503,2147483647,401]

    pg 8.392 is stuck undersized for 7968.771812, current state active+undersized+degraded+remapped+backfill_toofull, last acting [504,305,405,202,2147483647]

    pg 8.394 is stuck undersized for 7968.734314, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,302,503,205,405]

    pg 8.399 is stuck undersized for 7968.722090, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,403,204,505,305]

    pg 8.39c is stuck undersized for 7968.712907, current state active+undersized+degraded+remapped+backfill_toofull, last acting [500,2147483647,402,305,204]

    pg 8.39d is stuck undersized for 7968.725227, current state active+undersized+degraded+remapped+backfill_toofull, last acting [403,2147483647,502,305,203]

    pg 8.3a5 is stuck undersized for 7968.644387, current state active+undersized+degraded+remapped+backfill_toofull, last acting [204,2147483647,304,400,505]

    pg 8.3a7 is stuck undersized for 7968.668449, current state active+undersized+degraded+remapped+backfill_toofull, last acting [202,305,504,2147483647,405]

    pg 8.3af is stuck undersized for 7968.683191, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,402,204,504,302]

    pg 8.3b2 is stuck undersized for 7968.733004, current state active+undersized+degraded+remapped+backfill_toofull, last acting [302,405,2147483647,205,502]

    pg 8.3b4 is stuck undersized for 7968.694415, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,205,404,302,503]

    pg 8.3bc is stuck undersized for 7968.732598, current state active+undersized+degraded+remapped+backfill_toofull, last acting [302,505,405,2147483647,201]

    pg 8.3cc is stuck undersized for 7968.771592, current state active+undersized+degraded+remapped+backfill_toofull, last acting [504,204,2147483647,303,403]

    pg 8.3d0 is stuck undersized for 7968.733520, current state active+undersized+degraded+remapped+backfill_toofull, last acting [405,502,305,2147483647,205]

    pg 8.3de is stuck undersized for 7968.681841, current state active+undersized+degraded+remapped+backfill_toofull, last acting [402,2147483647,304,500,203]

    pg 8.3df is stuck undersized for 7968.726621, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,200,303,500,404]

    pg 8.3e1 is stuck undersized for 7968.796332, current state active+undersized+degraded+remapped+backfill_toofull, last acting [505,404,304,2147483647,204]

    pg 8.3ec is stuck undersized for 7968.776206, current state active+undersized+degraded+remapped+backfill_toofull, last acting [502,302,405,2147483647,204]

    pg 8.3ef is stuck undersized for 7968.690806, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,503,2147483647,405,205]

    pg 8.3f7 is stuck undersized for 7968.672395, current state active+undersized+degraded+remapped+backfill_toofull, last acting [205,505,404,305,2147483647]

    pg 8.3fc is stuck undersized for 7968.711543, current state active+undersized+degraded+remapped+backfill_toofull, last acting [401,500,205,2147483647,302]

    pg 8.3fd is stuck undersized for 7968.740233, current state active+undersized+degraded+remapped+backfill_toofull, last acting [501,302,2147483647,205,404]

PG_DEGRADED_FULL Degraded data redundancy (low space): 647 pgs backfill_toofull

    pg 8.3bc is active+undersized+degraded+remapped+backfill_toofull, acting [302,505,405,2147483647,201]

    pg 8.3bd is active+remapped+backfill_wait+backfill_toofull, acting [202,402,105,503,304]

    pg 8.3be is active+remapped+backfill_toofull, acting [504,303,205,100,403]

    pg 8.3c0 is active+remapped+backfill_toofull, acting [402,304,503,105,205]

    pg 8.3c1 is active+remapped+backfill_toofull, acting [301,101,405,504,204]

    pg 8.3c2 is active+remapped+backfill_toofull, acting [104,204,305,503,405]

    pg 8.3c3 is active+remapped+backfill_toofull, acting [405,105,204,303,505]

    pg 8.3c4 is active+remapped+backfill_toofull, acting [504,305,101,403,201]

    pg 8.3c6 is active+remapped+backfill_toofull, acting [404,304,205,504,105]

    pg 8.3c8 is active+remapped+backfill_toofull, acting [305,104,404,504,203]

    pg 8.3c9 is active+remapped+backfill_toofull, acting [404,505,102,301,203]

    pg 8.3cb is active+remapped+backfill_wait+backfill_toofull, acting [105,200,402,505,304]

    pg 8.3cc is active+undersized+degraded+remapped+backfill_toofull, acting [504,204,2147483647,303,403]

    pg 8.3cd is active+remapped+backfill_wait+backfill_toofull, acting [105,305,403,504,205]

    pg 8.3cf is active+remapped+backfill_toofull, acting [502,400,304,105,202]

    pg 8.3d0 is active+undersized+degraded+remapped+backfill_toofull, acting [405,502,305,2147483647,205]

    pg 8.3d1 is active+remapped+backfill_toofull, acting [205,103,404,502,304]

    pg 8.3d2 is active+remapped+backfill_toofull, acting [202,505,304,403,103]

    pg 8.3d3 is active+remapped+backfill_toofull, acting [204,101,405,505,302]

    pg 8.3d4 is active+remapped+backfill_toofull, acting [503,305,404,100,205]

    pg 8.3d5 is active+remapped+backfill_wait+backfill_toofull, acting [105,503,203,401,304]

    pg 8.3d7 is active+remapped+backfill_toofull, acting [504,305,404,200,102]

    pg 8.3d9 is active+remapped+backfill_toofull, acting [202,302,402,105,504]

    pg 8.3da is active+remapped+backfill_toofull, acting [104,201,404,504,304]

    pg 8.3dc is active+remapped+backfill_wait+backfill_toofull, acting [404,104,303,505,205]

    pg 8.3dd is active+remapped+backfill_toofull, acting [404,204,505,302,105]

    pg 8.3de is active+undersized+degraded+remapped+backfill_toofull, acting [402,2147483647,304,500,203]

    pg 8.3df is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,200,303,500,404]

    pg 8.3e0 is active+remapped+backfill_toofull, acting [202,402,103,305,505]

    pg 8.3e1 is active+undersized+degraded+remapped+backfill_toofull, acting [505,404,304,2147483647,204]

    pg 8.3e2 is active+remapped+backfill_toofull, acting [304,102,505,401,203]

    pg 8.3e4 is active+remapped+backfill_toofull, acting [303,102,503,202,405]

    pg 8.3e5 is active+remapped+backfill_toofull, acting [405,303,104,205,504]

    pg 8.3e6 is active+remapped+backfill_wait+backfill_toofull, acting [504,105,204,404,305]

    pg 8.3e7 is active+remapped+backfill_toofull, acting [103,202,505,405,304]

    pg 8.3eb is active+remapped+backfill_toofull, acting [300,403,204,104,505]

    pg 8.3ec is active+undersized+degraded+remapped+backfill_toofull, acting [502,302,405,2147483647,204]

    pg 8.3ee is active+remapped+backfill_toofull, acting [403,300,204,503,100]

    pg 8.3ef is active+undersized+degraded+remapped+backfill_toofull, acting [305,503,2147483647,405,205]

    pg 8.3f0 is active+remapped+backfill_toofull, acting [102,203,500,304,403]

    pg 8.3f1 is active+remapped+backfill_toofull, acting [505,305,404,105,202]

    pg 8.3f2 is active+remapped+backfill_wait+backfill_toofull, acting [105,304,403,202,502]

    pg 8.3f4 is active+remapped+backfill_toofull, acting [205,505,102,405,303]

    pg 8.3f5 is active+remapped+backfill_toofull, acting [405,105,304,504,201]

    pg 8.3f6 is active+remapped+backfill_wait+backfill_toofull, acting [105,204,505,304,404]

    pg 8.3f7 is active+undersized+degraded+remapped+backfill_toofull, acting [205,505,404,305,2147483647]

    pg 8.3fb is active+remapped+backfill_toofull, acting [303,505,401,105,203]

    pg 8.3fc is active+undersized+degraded+remapped+backfill_toofull, acting [401,500,205,2147483647,302]

    pg 8.3fd is active+undersized+degraded+remapped+backfill_toofull, acting [501,302,2147483647,205,404]

    pg 8.3fe is active+remapped+backfill_toofull, acting [504,304,402,205,105]

    pg 8.3ff is active+remapped+backfill_toofull, acting [405,202,102,501,303]
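
Because the same PG can be listed under both PG_DEGRADED and PG_DEGRADED_FULL, it helps to deduplicate the listing before separating degraded PGs from those that are merely remapped. This is a sketch under assumptions: the function names pg_states and degraded_summary are made up for illustration, and the state is taken to be the first field containing a "+", which would miss a PG with a single-word state.

```shell
# Print "<pgid> <state>" once per PG from `ceph health detail` output on stdin.
pg_states() {
    awk '$1 == "pg" {
        state = ""
        for (i = 3; i <= NF; i++) {
            # The state is the first field containing "+" (e.g. active+remapped+...).
            if ($i ~ /\+/) { state = $i; break }
        }
        sub(/,$/, "", state)         # drop the trailing comma after the state
        if (state != "" && !($2 in seen)) { seen[$2] = 1; print $2, state }
    }'
}

# Count listed PGs whose state includes "degraded" versus all other listed PGs.
degraded_summary() {
    pg_states | awk '
        $2 ~ /degraded/ { d++; next }
        { o++ }
        END { printf "degraded=%d other=%d\n", d, o }'
}

# Example: ceph health detail | degraded_summary
```

The listing in `ceph health detail` is evidently truncated (165 degraded PGs are reported above, but far fewer are listed), so the counts only cover the PGs that are actually printed.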

 

 

 

About a day after enabling the pg_autoscaler module:

[admin@kvm7e ~]# ceph health detail

HEALTH_ERR noout flag(s) set; 18 nearfull osd(s); 4 pool(s) nearfull; Degraded data redundancy: 4227162/93352306 objects degraded (4.528%), 250 pgs degraded, 253 pgs undersized; Degraded data redundancy (low space): 559 pgs backfill_toofull

OSDMAP_FLAGS noout flag(s) set

OSD_NEARFULL 18 nearfull osd(s)

    osd.100 is near full

    osd.101 is near full

    osd.102 is near full

    osd.103 is near full

    osd.104 is near full

    osd.105 is near full

    osd.200 is near full

    osd.201 is near full

    osd.203 is near full

    osd.300 is near full

    osd.301 is near full

    osd.303 is near full

    osd.304 is near full

    osd.401 is near full

    osd.404 is near full

    osd.501 is near full

    osd.502 is near full

    osd.504 is near full

POOL_NEARFULL 4 pool(s) nearfull

    pool 'cephfs_data' is nearfull

    pool 'cephfs_metadata' is nearfull

    pool 'rbd_hdd' is nearfull

    pool 'ec_hdd' is nearfull

PG_DEGRADED Degraded data redundancy: 4227162/93352306 objects degraded (4.528%), 250 pgs degraded, 253 pgs undersized

    pg 8.35a is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,201,404,504,304]

    pg 8.35b is stuck undersized for 13233.292244, current state active+undersized+degraded+remapped+backfill_toofull, last acting [203,503,2147483647,305,105]

    pg 8.361 is stuck undersized for 67655.237839, current state active+undersized+remapped+backfill_toofull, last acting [505,404,304,2147483647,204]

    pg 8.362 is stuck undersized for 67682.460453, current state active+undersized+degraded+remapped+backfill_toofull, last acting [304,2147483647,505,401,203]

    pg 8.363 is stuck undersized for 67665.785277, current state active+undersized+degraded+remapped+backfill_toofull, last acting [404,203,301,504,2147483647]

    pg 8.365 is stuck undersized for 13232.214476, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,303,104,205,504]

    pg 8.366 is stuck undersized for 67665.803857, current state active+undersized+degraded+remapped+backfill_toofull, last acting [504,2147483647,204,404,305]

    pg 8.368 is stuck undersized for 67655.243869, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,304,404,502,205]

    pg 8.36d is stuck undersized for 67682.440479, current state active+undersized+degraded+remapped+backfill_toofull, last acting [404,2147483647,201,502,304]

    pg 8.36f is stuck undersized for 13232.182277, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,503,104,2147483647,205]

    pg 8.373 is stuck undersized for 67665.636152, current state active+undersized+degraded+remapped+backfill_toofull, last acting [403,303,2147483647,202,502]

    pg 8.374 is stuck undersized for 13232.196886, current state active+undersized+degraded+remapped+backfill_toofull, last acting [205,505,102,2147483647,303]

    pg 8.375 is stuck undersized for 13232.147722, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,105,304,504,201]

    pg 8.379 is stuck undersized for 13232.052714, current state active+undersized+degraded+remapped+backfill_toofull, last acting [104,2147483647,205,302,504]

    pg 8.37a is stuck undersized for 13233.378060, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,105,502,202,2147483647]

    pg 8.37d is stuck undersized for 67665.810745, current state active+undersized+degraded+remapped+backfill_toofull, last acting [501,302,2147483647,205,404]

    pg 8.381 is stuck undersized for 13233.351431, current state active+undersized+degraded+remapped+backfill_toofull, last acting [304,204,2147483647,101,503]

    pg 8.382 is stuck undersized for 13232.226588, current state active+undersized+degraded+remapped+backfill_toofull, last acting [504,205,2147483647,300,105]

    pg 8.387 is stuck undersized for 67665.791360, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,205,503,2147483647,401]

    pg 8.391 is stuck undersized for 13233.342426, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,104,502,204,305]

    pg 8.392 is stuck undersized for 13232.227756, current state active+undersized+degraded+remapped+backfill_toofull, last acting [504,305,2147483647,202,104]

    pg 8.396 is stuck undersized for 13233.333363, current state active+undersized+degraded+remapped+backfill_toofull, last acting [303,102,2147483647,504,204]

    pg 8.399 is stuck undersized for 67665.635750, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,403,204,505,305]

    pg 8.39c is stuck undersized for 67655.266511, current state active+undersized+degraded+remapped+backfill_toofull, last acting [500,2147483647,402,305,204]

    pg 8.39d is stuck undersized for 67655.215300, current state active+undersized+degraded+remapped+backfill_toofull, last acting [403,2147483647,502,305,203]

    pg 8.3a0 is stuck undersized for 13233.339729, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,104,304,501,204]

    pg 8.3a1 is stuck undersized for 13232.162171, current state active+undersized+degraded+remapped+backfill_toofull, last acting [105,505,205,303,2147483647]

    pg 8.3a4 is stuck undersized for 13233.379809, current state active+undersized+degraded+remapped+backfilling, last acting [2147483647,305,102,502,203]

    pg 8.3ad is stuck undersized for 13233.368722, current state active+undersized+degraded+remapped+backfill_toofull, last acting [301,202,501,2147483647,103]

    pg 8.3ae is stuck undersized for 67665.744856, current state active+undersized+degraded+remapped+backfill_toofull, last acting [304,503,404,200,2147483647]

    pg 8.3af is stuck undersized for 67665.769691, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,402,204,504,302]

    pg 8.3b4 is stuck undersized for 67682.382958, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,205,404,302,503]

    pg 8.3bc is stuck undersized for 13232.171438, current state active+undersized+degraded+remapped+backfilling, last acting [302,505,2147483647,104,201]

    pg 8.3c3 is stuck undersized for 13232.151774, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,105,204,303,505]

    pg 8.3c6 is stuck undersized for 67665.792220, current state active+undersized+degraded+remapped+backfill_toofull, last acting [404,304,205,504,2147483647]

    pg 8.3ca is stuck undersized for 67665.817464, current state active+undersized+degraded+remapped+backfill_toofull, last acting [503,2147483647,205,305,400]

    pg 8.3cc is stuck undersized for 67665.802241, current state active+undersized+degraded+remapped+backfill_toofull, last acting [504,204,2147483647,303,403]

    pg 8.3d3 is stuck undersized for 13233.266745, current state active+undersized+degraded+remapped+backfill_toofull, last acting [204,101,2147483647,505,302]

    pg 8.3de is stuck undersized for 67655.238851, current state active+undersized+degraded+remapped+backfill_toofull, last acting [402,2147483647,304,500,203]

    pg 8.3df is stuck undersized for 67682.379100, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,200,303,500,404]

    pg 8.3e1 is stuck undersized for 67655.229239, current state active+undersized+degraded+remapped+backfill_toofull, last acting [505,404,304,2147483647,204]

    pg 8.3e3 is stuck undersized for 67665.788025, current state active+undersized+degraded+remapped+backfill_toofull, last acting [404,203,301,504,2147483647]

    pg 8.3e5 is stuck undersized for 13233.333897, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,303,104,205,504]

    pg 8.3e7 is stuck undersized for 13233.318293, current state active+undersized+degraded+remapped+backfill_toofull, last acting [103,202,505,2147483647,304]

    pg 8.3ea is stuck undersized for 67682.450664, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,2147483647,203,403,504]

    pg 8.3ec is stuck undersized for 13232.182236, current state active+undersized+degraded+remapped+backfill_toofull, last acting [502,302,2147483647,104,204]

    pg 8.3f1 is stuck undersized for 67665.809335, current state active+undersized+degraded+remapped+backfill_toofull, last acting [505,305,404,2147483647,202]

    pg 8.3fa is stuck undersized for 13232.182880, current state active+undersized+degraded+remapped+backfill_toofull, last acting [305,105,502,202,2147483647]

    pg 8.3fc is stuck undersized for 67655.245817, current state active+undersized+degraded+remapped+backfill_toofull, last acting [401,500,205,2147483647,302]

    pg 8.3fe is stuck undersized for 67665.800943, current state active+undersized+degraded+remapped+backfill_toofull, last acting [504,304,402,205,2147483647]

    pg 8.3ff is stuck undersized for 13233.326383, current state active+undersized+degraded+remapped+backfill_toofull, last acting [2147483647,202,102,501,303]

PG_DEGRADED_FULL Degraded data redundancy (low space): 559 pgs backfill_toofull

    pg 8.3af is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,402,204,504,302]

    pg 8.3b0 is active+remapped+backfill_toofull, acting [405,305,502,200,103]

    pg 8.3b1 is active+remapped+backfill_wait+backfill_toofull, acting [505,205,300,101,402]

    pg 8.3b4 is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,205,404,302,503]

    pg 8.3b5 is active+remapped+backfill_toofull, acting [204,505,100,403,304]

    pg 8.3b6 is active+remapped+backfill_toofull, acting [404,300,504,100,203]

    pg 8.3b7 is active+remapped+backfill_toofull, acting [204,105,504,303,401]

    pg 8.3b8 is active+remapped+backfill_toofull, acting [501,103,205,302,404]

    pg 8.3b9 is active+remapped+backfill_toofull, acting [502,301,402,103,204]

    pg 8.3ba is active+remapped+backfill_toofull, acting [303,102,203,403,505]

    pg 8.3bb is active+remapped+backfill_toofull, acting [505,204,102,304,404]

    pg 8.3bd is active+remapped+backfill_toofull, acting [202,402,105,503,304]

    pg 8.3be is active+remapped+backfill_toofull, acting [504,303,205,100,403]

    pg 8.3c3 is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,105,204,303,505]

    pg 8.3c4 is active+remapped+backfill_wait+backfill_toofull, acting [504,305,101,403,201]

    pg 8.3c6 is active+undersized+degraded+remapped+backfill_toofull, acting [404,304,205,504,2147483647]

    pg 8.3ca is active+undersized+degraded+remapped+backfill_toofull, acting [503,2147483647,205,305,400]

    pg 8.3cb is active+remapped+backfill_toofull, acting [105,200,402,505,304]

    pg 8.3cc is active+undersized+degraded+remapped+backfill_toofull, acting [504,204,2147483647,303,403]

    pg 8.3cd is active+remapped+backfill_toofull, acting [105,305,403,504,205]

    pg 8.3cf is active+remapped+backfill_toofull, acting [502,400,304,105,202]

    pg 8.3d0 is active+remapped+backfill_toofull, acting [405,502,305,104,205]

    pg 8.3d1 is active+remapped+backfill_toofull, acting [205,103,404,502,304]

    pg 8.3d3 is active+undersized+degraded+remapped+backfill_toofull, acting [204,101,2147483647,505,302]

    pg 8.3d4 is active+remapped+backfill_toofull, acting [503,305,404,100,205]

    pg 8.3d7 is active+remapped+backfill_toofull, acting [504,305,404,200,102]

    pg 8.3d8 is active+remapped+backfill_toofull, acting [403,202,102,303,500]

    pg 8.3da is active+remapped+backfill_wait+backfill_toofull, acting [104,201,404,504,304]

    pg 8.3dc is active+remapped+backfill_wait+backfill_toofull, acting [404,104,303,505,205]

    pg 8.3de is active+undersized+degraded+remapped+backfill_toofull, acting [402,2147483647,304,500,203]

    pg 8.3df is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,200,303,500,404]

    pg 8.3e0 is active+remapped+backfill_wait+backfill_toofull, acting [202,402,103,305,505]

    pg 8.3e1 is active+undersized+degraded+remapped+backfill_toofull, acting [505,404,304,2147483647,204]

    pg 8.3e2 is active+remapped+backfill_toofull, acting [304,102,505,401,203]

    pg 8.3e3 is active+undersized+degraded+remapped+backfill_toofull, acting [404,203,301,504,2147483647]

    pg 8.3e5 is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,303,104,205,504]

    pg 8.3e6 is active+remapped+backfill_toofull, acting [504,105,204,404,305]

    pg 8.3e7 is active+undersized+degraded+remapped+backfill_toofull, acting [103,202,505,2147483647,304]

    pg 8.3e9 is active+remapped+backfill_toofull, acting [504,103,305,403,203]

    pg 8.3ea is active+undersized+degraded+remapped+backfill_toofull, acting [305,2147483647,203,403,504]

    pg 8.3ec is active+undersized+degraded+remapped+backfill_toofull, acting [502,302,2147483647,104,204]

    pg 8.3ed is active+remapped+backfill_wait+backfill_toofull, acting [404,101,201,502,304]

    pg 8.3ee is active+remapped+backfill_toofull, acting [403,300,204,503,100]

    pg 8.3f0 is active+remapped+backfill_toofull, acting [102,203,500,304,403]

    pg 8.3f1 is active+undersized+degraded+remapped+backfill_toofull, acting [505,305,404,2147483647,202]

    pg 8.3f2 is active+remapped+backfill_toofull, acting [105,304,403,202,502]

    pg 8.3f4 is active+remapped+backfill_toofull, acting [205,505,102,405,303]

    pg 8.3fa is active+undersized+degraded+remapped+backfill_toofull, acting [305,105,502,202,2147483647]

    pg 8.3fc is active+undersized+degraded+remapped+backfill_toofull, acting [401,500,205,2147483647,302]

    pg 8.3fe is active+undersized+degraded+remapped+backfill_toofull, acting [504,304,402,205,2147483647]

    pg 8.3ff is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,202,102,501,303]
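
For anyone reading the list above: the value 2147483647 in an acting set is CRUSH's "no OSD" placeholder (CRUSH_ITEM_NONE, 0x7fffffff), so each degraded PG here is missing exactly one shard. Below is a minimal Python sketch, not part of the original post, that picks out those PGs from saved `ceph health detail` text. The function name `missing_shards` and the sample variable are hypothetical, and the three sample lines are copied from the output above.

```python
import re

# CRUSH_ITEM_NONE (0x7fffffff): printed where no OSD is mapped for a shard.
CRUSH_ITEM_NONE = 2147483647

# Three lines copied from the health output above.
SAMPLE = """\
    pg 8.3af is active+undersized+degraded+remapped+backfill_toofull, acting [2147483647,402,204,504,302]
    pg 8.3b0 is active+remapped+backfill_toofull, acting [405,305,502,200,103]
    pg 8.3c6 is active+undersized+degraded+remapped+backfill_toofull, acting [404,304,205,504,2147483647]
"""

# Matches the "pg <id> is <state>, acting [..]" form only; the
# "is stuck undersized for ..." lines are deliberately ignored.
PG_LINE = re.compile(r"pg (\S+) is (\S+), acting \[([\d,]+)\]")


def missing_shards(text):
    """Return {pgid: [shard positions with no OSD]} for PGs missing a shard."""
    result = {}
    for match in PG_LINE.finditer(text):
        pgid, _state, acting = match.groups()
        osds = [int(osd) for osd in acting.split(",")]
        holes = [pos for pos, osd in enumerate(osds) if osd == CRUSH_ITEM_NONE]
        if holes:
            result[pgid] = holes
    return result


if __name__ == "__main__":
    for pgid, holes in missing_shards(SAMPLE).items():
        print(pgid, "missing shard position(s):", holes)
```

Feeding it the full health output instead of `SAMPLE` would give the set of PGs that need a shard rebuilt, as opposed to those that are only remapped.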

 

 

Ceph OSD utilisation breakdown:

[admin@kvm7e ~]# ceph osd df

ID    CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS

  100   hdd 1.81929  1.00000 1.8 TiB 1.6 TiB 1.6 TiB 2.2 MiB  3.5 GiB 232 GiB 87.55 1.13  68     up

  101   hdd 1.81929  1.00000 1.8 TiB 1.6 TiB 1.6 TiB 2.8 MiB  3.4 GiB 255 GiB 86.29 1.12  64     up

  102   hdd 5.45789  1.00000 5.5 TiB 4.8 TiB 4.8 TiB 6.6 MiB  9.4 GiB 696 GiB 87.54 1.13 187     up

  103   hdd 5.45789  1.00000 5.5 TiB 4.8 TiB 4.8 TiB 6.4 MiB  9.4 GiB 654 GiB 88.29 1.14 191     up

  104   hdd 9.09560  1.00000 9.1 TiB 5.9 TiB 5.9 TiB  11 MiB   12 GiB 3.2 TiB 64.67 0.84 192     up

  105   hdd 9.09560  1.00000 9.1 TiB 7.1 TiB 7.1 TiB  10 MiB   14 GiB 1.9 TiB 78.58 1.02 256     up

  200   hdd 1.81929  1.00000 1.8 TiB 1.5 TiB 1.5 TiB 2.0 MiB  3.2 GiB 317 GiB 82.99 1.08  64     up

  201   hdd 1.81929  1.00000 1.8 TiB 1.5 TiB 1.5 TiB 2.2 MiB  3.4 GiB 292 GiB 84.34 1.09  63     up

  202   hdd 5.45789  1.00000 5.5 TiB 4.3 TiB 4.3 TiB 6.7 MiB  8.5 GiB 1.1 TiB 79.18 1.03 176     up

  203   hdd 5.45789  1.00000 5.5 TiB 4.7 TiB 4.7 TiB 6.2 MiB  9.2 GiB 762 GiB 86.36 1.12 195     up

  204   hdd 9.09560  1.00000 9.1 TiB 7.0 TiB 7.0 TiB  12 MiB   14 GiB 2.1 TiB 77.12 1.00 290     up

  205   hdd 9.09560  1.00000 9.1 TiB 6.7 TiB 6.7 TiB  11 MiB   13 GiB 2.4 TiB 73.89 0.96 280     up

  300   hdd 1.81929  1.00000 1.8 TiB 1.5 TiB 1.5 TiB 2.0 MiB  3.3 GiB 283 GiB 84.80 1.10  65     up

  301   hdd 1.81929  1.00000 1.8 TiB 1.5 TiB 1.5 TiB 2.8 MiB  3.4 GiB 298 GiB 84.00 1.09  64     up

  302   hdd 5.45789  1.00000 5.5 TiB 3.8 TiB 3.8 TiB 6.5 MiB  7.6 GiB 1.7 TiB 69.08 0.90 154     up

  303   hdd 5.45789  1.00000 5.5 TiB 4.4 TiB 4.4 TiB 6.9 MiB  8.7 GiB 1.0 TiB 80.97 1.05 182     up

  304   hdd 9.09560  1.00000 9.1 TiB 7.5 TiB 7.5 TiB  12 MiB   14 GiB 1.6 TiB 82.39 1.07 311     up

  305   hdd 9.09560  1.00000 9.1 TiB 7.1 TiB 7.0 TiB  11 MiB   14 GiB 2.0 TiB 77.54 1.01 295     up

  400   hdd 1.81929  1.00000 1.8 TiB 1.2 TiB 1.2 TiB 2.3 MiB  2.9 GiB 596 GiB 68.02 0.88  53     up

  401   hdd 1.81929  1.00000 1.8 TiB 1.5 TiB 1.5 TiB 3.1 MiB  3.4 GiB 292 GiB 84.33 1.09  63     up

  402   hdd 5.45789  1.00000 5.5 TiB 4.2 TiB 4.2 TiB 7.6 MiB  8.3 GiB 1.2 TiB 77.33 1.00 171     up

  403   hdd 5.45789  1.00000 5.5 TiB 4.1 TiB 4.1 TiB 6.3 MiB  8.1 GiB 1.4 TiB 74.97 0.97 175     up

  404   hdd 9.09560  1.00000 9.1 TiB 7.9 TiB 7.8 TiB  11 MiB   15 GiB 1.2 TiB 86.38 1.12 321     up

  405   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  11 MiB   14 GiB 2.2 TiB 76.26 0.99 144     up

  500   hdd 1.81929  1.00000 1.8 TiB 1.3 TiB 1.3 TiB 2.3 MiB  3.0 GiB 516 GiB 72.32 0.94  53     up

  501   hdd 1.81929  1.00000 1.8 TiB 1.5 TiB 1.5 TiB 2.7 MiB  3.4 GiB 308 GiB 83.46 1.08  63     up

  502   hdd 5.45789  1.00000 5.5 TiB 4.7 TiB 4.6 TiB 5.3 MiB  9.2 GiB 820 GiB 85.32 1.11 192     up

  503   hdd 5.45789  1.00000 5.5 TiB 3.9 TiB 3.9 TiB 6.4 MiB  7.8 GiB 1.5 TiB 71.61 0.93 165     up

  504   hdd 9.09560  1.00000 9.1 TiB 7.5 TiB 7.5 TiB  12 MiB   14 GiB 1.6 TiB 82.31 1.07 311     up

  505   hdd 9.09560  1.00000 9.1 TiB 6.9 TiB 6.9 TiB  11 MiB   14 GiB 2.2 TiB 75.78 0.98 284     up

10100   ssd 0.40630  1.00000 416 GiB  40 GiB  39 GiB  19 MiB 1005 MiB 376 GiB  9.70 0.13  35     up

10101   ssd 0.40630  1.00000 416 GiB  44 GiB  43 GiB  39 MiB  985 MiB 372 GiB 10.62 0.14  37     up

10200   ssd 0.40630  1.00000 416 GiB  33 GiB  32 GiB  17 MiB 1007 MiB 383 GiB  7.92 0.10  36     up

10201   ssd 0.40630  1.00000 416 GiB  32 GiB  31 GiB  33 MiB  991 MiB 384 GiB  7.67 0.10  37     up

10300   ssd 0.40630  1.00000 416 GiB  41 GiB  40 GiB  30 MiB  994 MiB 375 GiB  9.80 0.13  36     up

10301   ssd 0.40630  1.00000 416 GiB  43 GiB  42 GiB  37 MiB  987 MiB 373 GiB 10.37 0.13  37     up

10400   ssd 0.40630  1.00000 416 GiB  34 GiB  33 GiB  24 MiB 1000 MiB 382 GiB  8.27 0.11  35     up

10401   ssd 0.40630  1.00000 416 GiB  40 GiB  39 GiB  30 MiB  994 MiB 376 GiB  9.69 0.13  36     up

10500   ssd 0.40630  1.00000 416 GiB  38 GiB  37 GiB  30 MiB  994 MiB 378 GiB  9.15 0.12  37     up

10501   ssd 0.40630  1.00000 416 GiB  46 GiB  45 GiB  31 MiB  993 MiB 370 GiB 11.08 0.14  37     up

                       TOTAL 168 TiB 129 TiB 129 TiB 493 MiB  267 GiB  38 TiB 77.15

MIN/MAX VAR: 0.10/1.14  STDDEV: 34.38
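
To make the table above easier to scan, here is a small Python sketch, again not from the original post, that lists the OSDs at or above a utilisation threshold. The function name `over_threshold` is hypothetical, and the 85% default is only an assumed stand-in for the cluster's nearfull ratio; the configured ratios should be checked on the cluster itself. The sample rows are copied from the `ceph osd df` output above.

```python
# Rows copied from the "ceph osd df" output above.
SAMPLE = """\
ID    CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS
  100   hdd 1.81929  1.00000 1.8 TiB 1.6 TiB 1.6 TiB 2.2 MiB  3.5 GiB 232 GiB 87.55 1.13  68     up
  104   hdd 9.09560  1.00000 9.1 TiB 5.9 TiB 5.9 TiB  11 MiB   12 GiB 3.2 TiB 64.67 0.84 192     up
  404   hdd 9.09560  1.00000 9.1 TiB 7.9 TiB 7.8 TiB  11 MiB   15 GiB 1.2 TiB 86.38 1.12 321     up
10100   ssd 0.40630  1.00000 416 GiB  40 GiB  39 GiB  19 MiB 1005 MiB 376 GiB  9.70 0.13  35     up
                       TOTAL 168 TiB 129 TiB 129 TiB 493 MiB  267 GiB  38 TiB 77.15
"""


def over_threshold(text, threshold=85.0):
    """Return [(osd_id, pct_used)] for OSDs at or above threshold, fullest first.

    The threshold default is an assumption, not a value read from the cluster.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric OSD id; header and TOTAL rows do not.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        # Counting from the right: STATUS, PGS, VAR, %USE.
        pct_used = float(fields[-4])
        if pct_used >= threshold:
            rows.append((int(fields[0]), pct_used))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for osd_id, pct_used in over_threshold(SAMPLE):
        print(f"osd.{osd_id}: {pct_used:.2f}% used")
```

The %USE column is read by counting from the right because the size columns each split into a number and a unit, which makes left-based column positions fragile.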

 

 

Regards

David Herselman

