Re: HEALTH_WARN - Recovery Stuck?

I recently had a similar issue when reducing the number of PGs on a pool. A few OSDs became backfillfull even though the cluster had enough free space overall; the data was just not balanced well across the OSDs.

To fix it, I reweighted the fullest OSDs:

ceph osd reweight-by-utilization 120
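
A dry run can show what this would change before any data moves. The sketch below assumes the `ceph osd test-reweight-by-utilization` subcommand is available on your release; the helper name `preview_then_apply` and the `CEPH` variable are illustrative and not part of Ceph.

```shell
# CEPH can be overridden (for example in tests); defaults to the real CLI.
CEPH="${CEPH:-ceph}"

# Hypothetical helper: show the proposed reweights, then apply them
# only if the caller passes "apply" as the second argument.
preview_then_apply() {
    threshold="${1:-120}"   # percent of average utilization
    # Dry run: reports which OSDs would be reweighted, changes nothing.
    $CEPH osd test-reweight-by-utilization "$threshold" || return 1
    if [ "${2:-}" = "apply" ]; then
        $CEPH osd reweight-by-utilization "$threshold"
    fi
}
```

Calling `preview_then_apply 120` only previews; `preview_then_apply 120 apply` previews and then applies the same threshold.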

After it finished (~1 hour), I had fewer backfillfull OSDs. I repeated this two more times, after which no OSDs were backfillfull and recovery data movement resumed.
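
Whether a given threshold selects any OSDs depends on how far they sit above the cluster average. The sketch below assumes, following the Ceph documentation, that the threshold argument is a percentage of average utilization. `over_threshold` is a hypothetical helper, not a Ceph command, and the sample values are copied from the `ceph osd df tree` output quoted further down in this thread.

```shell
# Hypothetical helper: read "osd_name percent_used" pairs on stdin and
# print the names whose usage is at or above avg * oload / 100.
over_threshold() {
    avg="$1"      # cluster-wide average %USE
    oload="$2"    # threshold argument, e.g. 120
    awk -v avg="$avg" -v oload="$oload" '
        BEGIN { cutoff = avg * oload / 100 }
        $2 + 0 >= cutoff { print $1 }
    '
}

# %USE values taken from the quoted "ceph osd df tree" output.
sample="osd.28 92.00
osd.32 91.06
osd.68 91.24
osd.66 90.86
osd.12 43.89"

printf '%s\n' "$sample" | over_threshold 77.96 120   # cutoff is 93.552
printf '%s\n' "$sample" | over_threshold 77.96 115   # cutoff is 89.654
```

Under that assumption, a threshold of 120 would select none of the OSDs in the quoted output (its MIN/MAX VAR line reports a maximum of 1.18), while 115 would include the four backfillfull ones, so a lower threshold may be worth previewing on a cluster like that.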

Once the recovery was complete, I reweighted all OSDs back to 1.0, and all was fine.
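
Resetting the override weights can be scripted. This is a minimal sketch, assuming `ceph osd ls` prints one OSD id per line and that every OSD should go back to 1.0; `reset_reweights` and the `CEPH` variable are hypothetical names. Expect this to trigger another round of data movement.

```shell
# CEPH can be overridden (for example in tests); defaults to the real CLI.
CEPH="${CEPH:-ceph}"

# Hypothetical helper: set the override reweight of every OSD back to 1.0.
reset_reweights() {
    for id in $($CEPH osd ls); do
        $CEPH osd reweight "$id" 1.0
    done
}
```

Run `reset_reweights` only once recovery has finished and the cluster has free space to rebalance into.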

--Mike

On 4/12/21 12:30 PM, Ml Ml wrote:
Hello,

I more or less ran out of disk space, so I added another host with osd.37.
But hardly any data is moving onto it (85 MB in 2 hours).

Any idea why the recovery process seems to be stuck? Should I fix the
4 backfillfull OSDs first (by changing their weight)?

root@ceph01:~# ceph -s
   cluster:
     id:     5436dd5d-83d4-4dc8-a93b-60ab5db145df
     health: HEALTH_WARN
             4 backfillfull osd(s)
             9 nearfull osd(s)
             Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull
             4 pool(s) backfillfull

   services:
     mon: 3 daemons, quorum ceph03,ceph01,ceph02 (age 12d)
     mgr: ceph03(active, since 4M), standbys: ceph02.jwvivm
     mds: backup:1 {0=backup.ceph06.hdjehi=up:active} 3 up:standby
     osd: 53 osds: 53 up (since 2h), 53 in (since 2h); 235 remapped pgs

   task status:
     scrub status:
         mds.backup.ceph06.hdjehi: idle

   data:
     pools:   4 pools, 1185 pgs
     objects: 24.69M objects, 45 TiB
     usage:   149 TiB used, 42 TiB / 191 TiB avail
     pgs:     5388809/74059569 objects misplaced (7.276%)
              950 active+clean
              232 active+remapped+backfill_wait
              2   active+remapped+backfilling
              1   active+remapped+backfill_wait+backfill_toofull

   io:
     recovery: 0 B/s, 171 keys/s, 16 objects/s

   progress:
     Rebalancing after osd.37 marked in (2h)
       [............................] (remaining: 6d)



root@ceph01:~# ceph health detail
HEALTH_WARN 4 backfillfull osd(s); 9 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull; 4 pool(s) backfillfull
[WRN] OSD_BACKFILLFULL: 4 backfillfull osd(s)
     osd.28 is backfill full
     osd.32 is backfill full
     osd.66 is backfill full
     osd.68 is backfill full
[WRN] OSD_NEARFULL: 9 nearfull osd(s)
     osd.11 is near full
     osd.24 is near full
     osd.27 is near full
     osd.39 is near full
     osd.40 is near full
     osd.42 is near full
     osd.43 is near full
     osd.45 is near full
     osd.69 is near full
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull
     pg 23.295 is active+remapped+backfill_wait+backfill_toofull, acting [8,67,32]
[WRN] POOL_BACKFILLFULL: 4 pool(s) backfillfull
     pool 'backurne-rbd' is backfillfull
     pool 'device_health_metrics' is backfillfull
     pool 'cephfs.backup.meta' is backfillfull
     pool 'cephfs.backup.data' is backfillfull


root@ceph01:~# ceph osd df tree
ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP      META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         182.59897         -  191 TiB  149 TiB  149 TiB    35 GiB  503 GiB   42 TiB  77.96  1.00    -          root default
 -2          24.62473         -   29 TiB   22 TiB   22 TiB   5.0 GiB   80 GiB  7.1 TiB  75.23  0.96    -              host ceph01
  0    hdd    2.39999   1.00000  2.7 TiB  2.2 TiB  2.2 TiB   665 MiB  8.0 GiB  480 GiB  82.43  1.06   53      up          osd.0
  1    hdd    2.29999   1.00000  2.7 TiB  2.1 TiB  2.1 TiB   446 MiB  7.5 GiB  590 GiB  78.44  1.01   49      up          osd.1
  4    hdd    2.67029   0.91066  2.7 TiB  2.2 TiB  2.2 TiB   484 MiB  7.9 GiB  440 GiB  83.90  1.08   53      up          osd.4
  8    hdd    2.39999   1.00000  2.7 TiB  2.1 TiB  2.1 TiB   490 MiB  7.9 GiB  533 GiB  80.49  1.03   51      up          osd.8
 11    hdd    1.71660   1.00000  1.7 TiB  1.5 TiB  1.5 TiB   406 MiB  5.5 GiB  200 GiB  88.60  1.14   36      up          osd.11
 12    hdd    1.29999   1.00000  2.7 TiB  1.2 TiB  1.2 TiB   366 MiB  4.9 GiB  1.5 TiB  43.89  0.56   28      up          osd.12
 14    hdd    2.20000   1.00000  2.7 TiB  2.0 TiB  2.0 TiB   418 MiB  7.1 GiB  693 GiB  74.66  0.96   47      up          osd.14
 18    hdd    2.20000   1.00000  2.7 TiB  2.0 TiB  1.9 TiB   434 MiB  7.3 GiB  737 GiB  73.05  0.94   47      up          osd.18
 22    hdd    1.00000   1.00000  1.7 TiB  890 GiB  886 GiB   110 MiB  3.6 GiB  868 GiB  50.62  0.65   20      up          osd.22
 30    hdd    1.50000   1.00000  1.7 TiB  1.4 TiB  1.3 TiB   361 MiB  4.9 GiB  370 GiB  78.93  1.01   32      up          osd.30
 33    hdd    1.59999   0.97437  1.6 TiB  1.4 TiB  1.4 TiB   397 MiB  5.4 GiB  213 GiB  87.20  1.12   34      up          osd.33
 64    hdd    3.33789   0.89752  3.3 TiB  2.7 TiB  2.7 TiB   573 MiB  9.9 GiB  647 GiB  81.07  1.04   64      up          osd.64
 -3          26.79504         -   30 TiB   24 TiB   24 TiB   6.2 GiB   89 GiB  5.4 TiB  81.80  1.05    -              host ceph02
  2    hdd    1.50000   1.00000  1.7 TiB  1.4 TiB  1.4 TiB   363 MiB  5.3 GiB  359 GiB  79.58  1.02   32      up          osd.2
  3    hdd    2.50000   1.00000  2.7 TiB  2.2 TiB  2.2 TiB   647 MiB  7.8 GiB  469 GiB  82.85  1.06   53      up          osd.3
  7    hdd    2.00000   1.00000  2.7 TiB  1.8 TiB  1.8 TiB   453 MiB  7.0 GiB  848 GiB  69.00  0.89   43      up          osd.7
  9    hdd    2.67029   0.98323  2.7 TiB  2.4 TiB  2.3 TiB   709 MiB  8.8 GiB  322 GiB  88.21  1.13   57      up          osd.9
 13    hdd    1.79999   1.00000  2.4 TiB  1.7 TiB  1.6 TiB   410 MiB  6.5 GiB  747 GiB  69.41  0.89   40      up          osd.13
 16    hdd    2.50000   1.00000  2.7 TiB  2.2 TiB  2.2 TiB   637 MiB  7.8 GiB  458 GiB  83.26  1.07   53      up          osd.16
 19    hdd    1.39999   1.00000  1.7 TiB  1.3 TiB  1.3 TiB   345 MiB  5.1 GiB  465 GiB  73.53  0.94   30      up          osd.19
 23    hdd    2.00000   1.00000  2.7 TiB  1.9 TiB  1.9 TiB   442 MiB  7.7 GiB  738 GiB  73.02  0.94   43      up          osd.23
 24    hdd    1.71660   0.95634  1.7 TiB  1.5 TiB  1.5 TiB   426 MiB  5.8 GiB  187 GiB  89.37  1.15   36      up          osd.24
 28    hdd    2.70000   1.00000  2.7 TiB  2.5 TiB  2.4 TiB   712 MiB  8.4 GiB  219 GiB  92.00  1.18   58      up          osd.28
 31    hdd    2.67029   0.92993  2.7 TiB  2.3 TiB  2.3 TiB   465 MiB  8.1 GiB  393 GiB  85.62  1.10   54      up          osd.31
 32    hdd    3.33789   1.00000  3.3 TiB  3.0 TiB  3.0 TiB   693 MiB   11 GiB  306 GiB  91.06  1.17   71      up          osd.32
 -4          24.52005         -   26 TiB   21 TiB   21 TiB   5.0 GiB   79 GiB  5.1 TiB  80.51  1.03    -              host ceph03
  5    hdd    1.71660   1.00000  1.7 TiB  1.5 TiB  1.5 TiB   392 MiB  5.6 GiB  223 GiB  87.34  1.12   35      up          osd.5
  6    hdd    1.71660   1.00000  1.7 TiB  1.5 TiB  1.5 TiB   397 MiB  5.6 GiB  221 GiB  87.41  1.12   35      up          osd.6
 10    hdd    2.50000   0.97487  2.7 TiB  2.2 TiB  2.2 TiB   497 MiB  7.7 GiB  480 GiB  82.46  1.06   52      up          osd.10
 15    hdd    2.29999   1.00000  2.7 TiB  2.1 TiB  2.1 TiB   474 MiB  7.6 GiB  586 GiB  78.57  1.01   49      up          osd.15
 17    hdd    1.39999   1.00000  1.6 TiB  1.2 TiB  1.2 TiB   352 MiB  5.6 GiB  384 GiB  76.88  0.99   30      up          osd.17
 20    hdd    1.59999   1.00000  1.7 TiB  1.4 TiB  1.4 TiB   234 MiB  5.4 GiB  331 GiB  81.15  1.04   33      up          osd.20
 21    hdd    2.00000   1.00000  2.7 TiB  1.8 TiB  1.8 TiB   611 MiB  7.0 GiB  868 GiB  68.27  0.88   44      up          osd.21
 25    hdd    1.70000   0.92348  1.7 TiB  1.4 TiB  1.4 TiB   407 MiB  5.6 GiB  274 GiB  84.41  1.08   35      up          osd.25
 26    hdd    2.50000   1.00000  2.7 TiB  2.2 TiB  2.2 TiB   464 MiB  7.8 GiB  441 GiB  83.88  1.08   52      up          osd.26
 27    hdd    2.70000   0.95955  2.7 TiB  2.4 TiB  2.4 TiB   674 MiB  8.3 GiB  318 GiB  88.35  1.13   57      up          osd.27
 29    hdd    2.67029   0.73337  2.7 TiB  1.8 TiB  1.8 TiB   436 MiB  6.7 GiB  885 GiB  67.63  0.87   43      up          osd.29
 63    hdd    1.71660   1.00000  1.7 TiB  1.5 TiB  1.5 TiB   226 MiB  5.7 GiB  224 GiB  87.26  1.12   35      up          osd.63
-11          24.64297         -   25 TiB   21 TiB   21 TiB   4.9 GiB   66 GiB  3.4 TiB  86.48  1.11    -              host ceph04
 34    hdd    5.24519   0.85004  5.2 TiB  4.0 TiB  4.0 TiB  1002 MiB   13 GiB  1.2 TiB  76.37  0.98   97      up          osd.34
 42    hdd    5.24519   1.00000  5.2 TiB  4.7 TiB  4.7 TiB   1.1 GiB   15 GiB  545 GiB  89.86  1.15  113      up          osd.42
 44    hdd    7.00000   1.00000  7.2 TiB  6.3 TiB  6.3 TiB   1.4 GiB   19 GiB  901 GiB  87.70  1.12  150      up          osd.44
 45    hdd    7.15259   1.00000  7.2 TiB  6.5 TiB  6.4 TiB   1.5 GiB   19 GiB  718 GiB  90.20  1.16  154      up          osd.45
-13          30.04085         -   30 TiB   26 TiB   26 TiB   5.8 GiB   81 GiB  4.2 TiB  86.11  1.10    -              host ceph05
 39    hdd    7.15259   1.00000  7.2 TiB  6.4 TiB  6.4 TiB   1.5 GiB   19 GiB  751 GiB  89.74  1.15  153      up          osd.39
 40    hdd    7.15259   1.00000  7.2 TiB  6.4 TiB  6.4 TiB   1.3 GiB   19 GiB  767 GiB  89.53  1.15  153      up          osd.40
 41    hdd    7.15259   0.90002  7.2 TiB  5.8 TiB  5.7 TiB   1.2 GiB   18 GiB  1.4 TiB  80.54  1.03  138      up          osd.41
 43    hdd    5.24519   1.00000  5.2 TiB  4.7 TiB  4.7 TiB   1.1 GiB   15 GiB  574 GiB  89.32  1.15  113      up          osd.43
 60    hdd    3.33789   0.85780  3.3 TiB  2.6 TiB  2.6 TiB   685 MiB  8.9 GiB  754 GiB  77.93  1.00   62      up          osd.60
 -9          17.64297         -   18 TiB   13 TiB   13 TiB   3.0 GiB   43 GiB  4.4 TiB  74.85  0.96    -              host ceph06
 35    hdd    7.15259   0.80005  7.2 TiB  5.2 TiB  5.2 TiB   1.0 GiB   16 GiB  2.0 TiB  72.31  0.93  124      up          osd.35
 36    hdd    5.24519   0.85004  5.2 TiB  4.0 TiB  4.0 TiB   985 MiB   13 GiB  1.2 TiB  76.65  0.98   97      up          osd.36
 38    hdd    5.24519   0.85004  5.2 TiB  4.0 TiB  4.0 TiB   1.0 GiB   13 GiB  1.2 TiB  76.50  0.98   97      up          osd.38
-15          24.79565         -   25 TiB   22 TiB   22 TiB   4.7 GiB   66 GiB  3.1 TiB  87.64  1.12    -              host ceph07
 66    hdd    7.15259   1.00000  7.2 TiB  6.5 TiB  6.5 TiB   1.5 GiB   19 GiB  670 GiB  90.86  1.17  155      up          osd.66
 67    hdd    7.15259   0.91141  7.2 TiB  5.8 TiB  5.8 TiB   1.1 GiB   18 GiB  1.3 TiB  81.62  1.05  140      up          osd.67
 68    hdd    3.33789   1.00000  3.3 TiB  3.0 TiB  3.0 TiB   738 MiB  9.8 GiB  299 GiB  91.24  1.17   71      up          osd.68
 69    hdd    7.15259   1.00000  7.2 TiB  6.3 TiB  6.3 TiB   1.3 GiB   19 GiB  823 GiB  88.77  1.14  152      up          osd.69
-17           9.53670         -  9.5 TiB  1.4 GiB   85 MiB   473 MiB  832 MiB  9.5 TiB   0.01     0    -              host ceph08
 37    hdd    9.53670   1.00000  9.5 TiB  1.4 GiB   85 MiB   473 MiB  832 MiB  9.5 TiB   0.01     0    2      up          osd.37
                          TOTAL  191 TiB  149 TiB  149 TiB    35 GiB  503 GiB   42 TiB  77.96
MIN/MAX VAR: 0/1.18  STDDEV: 14.73
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


