Warning: 1 pool nearfull and unbalanced data distribution

Update:
I have been pointed to the documentation "Using the pg-upmap"
<https://docs.ceph.com/docs/master/rados/operations/upmap/> and I tried
the offline optimization described there.

However, the output for the relevant pool shows: no upmaps proposed
root@ld3955:/mnt/rbd# osdmaptool om --upmap out.txt --upmap-pool hdb_backup
osdmaptool: osdmap file 'om'
writing upmap command output to: out.txt
checking for upmap cleanups
upmap, max-count 100, max deviation 0.01
 limiting to pools hdb_backup (11)
no upmaps proposed

This is very strange, because this pool hdb_backup is clearly very
unbalanced.
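
For reference, the full offline procedure as I understand it from that
documentation page would be roughly the following (only a sketch; the
max-count and deviation values simply mirror the defaults shown in the
output above, and I have not applied anything yet):

    ceph osd getmap -o om
    osdmaptool om --upmap out.txt --upmap-pool hdb_backup --upmap-max 100 --upmap-deviation 0.01
    # review the proposed pg-upmap-items commands in out.txt, then apply them with:
    source out.txt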

Please advise.

THX


..........................................


Hi,
the output of ceph health detail gives me a warning that concerns me a
little; I'll explain why in a moment.
root@ld3955:/mnt/rbd# ceph health detail
HEALTH_WARN 1 nearfull osd(s); 1 pool(s) nearfull; 4 pools have too many
placement groups
OSD_NEARFULL 1 nearfull osd(s)
    osd.122 is near full
POOL_NEARFULL 1 pool(s) nearfull
    pool 'hdb_backup' is nearfull
POOL_TOO_MANY_PGS 4 pools have too many placement groups
    Pool pve_cephfs_data has 128 placement groups, should have 16
    Pool hdd has 512 placement groups, should have 64
    Pool pve_cephfs_metadata has 32 placement groups, should have 4
    Pool backup has 1024 placement groups, should have 4
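
(As a side note, the nearfull/backfillfull/full ratios themselves can be
checked as shown below; I assume they are still at their defaults, I have
not changed them:)

    ceph osd dump | grep ratio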

More than 90% of the data is being written to pool "hdb_backup", and the
writes are ongoing. Therefore I can hardly afford for this pool to become
full.

When I check the output of ceph osd status, the relevant OSD is indeed
overutilized:
root@ld3955:/mnt/rbd# ceph osd status | grep nearfull
+-----+--------+-------+-------+--------+---------+--------+---------+--------------------+
|  id |  host  |  used | avail | wr ops | wr data | rd ops | rd data |       state        |
+-----+--------+-------+-------+--------+---------+--------+---------+--------------------+
| 122 | ld5505 | 1448G |  227G |    0   |     0   |    0   |     0   | exists,nearfull,up |

This looks like an inconsistency, so I double-checked the usage with ceph
osd df.
Here I can see that the OSDs are not really balanced:
root@ld3955:/mnt/rbd# ceph osd df
ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE  DATA     OMAP    META     AVAIL    %USE  VAR  PGS STATUS
272   hdd 7.28000  1.00000 7.3 TiB  2.3 TiB  2.3 TiB 136 KiB  3.7 GiB  5.0 TiB 31.23 0.77 121     up
273   hdd 7.28000  1.00000 7.3 TiB  2.9 TiB  2.9 TiB  28 KiB  5.1 GiB  4.4 TiB 39.35 0.97 152     up
274   hdd 7.28000  1.00000 7.3 TiB  2.9 TiB  2.9 TiB 168 KiB  4.5 GiB  4.4 TiB 39.41 0.97 152     up
275   hdd 7.28000  1.00000 7.3 TiB  2.2 TiB  2.2 TiB 139 KiB  3.5 GiB  5.1 TiB 29.77 0.73 115     up
276   hdd 7.28000  1.00000 7.3 TiB  2.8 TiB  2.8 TiB  48 KiB  5.7 GiB  4.5 TiB 38.81 0.96 150     up
277   hdd 7.28000  1.00000 7.3 TiB  2.5 TiB  2.5 TiB 276 KiB  4.3 GiB  4.8 TiB 34.68 0.85 134     up
278   hdd 7.28000  1.00000 7.3 TiB  2.8 TiB  2.8 TiB  36 KiB  4.4 GiB  4.5 TiB 38.74 0.95 150     up
279   hdd 7.28000  1.00000 7.3 TiB  2.6 TiB  2.6 TiB 156 KiB  4.1 GiB  4.7 TiB 35.80 0.88 138     up
280   hdd 7.28000  1.00000 7.3 TiB  2.7 TiB  2.7 TiB 156 KiB  4.3 GiB  4.6 TiB 37.03 0.91 143     up
281   hdd 7.28000  1.00000 7.3 TiB  2.4 TiB  2.4 TiB 172 KiB  3.8 GiB  4.9 TiB 32.67 0.80 126     up
282   hdd 7.28000  1.00000 7.3 TiB  2.9 TiB  2.9 TiB 120 KiB  4.5 GiB  4.4 TiB 39.39 0.97 152     up
283   hdd 7.28000  1.00000 7.3 TiB  2.7 TiB  2.7 TiB  32 KiB  5.9 GiB  4.5 TiB 37.57 0.93 145     up
[...]
 76   hdd 1.64000  1.00000 1.6 TiB  1.4 TiB  1.4 TiB  88 KiB  2.4 GiB  268 GiB 84.02 2.07  73     up
 77   hdd 1.64000  1.00000 1.6 TiB  1.1 TiB  1.1 TiB 188 KiB  2.0 GiB  560 GiB 66.59 1.64  58     up
 78   hdd 1.64000  1.00000 1.6 TiB  1.0 TiB 1023 GiB 164 KiB  1.9 GiB  651 GiB 61.15 1.51  53     up
 79   hdd 1.64000  1.00000 1.6 TiB  1.0 TiB  1.0 TiB 176 KiB  1.9 GiB  636 GiB 62.02 1.53  54     up
 80   hdd 1.64000  1.00000 1.6 TiB  1.0 TiB  1.0 TiB  80 KiB  2.5 GiB  636 GiB 62.07 1.53  54     up
 81   hdd 1.64000  1.00000 1.6 TiB  886 GiB  885 GiB 128 KiB  1.7 GiB  790 GiB 52.89 1.30  46     up
 82   hdd 1.64000  1.00000 1.6 TiB  967 GiB  965 GiB 240 KiB  1.8 GiB  709 GiB 57.70 1.42  50     up
 83   hdd 1.64000  1.00000 1.6 TiB  1.2 TiB  1.2 TiB  64 KiB  2.2 GiB  420 GiB 74.94 1.85  65     up
 84   hdd 1.64000  1.00000 1.6 TiB  1.1 TiB  1.1 TiB 108 KiB  2.0 GiB  597 GiB 64.37 1.59  56     up
 85   hdd 1.64000  1.00000 1.6 TiB  811 GiB  810 GiB 176 KiB  1.6 GiB  865 GiB 48.42 1.19  42     up
 86   hdd 1.64000  1.00000 1.6 TiB  1.0 TiB  1.0 TiB  72 KiB  2.0 GiB  613 GiB 63.43 1.56  55     up
 87   hdd 1.64000  1.00000 1.6 TiB  791 GiB  789 GiB  68 KiB  1.6 GiB  885 GiB 47.17 1.16  41     up
 88   hdd 1.64000  1.00000 1.6 TiB  908 GiB  906 GiB 168 KiB  1.8 GiB  768 GiB 54.18 1.33  47     up
[...]
113   hdd 1.64000  1.00000 1.6 TiB  1.3 TiB  1.3 TiB 100 KiB  3.0 GiB  342 GiB 79.60 1.96  69     up
114   hdd 1.64000  1.00000 1.6 TiB 1001 GiB  999 GiB 184 KiB  1.9 GiB  675 GiB 59.70 1.47  52     up
115   hdd 1.64000  1.00000 1.6 TiB  1.2 TiB  1.2 TiB 120 KiB  2.2 GiB  407 GiB 75.70 1.86  66     up
116   hdd 1.64000  1.00000 1.6 TiB  1.1 TiB  1.1 TiB  92 KiB  2.0 GiB  597 GiB 64.39 1.59  56     up
117   hdd 1.64000  1.00000 1.6 TiB  1.2 TiB  1.2 TiB  76 KiB  2.7 GiB  480 GiB 71.34 1.76  62     up
118   hdd 1.64000  1.00000 1.6 TiB  1.1 TiB  1.1 TiB  48 KiB  2.6 GiB  574 GiB 65.74 1.62  57     up
119   hdd 1.64000  1.00000 1.6 TiB  1.0 TiB  1.0 TiB 152 KiB  1.9 GiB  634 GiB 62.19 1.53  54     up
120   hdd 1.64000  1.00000 1.6 TiB  1.1 TiB  1.1 TiB  48 KiB  2.0 GiB  541 GiB 67.73 1.67  59     up
121   hdd 1.64000  1.00000 1.6 TiB  1.1 TiB  1.1 TiB  48 KiB  2.0 GiB  556 GiB 66.82 1.65  58     up
122   hdd 1.64000  0.95001 1.6 TiB  1.4 TiB  1.4 TiB 184 KiB  2.5 GiB  227 GiB 86.44 2.13  75     up
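
As a short-term measure I could probably lower the override weight of
osd.122 a little further (it is already at 0.95001) to push a few PGs off
it, e.g. something like the command below; this is only an idea, I have
not run it yet:

    ceph osd reweight 122 0.90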

I assume this is a result of my Ceph cluster's history: I started with 4
OSD nodes with 48 drives @ 1.8 TB each, roughly 345 TB in total.
I filled this storage up to about 75%.
Then I added 2 OSD nodes with 48 drives @ 8 TB.

I was not expecting that pool "hdb_backup" would fill up in the near
future.
This is the output of ceph df:
root@ld3955:/mnt/rbd# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       1.1 PiB     661 TiB     467 TiB      468 TiB         41.43
    nvme       23 TiB      23 TiB     681 MiB      8.7 GiB          0.04
    TOTAL     1.1 PiB     685 TiB     467 TiB      468 TiB         40.59

POOLS:
    POOL                    ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    backup                   4         0 B           0         0 B         0        35 TiB
    hdb_backup              11     154 TiB      40.39M     154 TiB     64.18        29 TiB
    hdd                     30     1.1 TiB     281.21k     1.1 TiB      1.01        35 TiB
    pve_cephfs_data         32     318 GiB      91.83k     318 GiB      0.30        35 TiB
    pve_cephfs_metadata     33     155 MiB          61     155 MiB         0        35 TiB
    nvme                    35         0 B           0         0 B         0       7.4 TiB

Question:
How can I start rebalancing data so that more of it ends up on the larger
drives (8 TB)?
Or is it OK that the smaller drives (1.8 TB) are more than 60% full?
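
For what it's worth, I have also read about the built-in balancer in
upmap mode; a rough sketch of the steps as I understand them (not enabled
here yet) would be:

    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status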

THX for your advice
Thomas





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



