On 10-01-2023 18:59, Fox, Kevin M wrote:
What else is going on? (ceph -s). If there is a lot of data being shuffled around, it may just be waiting for some other actions to complete first.
There's a bit going on, but if it were just waiting for something else,
shouldn't the state be something other than backfill_toofull? There's
plenty of space and no disk anywhere near full.
"
# ceph -s
  cluster:
    id:     3b7736c6-00e4-11ec-a3c5-3cecef467984
    health: HEALTH_WARN
            3 host(s) running different kernel versions
            1 failed cephadm daemon(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
            223 pgs not deep-scrubbed in time
            134 pgs not scrubbed in time
            1 pools have too many placement groups

  services:
    mon:        5 daemons, quorum test-ceph-03,test-ceph-04,dcn-ceph-03,dcn-ceph-02,dcn-ceph-04 (age 2m)
    mgr:        dcn-ceph-02.jahyzc(active, since 2d), standbys: dcn-ceph-03.lrhaxo
    mds:        1/1 daemons up, 1 standby
    osd:        122 osds: 121 up (since 6d), 121 in (since 8d); 98 remapped pgs
    rbd-mirror: 2 daemons active (2 hosts)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 6433 pgs
    objects: 230.41M objects, 317 TiB
    usage:   688 TiB used, 729 TiB / 1.4 PiB avail
    pgs:     4056139/915969556 objects misplaced (0.443%)
             6316 active+clean
             88   active+remapped+backfill_wait
             18   active+clean+scrubbing+deep
             8    active+remapped+backfilling
             2    active+remapped+backfill_wait+backfill_toofull
             1    active+clean+snaptrim

  io:
    client:   812 KiB/s rd, 178 MiB/s wr, 609 op/s rd, 339 op/s wr
    recovery: 96 MiB/s, 49 objects/s

  progress:
    Global Recovery Event (2d)
      [===========================.] (remaining: 59m)
"
Best regards,
Torkil
Thanks,
Kevin
________________________________________
From: Torkil Svensgaard <torkil-BAyU0AAS9Wk@xxxxxxxxxxxxxxxx>
Sent: Tuesday, January 10, 2023 2:36 AM
To: ceph-users-a8pt6IJUokc@xxxxxxxxxxxxxxxx
Cc: Ruben Vestergaard
Subject: 2 pgs backfill_toofull but plenty of space
Hi
Ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
Looking at this:
"
Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
"
"
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
    pg 3.11f is active+remapped+backfill_wait+backfill_toofull, acting [98,51,39,100]
    pg 3.74c is active+remapped+backfill_wait+backfill_toofull, acting [96,120,58,48]
"
But as far as I can determine the disks are nowhere near full, so why
backfill_toofull? The PGs in question are in the rbd_data pool.
"
# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    1.4 PiB  730 TiB  686 TiB   686 TiB      48.46
ssd    1.3 TiB  1.2 TiB  162 GiB   162 GiB      12.11
TOTAL  1.4 PiB  731 TiB  686 TiB   686 TiB      48.42

--- POOLS ---
POOL             ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr              1     1  1.1 GiB      273  545 MiB   0.05    549 GiB
rbd_data          3  4096  294 TiB   78.56M  450 TiB  45.72    267 TiB
rbd               4    32  4.1 MiB       26  3.5 MiB      0    549 GiB
rbd_internal      5    32   54 KiB       16  172 KiB      0    549 GiB
cephfs_data       6  2048  127 TiB  148.64M  229 TiB  29.99    267 TiB
cephfs_metadata   7   128   71 GiB    2.84M  142 GiB  11.46    549 GiB
libvirt           8    32   37 MiB      221   74 MiB      0    549 GiB
nfs-ganesha       9    32  2.7 KiB        7   52 KiB      0    366 GiB
.nfs             10    32   53 KiB       47  306 KiB      0    366 GiB
"
The most utilized disk is at 57% and the PGs in that pool are ~50 GB.
"
TOP                             |BOTTOM
USE     WEIGHT   PGS  ID        |USE     WEIGHT   PGS  ID
--------------------------------+--------------------------------
57.71%  1.00000   54  osd.68    |46.60%  1.00000  286  osd.17
57.08%  1.00000   53  osd.80    |46.55%  1.00000  286  osd.99
54.95%  1.00000   70  osd.86    |46.48%  1.00000  284  osd.106
54.86%  1.00000   52  osd.63    |45.88%  1.00000  187  osd.27
54.06%  1.00000   68  osd.88    |45.81%  1.00000  279  osd.5
53.89%  1.00000   51  osd.79    |44.95%  1.00000  272  osd.13
53.65%  1.00000   51  osd.67    |43.63%  1.00000  269  osd.16
53.59%  1.00000   52  osd.65    |43.30%  1.00000  261  osd.12
53.58%  1.00000   51  osd.82    |32.17%  1.00000  172  osd.4
53.52%  1.00000   50  osd.72    |0%      0          0  osd.49
--------------------------------+--------------------------------
"
Best regards,
Torkil
--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx