Hi list,
we created a Ceph cluster to provide CephFS for a Kubernetes cluster. For
the past few weeks the cluster has been filling up at an alarming rate
(about 100 GB per day).
This is happening while the most relevant PG is being deep scrubbed; the
scrub has been interrupted a few times.
We use about 150G (according to du on the mounted filesystem) on the CephFS
filesystem and try not to use snapshots (the .snap directories "exist" but
are empty).
We do not understand why the PGs keep growing while the CephFS usage
stays about the same (files are certainly being overwritten).
I suspect some snapshot mechanism. Any ideas on how to debug this and stop it?
Maybe we should try to speed up the deep scrubbing somehow?
ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)
  cluster:
    id:     ece0290c-cd32-11ec-a0e2-005056a9dd02
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            3 nearfull osd(s)
            13 pool(s) nearfull

  services:
    mon: 3 daemons, quorum acdh-gluster-hdd3,acdh-gluster-hdd1,acdh-gluster-hdd2 (age 3d)
    mgr: acdh-gluster-hdd3.kzsplh(active, since 5d), standbys: acdh-gluster-hdd2.kiotbg, acdh-gluster-hdd1.ywgyfx
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 4d), 3 in (since 7w)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 292 pgs
    objects: 167.25M objects, 1.2 TiB
    usage:   3.3 TiB used, 1.2 TiB / 4.5 TiB avail
    pgs:     290 active+clean
             1   active+clean+scrubbing+deep
             1   active+clean+scrubbing

  io:
    client:   58 MiB/s rd, 3.6 MiB/s wr, 51 op/s rd, 148 op/s wr

rancher-ceph-fs - 227 clients
===============
RANK  STATE   MDS                                ACTIVITY     DNS   INOS  DIRS  CAPS
0     active  ceph-mds.acdh-gluster-hdd1.pqydya  Reqs: 68 /s  793k  792k  102k  210k
POOL                      TYPE      USED   AVAIL
rancherFsPoolMetadata     metadata  160G   329G
rancherFsPoolDefaultData  data      2268k  329G
rancherFsPoolMainData     data      2584G  658G
STANDBY MDS
ceph-mds.acdh-gluster-hdd2.zfleqe
ceph-mds.acdh-gluster-hdd3.etaobl
MDS version: ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)
(rancherFsPoolMainData is a 2+1 erasure-coded pool)
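As a sanity check on the pool figures in this message, the following Python
sketch works out how much raw space a 2+1 erasure-coded pool should consume
and how the stored data compares with what du reports. The inputs are the
rounded values from the ceph df output and the du figure quoted in this
message; the names are only illustrative, and reading the 150G from du as
GiB is an assumption.

```python
# Rough sanity check of pool usage. All inputs are rounded figures
# copied from this message, so the results are approximate.

TIB = 1024 ** 4
GIB = 1024 ** 3
KIB = 1024


def ec_raw_usage(stored_bytes: float, k: int, m: int) -> float:
    """Raw bytes a k+m erasure-coded pool needs for stored_bytes of data.

    Ignores allocation-unit padding and metadata overhead.
    """
    return stored_bytes * (k + m) / k


stored = 1.7 * TIB         # STORED for rancherFsPoolMainData
reported_used = 2.5 * TIB  # USED for rancherFsPoolMainData
objects = 82.08e6          # OBJECTS for rancherFsPoolMainData
du_size = 150 * GIB        # du on the mounted filesystem (assumed GiB)

expected_used = ec_raw_usage(stored, k=2, m=1)
avg_object_size = stored / objects
stored_vs_du = stored / du_size

print(f"expected raw usage for 2+1 EC: {expected_used / TIB:.2f} TiB")
print(f"reported raw usage:            {reported_used / TIB:.2f} TiB")
print(f"average object size:           {avg_object_size / KIB:.1f} KiB")
print(f"STORED / du ratio:             {stored_vs_du:.1f}x")
```

Under these assumptions the reported USED value is in line with the 1.5x
overhead of a 2+1 profile, so erasure coding alone does not seem to explain
the growth. The gap is between what du sees (about 150G) and what the pool
reports as STORED (1.7 TiB), a factor of roughly 11 to 12.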
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    4.5 TiB  1.2 TiB  3.3 TiB  3.3 TiB   73.46
TOTAL  4.5 TiB  1.2 TiB  3.3 TiB  3.3 TiB   73.46

--- POOLS ---
POOL                        ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics       1   1    0 B      0        0 B      0      331 GiB
rancher-rbd-erasure         2   32   8.4 GiB  2.16k    13 GiB   1.25   661 GiB
rancher-rbd-meta            3   32   55 B     11       36 KiB   0      331 GiB
rancherFsPoolMetadata       4   32   53 GiB   5.18M    160 GiB  13.88  331 GiB
rancherFsPoolDefaultData    5   1    29 KiB   80.00M   2.2 MiB  0      331 GiB
rancherFsPoolMainData       6   1    1.7 TiB  82.08M   2.5 TiB  72.23  661 GiB
.rgw.root                   7   32   1.3 KiB  4        48 KiB   0      331 GiB
default.rgw.log             8   32   3.6 KiB  209      408 KiB  0      331 GiB
default.rgw.control         9   32   0 B      8        0 B      0      331 GiB
default.rgw.meta            10  32   3.8 KiB  11       124 KiB  0      331 GiB
default.rgw.buckets.index   11  32   2.4 MiB  33       7.2 MiB  0      331 GiB
default.rgw.buckets.non-ec  12  32   0 B      0        0 B      0      331 GiB
default.rgw.buckets.data    14  1    55 GiB   16.57k   83 GiB   7.70   661 GiB

HEALTH_WARN 1 MDSs report slow metadata IOs; 3 nearfull osd(s); 13 pool(s) nearfull
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ceph-mds.acdh-gluster-hdd1.pqydya(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 306 secs
[WRN] OSD_NEARFULL: 3 nearfull osd(s)
    osd.0 is near full
    osd.2 is near full
    osd.3 is near full
[WRN] POOL_NEARFULL: 13 pool(s) nearfull
    pool 'device_health_metrics' is nearfull
    pool 'rancher-rbd-erasure' is nearfull
    pool 'rancher-rbd-meta' is nearfull
    pool 'rancherFsPoolMetadata' is nearfull
    pool 'rancherFsPoolDefaultData' is nearfull
    pool 'rancherFsPoolMainData' is nearfull
    pool '.rgw.root' is nearfull
    pool 'default.rgw.log' is nearfull
    pool 'default.rgw.control' is nearfull
    pool 'default.rgw.meta' is nearfull
    pool 'default.rgw.buckets.index' is nearfull
    pool 'default.rgw.buckets.non-ec' is nearfull
    pool 'default.rgw.buckets.data' is nearfull
(the nearfull ratio is set to 0.66)
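To put the urgency into numbers, here is a rough Python estimate of how much
time is left, based on the raw usage and the growth rate quoted in this
message. It assumes that the 100 GB per day is raw growth in decimal
gigabytes, that the full ratio is still at the Ceph default of 0.95, and
that the OSDs fill evenly; none of this is confirmed by the output above,
and the function name is only illustrative.

```python
# Rough time-to-full estimate from cluster-wide figures in this message.
# Per-OSD usage can differ from the cluster average, so treat the result
# as an order-of-magnitude estimate only.

TIB = 1024 ** 4
GB = 1000 ** 3


def days_until(ratio: float, size: float, used: float,
               growth_per_day: float) -> float:
    """Days until used space reaches ratio * size at constant growth.

    Returns 0.0 if the threshold has already been crossed.
    """
    headroom = ratio * size - used
    return max(headroom, 0.0) / growth_per_day


size = 4.5 * TIB      # SIZE from RAW STORAGE
used = 3.3 * TIB      # RAW USED from RAW STORAGE
growth = 100 * GB     # observed growth per day (assumed to be raw usage)

nearfull_ratio = 0.66  # as configured on this cluster
full_ratio = 0.95      # assumption: Ceph default, not verified here

days_to_nearfull = days_until(nearfull_ratio, size, used, growth)
days_to_full = days_until(full_ratio, size, used, growth)

print(f"raw usage:          {used / size:.1%}")
print(f"days to nearfull:   {days_to_nearfull:.1f}")
print(f"days to full ratio: {days_to_full:.1f}")
```

With these inputs the cluster is already past the nearfull threshold and
would reach the assumed full ratio in roughly a week and a half, which is
why we would like to find the cause quickly.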
--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for disabled persons
Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.siam@xxxxxxxxxx | www.oeaw.ac.at/acdh