Is there a way to force OSDs to remove old data?
Hi
After I recreated one OSD and increased the PG count of my erasure-coded (2+1)
pool (which was way too low, only 100 PGs for 9 OSDs), the cluster started to
eat additional disk space.
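(For reference, the pg_num bump was done with the usual pool commands, roughly
as below; the target value 512 is only an illustration, not the exact number I used.)

  # increase the placement group count of the EC pool
  ceph osd pool set ecpool_hdd pg_num 512
  ceph osd pool set ecpool_hdd pgp_num 512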
At first I thought this was caused by the moved PGs using additional space
during unfinished backfills. I pinned most of the new PGs to their old OSDs via
`pg-upmap`, and indeed that freed some space in the cluster.
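The pinning was done with pg-upmap-items, something along these lines (the PG
ID and OSD numbers here are placeholders, not the real ones):

  # wherever CRUSH now wants osd.7 for this PG, keep using the old osd.3 instead
  ceph osd pg-upmap-items 13.1f 7 3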
Then I reduced osd_max_backfills to 1 and started removing the upmap pins
in small batches, which allowed Ceph to finish the backfills for those PGs.
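Concretely, that was roughly (PG IDs again are placeholders):

  # allow only one backfill per OSD at a time
  ceph tell osd.* injectargs '--osd_max_backfills 1'
  # then release the pins for a small batch of PGs and let them move
  ceph osd rm-pg-upmap-items 13.1f
  ceph osd rm-pg-upmap-items 13.20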
HOWEVER, used capacity still grows! It drops after each PG finishes moving,
but it keeps growing overall.
It grew by another 1.3 TB yesterday. In the same period of time clients wrote
only ~200 new objects (~800 MB; the pool holds RBD images only).
What is using such a large amount of additional space?
Graphs from our Prometheus are attached: only ~200 objects were created by RBD
clients yesterday, yet used raw space increased by 1.3 TB.
An additional question is why ceph df / rados df say there are only 16 TB of
actual data written, while 29.8 TB (now 31 TB) of raw disk space is used.
Shouldn't it be 16 / 2 * 3 = 24 TB?
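My expectation comes from the usual EC overhead formula, raw = stored * (k+m)/k,
which for k=2, m=1 gives 16 TiB * 3/2 = 24 TiB. The pool's actual k/m can be
double-checked with (the profile name is whatever the ls command reports):

  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get <profile_name>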
ceph df output:
[root@sill-01 ~]# ceph df
GLOBAL:
    SIZE        AVAIL       RAW USED     %RAW USED
    38 TiB      6.9 TiB       32 TiB         82.03
POOLS:
    NAME           ID     USED        %USED     MAX AVAIL     OBJECTS
    ecpool_hdd     13      16 TiB     93.94       1.0 TiB     7611672
    rpool_hdd      15     9.2 MiB         0       515 GiB          92
    fs_meta        44      20 KiB         0       515 GiB          23
    fs_data        45         0 B         0       1.0 TiB           0
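In case it is useful, I can also cross-check per-OSD raw usage and per-pool
counters against the numbers above with:

  # per-OSD utilisation, to see which OSDs hold the extra raw space
  ceph osd df tree
  # per-pool object and space counters, to compare with ceph df
  rados df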
How can I fix this?
--
With best regards,
Vitaliy Filippov
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com