Hi all,

we currently experience a few "strange" things on our Ceph cluster and I wanted to ask if anyone has recommendations for further tracking them down (or maybe even an explanation already ;) )

Ceph version is 0.94.5 and we have an HDD-based pool with a cache pool on NVMe SSDs in front of it.

ceph df detail lists a "used" size on the ssd pool (the cache) of currently 3815 GB. We have a replication size of 2, so effectively this should take around 7630 GB on disk. Doing a df on all OSDs and summing them up gives 8501 GB, which is 871 GB more than expected.

Last week the difference was around 840 GB, the week before that around 780 GB. So it looks like the difference is constantly growing.

Doing a
> for date in `ceph pg dump | grep active | awk '{print $20}'`; do date +%A -d $date; done | sort | uniq -c

returns
> 2002 Tuesday
> 1390 Wednesday

So scrubbing and deep scrubbing are done regularly.

A thing I noticed which might or might not be related is the following: The pool is used for OpenStack ephemeral disks and I had created a 1 TB VM (1 TB ephemeral, not a cinder volume ;) )

I looked up the RBD device and noted down the block prefix name.
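For anyone who wants to recheck the space accounting above, here is a minimal sketch of the arithmetic in plain shell. The function and variable names are made up for illustration; the three input values are the ones quoted in this mail (all in GB).

```shell
#!/bin/sh
# Recheck the space accounting with the figures from this mail (all in GB).
# Function and variable names are illustrative only.

# Expected raw usage = pool "used" size * replication size.
expected_raw_gb() {
    echo $(( $1 * $2 ))
}

# Unaccounted space = summed OSD df usage - expected raw usage.
overhead_gb() {
    echo $(( $1 - $2 ))
}

pool_used=3815   # "used" of the ssd pool, from `ceph df detail`
repl_size=2      # replication size of the pool
osd_df_sum=8501  # df summed over all OSDs

expected=$(expected_raw_gb "$pool_used" "$repl_size")
echo "expected raw usage: ${expected} GB"
echo "unaccounted:        $(overhead_gb "$osd_df_sum" "$expected") GB"
```

With these inputs the expected raw usage comes out at 7630 GB and the unaccounted part at 871 GB.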
> rbd info ephemeral-vms/0edd1080-9f84-48d2-8714-34b1cd7d50df_disk
> rbd image '0edd1080-9f84-48d2-8714-34b1cd7d50df_disk':
>         size 1024 GB in 262144 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.2c383a0238e1f29
>         format: 2
>         features: layering
>         flags:

After I had deleted the VM I regularly checked the number of objects in rados via "rados -p ephemeral-vms ls | grep rbd_data.2c383a0238e1f29 | wc -l" and it still returns a large number of objects:

> Mon Sep 19 09:10:43 CEST 2016 - 138937
> Tue Sep 20 16:11:55 CEST 2016 - 135818
> Thu Sep 22 09:59:03 CEST 2016 - 135791
> Wed Sep 28 12:15:07 CEST 2016 - 133862

I did a "stat" AND a "rm" on each and every one of those objects, but they all returned:

> rados -p ephemeral-vms stat rbd_data.2c383a0238e1f29.000000000000f8b8
> error stat-ing ephemeral-vms/rbd_data.2c383a0238e1f29.000000000000f8b8: (2) No such file or directory

So why is rados still returning those objects via an ls?

Even worse, counting the objects on the ssd pool I get:

> rados -p ssd ls | grep rbd_data.2c383a0238e1f29 | wc -l
> Wed Sep 28 12:54:07 CEST 2016 - 246681

I did a find on one of the OSDs' data dirs:

> find . -name "*data.2c383a0238e1f29*" | wc -l
> 33060

And checked a few; all of them were 0-byte files, e.g.

> ls -lha ./11.1d_head/DIR_D/DIR_1/DIR_0/DIR_7/DIR_9/rbd\\udata.2c383a0238e1f29.0000000000019bf7__head_87C9701D__b
> -rw-r--r-- 1 root root 0 Sep 9 11:21 ./11.1d_head/DIR_D/DIR_1/DIR_0/DIR_7/DIR_9/rbd\udata.2c383a0238e1f29.0000000000019bf7__head_87C9701D__b

But even a 0-byte file takes some space on the disk; might those be the reason?

Any feedback welcome.

Greetings
-Sascha-

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
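P.S. The "ls, then stat each object" check described above can be sketched roughly as below. The function name count_ghost_objects is made up for illustration, and the sketch assumes that "rados stat" exits non-zero when it reports "No such file or directory". It only uses the "rados -p <pool> ls" and "rados -p <pool> stat <object>" calls already shown in this mail.

```shell
#!/bin/sh
# Count objects that `rados ls` still lists for an RBD block prefix but that
# `rados stat` can no longer find. The function name is illustrative only.
# Usage: count_ghost_objects <pool> <block_name_prefix>
count_ghost_objects() {
    pool=$1
    prefix=$2
    listed=0
    missing=0
    # grep -F matches the prefix as a fixed string, so "." is not a wildcard.
    for obj in $(rados -p "$pool" ls | grep -F "$prefix"); do
        listed=$((listed + 1))
        # Assumption: stat exits non-zero when the object cannot be found.
        if ! rados -p "$pool" stat "$obj" >/dev/null 2>&1; then
            missing=$((missing + 1))
        fi
    done
    echo "listed=$listed missing=$missing"
}
```

It would be called as "count_ghost_objects ephemeral-vms rbd_data.2c383a0238e1f29" and prints how many objects were listed and how many of those could not be stat-ed. Note that it issues one stat per object, so it is slow with a six-digit object count.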