Hey Kenneth,
We encountered this when the number of strays (unlinked files that have not yet been purged) reached 1 million, the result of many file removals happening repeatedly on the filesystem. It can also happen with default settings when a single directory holds more than 100k files.
You can tune this via the 'mds_bal_fragment_size_max' setting on the MDS, either temporarily so the rm can proceed or permanently. Beware of setting it too high.
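For example, to raise it at runtime on a running MDS (assuming a daemon named mds.a; substitute your own MDS name), something along these lines should work:

    ceph daemon mds.a config set mds_bal_fragment_size_max 200000

or, to make it persistent via the monitors' config store:

    ceph config set mds mds_bal_fragment_size_max 200000

The 200000 here is only an illustration; very large values mean very large directory fragments, which can lead to oversized omap objects in the metadata pool.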
Check the number of strays in the MDS cache by running `ceph daemon mds.{mds name} perf dump` and inspecting the mds_cache section for num_strays. The 1 million limit is a function of mds_bal_fragment_size_max: 10x its value.
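Concretely, something like this (mds.a and the use of jq are just illustrations):

    ceph daemon mds.a perf dump | jq '.mds_cache.num_strays'

Strays are spread across the 10 hidden stray directories, each capped at mds_bal_fragment_size_max (default 100000), hence 10 x 100000 = 1,000,000 strays before further unlinks fail with "No space left on device".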
Raf
On Fri, May 10, 2019, 9:03 PM Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
Hi all,
I am seeing issues on cephfs running 13.2.5 when deleting files:
[root@osd006 ~]# rm /mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700
rm: remove regular empty file ‘/mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700’? y
rm: cannot remove ‘/mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700’: No space left on device
A few minutes later, I can remove it without a problem. This happens especially when a lot of files are being deleted somewhere on the filesystem around the same time.
We already have tuned our mds config:
[mds]
mds_cache_memory_limit=10737418240
mds_log_max_expiring=200
mds_log_max_segments=200
mds_max_purge_files=2560
mds_max_purge_ops=327600
mds_max_purge_ops_per_pg=20
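(The live values can be confirmed on the running daemon, e.g., using the same {mds name} placeholder as above:

    ceph daemon mds.{mds name} config get mds_max_purge_ops

)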
ceph -s reports everything clean, the file system space usage is below 50%, and there are no full OSDs or anything like that.
Is there a way to further debug what the bottleneck is when removing files that produces this 'no space left on device' error?
Thank you very much!
Kenneth
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com