Re: regularly 'no space left on device' when deleting on cephfs

On 11/09/2019 04:14, Yan, Zheng wrote:
On Wed, Sep 11, 2019 at 6:51 AM Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
We sync the file system without preserving hard links. But we take
snapshots after each sync, so I guess deleted files that are still
referenced by snapshots can also end up in the stray directories?

[root@mds02 ~]# ceph daemon mds.mds02 perf dump | grep -i 'stray\|purge'
      "finisher-PurgeQueue": {
          "num_strays": 990153,
          "num_strays_delayed": 32,
          "num_strays_enqueuing": 0,
          "strays_created": 753278,
          "strays_enqueued": 650603,
          "strays_reintegrated": 0,
          "strays_migrated": 0,


num_strays is indeed close to a million
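To keep an eye on this counter without reading the whole dump, a small filter over the same `perf dump` output can pull out just `num_strays`. This is a sketch that assumes the pretty-printed JSON layout shown above; `stray_count` is a hypothetical helper name, and where `jq` is installed a real JSON query would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: print the value of "num_strays" from `perf dump` text on stdin.
# Assumes the pretty-printed JSON layout shown above ("num_strays": N,).
# The closing quote in the pattern keeps it from matching num_strays_delayed etc.
stray_count() {
    sed -n 's/.*"num_strays": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on the MDS host:
#   ceph daemon mds.mds02 perf dump | stray_count
```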


The issue is related to snapshots. Snapshotted inodes stay in the stray
directory. I suggest deleting some old snapshots.

We only have a few snapshots, and they are not very old :) But deleting a few, waiting for the trim and restarting the MDSs reduced num_strays, so this fixes it temporarily.
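In cephfs a snapshot is removed with `rmdir` on its entry in the hidden `.snap` directory. The sketch below deletes the N oldest snapshots of one directory. It assumes snapshot names sort chronologically (for example date stamps like `2019-09-01`) and that it is run on the directory where the snapshots were created; `delete_oldest_snapshots` and the example path are hypothetical, so try it on a test directory first.

```shell
#!/bin/sh
# Hypothetical helper: remove the N oldest snapshots under DIR/.snap.
# Assumes snapshot names sort chronologically (e.g. YYYY-MM-DD stamps).
# In cephfs, `rmdir DIR/.snap/NAME` deletes the snapshot NAME.
delete_oldest_snapshots() {
    dir=$1
    count=$2
    ls -1 "$dir/.snap" | sort | head -n "$count" | while IFS= read -r name; do
        echo "removing snapshot $name"
        rmdir "$dir/.snap/$name"
    done
}

# Usage (path is an example):
#   delete_oldest_snapshots /mnt/cephfs/data 2
```

The strays only go away once the MDS has trimmed the deleted snapshots, so expect num_strays to drop with some delay.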

I've also opened a ticket: https://tracker.ceph.com/issues/41778


Thanks!

Kenneth



On 10/09/2019 12:42, Burkhard Linke wrote:
Hi,


Do you use hard links in your workload? The 'no space left on device'
message may also be caused by too many stray files. Strays are files
that are waiting to be deleted (e.g. in the purge queue), but also
files that have been deleted while hard links still point to the same
content. Since cephfs does not use an indirection layer between inodes
and data, and the data objects are named after the inode number,
removing the original file leaves a stray entry behind: cephfs is not
able to rename the underlying rados objects.
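To make the naming point concrete: with the default file layout (4 MiB objects, stripe count 1), a file's data lives in rados objects named `<inode number in hex>.<object index as 8 hex digits>`. The sketch below computes that name for a byte offset. It assumes the default layout, and `data_object_name` is a hypothetical helper. Because the inode number is part of every object name, the objects cannot simply be reassigned to the remaining hard link when the original name is unlinked.

```shell
#!/bin/sh
# Hypothetical helper: name of the rados data object holding byte OFFSET of
# the file with inode number INO, assuming the default layout
# (object_size = stripe_unit = 4 MiB, stripe_count = 1).
data_object_name() {
    ino=$1
    offset=$2
    object_size=4194304
    printf '%x.%08x\n' "$ino" $((offset / object_size))
}

# Example: inode 1099511627776 (0x10000000000), byte offset 10 MiB
#   data_object_name 1099511627776 10485760   -> 10000000000.00000002
```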


There are 10 hidden directories for stray files, and given a maximum
size of 100,000 entries per directory you can store only up to 1
million entries. I don't know exactly how entries are distributed among
the 10 directories, so the limit may be reached earlier for a single
stray directory. The performance counters contain some values for
strays, so they are easy to check. The daemonperf output also shows the
current value.
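The arithmetic behind that ceiling can be checked against the num_strays value reported in this thread. This is a sketch: it assumes 10 stray directories per MDS rank, a per-directory limit of 100,000 entries (to my knowledge the `mds_bal_fragment_size_max` default) and an even spread across the directories, which is optimistic since one directory may fill up first. `stray_headroom` is a hypothetical helper.

```shell
#!/bin/sh
# Rough stray-capacity check (sketch).
# Assumptions: 10 stray directories per MDS rank, PER_DIR_LIMIT entries each
# (default 100000), entries spread evenly; in practice one directory can fill earlier.
stray_headroom() {
    num_strays=$1
    per_dir_limit=${2:-100000}
    capacity=$((10 * per_dir_limit))
    echo "capacity=$capacity used=$num_strays free=$((capacity - num_strays)) pct=$((100 * num_strays / capacity))"
}

# With the counter from the perf dump above:
#   stray_headroom 990153
#   -> capacity=1000000 used=990153 free=9847 pct=99
```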


The problem of the upper limit on directory entries was solved by
directory fragmentation, so you should check whether fragmentation is
allowed in your filesystem. You can also try to increase the upper
directory entry limit, but this might lead to other problems (rados
omap objects that are too large...).
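If you do try raising the limit, the relevant option is, to my knowledge, `mds_bal_fragment_size_max`. A minimal sketch, assuming a Mimic or later cluster where `ceph config` is available (on older releases the option would go in the `[mds]` section of ceph.conf instead); the value 200000 is only an example, and the omap size caveat above still applies.

```shell
# Show the value the running MDS is using (admin socket, run on the MDS host)
ceph daemon mds.mds02 config get mds_bal_fragment_size_max

# Raise the per-directory-fragment entry limit for all MDS daemons (example value)
ceph config set mds mds_bal_fragment_size_max 200000
```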


Regards,

Burkhard


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx