Thanks, Greg. After sending my original email I kept digging and came
across this ML post [1], so I confirmed with the user, and he was in
fact using hardlinks. The discrepancy between "ls" and "ceph df" was
throwing me off, but your previous explanation about hardlinks aligned
with what I was seeing. Once I found and removed those hardlinks, I got
the space back on the cluster and saw num_strays drop significantly.
Appreciate the help!

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013621.html

Josh

On Wed, Sep 25, 2019 at 6:14 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Mon, Sep 23, 2019 at 6:50 AM Josh Haft <paccrap@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I've been migrating data from one EC pool to another EC pool: two
> > directories are mounted with the ceph.dir.layout.pool file attribute
> > set appropriately, then I rsync from old to new and finally delete
> > the old files. I'm using the kernel client to do this. While the
> > removed files are no longer present on the filesystem, they still
> > appear to be accounted for via "ceph df".
> >
> > When I tally up the sizes reported by "ls -lh" on all subdirectories
> > under the root CephFS using a FUSE client mount (except for those on
> > the new EC pool), it totals just under 2PiB. However, "ceph df" shows
> > the original EC pool as 2.5PiB used. I've copied + deleted
> > approximately 545TiB so far, so it seems the unlinked files
> > aren't being fully released/purged.
>
> 1) Is the pool size decreasing at all? Is this mismatch in sizes new?
> I wouldn't expect a 25% cost, but there is probably some space overhead
> that RADOS will be able to report.
>
> > I've only observed the num_strays counter from "ceph daemon mds.$name
> > perf dump" for a few days now, since I first suspected an issue, but
> > I've never seen it drop below roughly 310k. From other ML postings
> > I've gathered that the stat has something to do with files pending
> > deletion, but I'm not positive.
>
> "Strays" are inodes which have been unlinked from their place in the
> tree but not yet deleted. These might be deleted files, but strays can
> also appear with hard links, when you remove a file's original
> location.
> In your case they are probably files which have been deleted but
> haven't been removed from the FS yet because your clients still hold
> references to them.
>
> > So far all I've done is restart the mds and mon daemons, which hasn't
> > helped. What are the next steps for troubleshooting? I can turn up
> > mds debug logging, but am not sure what to look for.
>
> Unmounting the FS from the clients also sometimes helps, as they may
> hold references to unlinked inodes that keep those inodes from being
> purged.
> -Greg
>
> >
> > Thanks for your help!
> > Josh
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
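
For anyone landing on this thread later, here is a minimal sketch of
the layout-plus-rsync migration pattern the original mail describes.
The mount point /mnt/cephfs, the directory names, the pool name
"ec-new", and the rsync flags are illustrative assumptions; only the
ceph.dir.layout.pool attribute and the copy-then-delete flow come from
the thread itself.

  # Point the new directory's file layout at the destination EC pool.
  # Only files created after this inherit the new layout, which is why
  # existing data has to be rsynced over rather than left in place.
  # ("ec-new" and the paths are placeholders, not from the thread.)
  setfattr -n ceph.dir.layout.pool -v ec-new /mnt/cephfs/new
  getfattr -n ceph.dir.layout.pool /mnt/cephfs/new   # verify it took

  # Copy the old tree into the new one, then remove the originals.
  # -H preserves hard links *within* the copied tree; links to the same
  # inodes from outside the tree are what kept the strays alive here.
  rsync -aH /mnt/cephfs/old/ /mnt/cephfs/new/
  rm -rf /mnt/cephfs/old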
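
Likewise, a hedged sketch of the two checks that closed the thread out:
hunting for hardlinked files and watching the MDS stray counter. The
mount point, the MDS name "myhost", and the exact JSON path into the
perf dump output are assumptions; verify them against your own cluster
and Ceph release.

  # List regular files with more than one link (GNU find).
  # %n = link count, %i = inode number, %p = path; sorting on the
  # inode column groups paths that share the same underlying file.
  find /mnt/cephfs -type f -links +1 -printf '%n %i %p\n' | sort -k2,2n

  # Stray count on one MDS; on recent releases the counter lives in
  # the mds_cache section of the perf dump (jq assumed available).
  ceph daemon mds.myhost perf dump | jq '.mds_cache.num_strays'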