On Wed, Nov 11, 2015 at 11:09 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Wed, Nov 11, 2015 at 5:39 PM, Eric Eastman
> <eric.eastman@xxxxxxxxxxxxxx> wrote:
>> I am trying to figure out why my Ceph file system is not freeing
>> space. Using Ceph 9.1.0 I created a file system with snapshots
>> enabled, filled up the file system over several days while taking
>> snapshots hourly. I then deleted all files and all snapshots, but
>> Ceph is not returning the space. I let the cluster sit for two days
>> to see if the cleanup was being done in the background, and it still
>> has not freed the space. I tried rebooting the cluster and the
>> clients, and the space is still not returned.
>
> Preface: snapshots are disabled by default for a reason -- we don't
> have the test coverage for this stuff yet.
>
> Things to try:
> * Looking at MDS statistics (ceph daemon mds.foo perf dump) with
>   "stray" in the name to see if your inodes are stuck in a stray state
> * Dumping the MDS cache to see what it thinks about it, if you can
>   see references to the files that should have been deleted
>
> John

Hi John,

I know that I am playing in an area of the code that is not well
tested, but snapshots are really cool :)

Thank you for the pointer on where to look. Dumping the statistics
shows there are a bunch of strays:

    "mds_cache": {
        "num_strays": 16389,
        "num_strays_purging": 0,
        "num_strays_delayed": 0,
        "num_purge_ops": 0,
        "strays_created": 17066,
        "strays_purged": 677,
        "strays_reintegrated": 0,
        "strays_migrated": 0,
        "num_recovering_processing": 0,
        "num_recovering_enqueued": 0,
        "num_recovering_prioritized": 0,
        "recovery_started": 0,
        "recovery_completed": 0
    },

The cache dump command:

    ceph mds tell \* dumpcache /tmp/dumpcache.txt

shows lots of strays listed.
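[Editor's note: the counters above are consistent with a stalled purge:
strays_created minus strays_purged (and reintegrated) should roughly
track num_strays, and here they match exactly while nothing is actively
purging. A minimal sketch of that cross-check, using the JSON shape
shown above (in practice the data would come from the MDS admin socket
via `ceph daemon mds.<id> perf dump`):]

```python
import json

# The "mds_cache" section of the perf dump quoted above.  Live data
# would come from the admin socket, e.g.:
#   ceph daemon mds.foo perf dump
perf_dump = json.loads("""
{
    "mds_cache": {
        "num_strays": 16389,
        "num_strays_purging": 0,
        "num_strays_delayed": 0,
        "strays_created": 17066,
        "strays_purged": 677,
        "strays_reintegrated": 0
    }
}
""")

cache = perf_dump["mds_cache"]

# Strays created but never purged or reintegrated; if nothing is
# leaking, this should roughly equal the current num_strays.
backlog = (cache["strays_created"]
           - cache["strays_purged"]
           - cache["strays_reintegrated"])

print("current strays:", cache["num_strays"])        # 16389
print("created - purged - reintegrated:", backlog)   # 16389
print("actively purging:", cache["num_strays_purging"])
```

That backlog equals num_strays exactly and num_strays_purging is zero,
i.e. the strays are accounted for but the purge simply is not running
on them.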
The top of the file shows:

    [inode 100000003e9 [...9b,head] ~mds0/stray1/100000003e9/ auth v2878259 snaprealm=0x557a6fa15200 dirtyparent f(v0 m2015-11-11 12:16:04.602163) n(v1 rc2015-11-11 12:16:04.602163 1=0+1) (inest lock) (iversion lock) | request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x557a71c3a450]
    [dir 100000003e9 ~mds0/stray1/100000003e9/ [2,head] auth v=12 cv=10/10 state=1610612738|complete f(v0 m2015-11-11 12:16:04.602163) n(v1 rc2015-11-11 12:16:04.602163) hs=0+1,ss=0+0 dirty=1 | child=1 dirty=1 waiter=0 authpin=0 0x557a6fc9bc70]
    [dentry #100/stray1/100000003e9/.ctdb.lock [9b,head] auth NULL (dversion lock) v=11 inode=0 | request=0 lock=0 inodepin=0 dirty=1 authpin=0 clientlease=0 0x557a6fca0190]
    [inode 100006c0df6 [...9b,head] ~mds0/stray0/100006c0df6/ auth v2878733 snaprealm=0x557a71e74880 dirtyparent f(v0 m2015-11-08 20:44:28.955469) n(v0 rc2015-11-08 20:44:28.955469 1=0+1) (iversion lock) | dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 0x557a71507210]
    [dir 100006c0df6 ~mds0/stray0/100006c0df6/ [2,head] auth v=90 cv=0/0 state=1073741824 f(v0 m2015-11-08 20:44:28.955469) n(v0 rc2015-11-08 20:44:28.955469) hs=0+8,ss=0+0 dirty=8 | child=1 0x557a715a18e0]
    [dentry #100/stray0/100006c0df6/data_file.43 [9b,head] auth NULL (dversion lock) v=75 inode=0 | dirty=1 0x557a72a25e00]
    [dentry #100/stray0/100006c0df6/data_file.44 [9b,head] auth NULL (dversion lock) v=77 inode=0 | dirty=1 0x557a72a26120]
    [dentry #100/stray0/100006c0df6/data_file.45 [9b,head] auth NULL (dversion lock) v=79 inode=0 | dirty=1 0x557a72a26440]
    [dentry #100/stray0/100006c0df6/data_file.46 [9b,head] auth NULL (dversion lock) v=81 inode=0 | dirty=1 0x557a72a26760]
    [dentry #100/stray0/100006c0df6/data_file.47 [9b,head] auth NULL (dversion lock) v=83 inode=0 | dirty=1 0x557a72a26a80]
    [dentry #100/stray0/100006c0df6/data_file.48 [9b,head] auth NULL (dversion lock) v=85 inode=0 | dirty=1 0x557a72a26da0]
    [dentry #100/stray0/100006c0df6/data_file.49 [9b,head] auth NULL (dversion lock) v=87 inode=0 | dirty=1 0x557a72a270c0]
    [dentry #100/stray0/100006c0df6/data_file.50 [9b,head] auth NULL (dversion lock) v=89 inode=0 | dirty=1 0x557a72a273e0]
    [inode 100006c0dee [...9b,head] ~mds0/stray0/100006c0dee/ auth v2879339 snaprealm=0x557a7142e1c0 dirtyparent f(v0 m2015-11-08 20:44:28.956928) n(v5 rc2015-11-08 20:44:28.956928 1=0+1) (iversion lock) | dirfrag=1 dirtyparent=1 dirty=1 0x557a70c81518]
    [dir 100006c0dee ~mds0/stray0/100006c0dee/ [2,head] auth v=93 cv=0/0 state=1610612736 f(v0 m2015-11-08 20:44:28.956928) n(v5 rc2015-11-08 20:44:28.956928) hs=0+1,ss=0+0 dirty=1 | child=1 dirty=1 0x557a715a1608]

If you or someone else is interested, the whole cache file can be
downloaded at:

    wget ftp://ftp.keepertech.com/outgoing/eric/dumpcache.txt.bz2

It is about 1.8 MB uncompressed.

I know that snapshots are not being regularly tested, but do you want
me to open a ticket on this issue and others I come across?

Thanks,
Eric
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com