Re: Ceph file system is not freeing space

On Wed, Nov 11, 2015 at 2:28 PM, Eric Eastman
<eric.eastman@xxxxxxxxxxxxxx> wrote:
> On Wed, Nov 11, 2015 at 11:09 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Wed, Nov 11, 2015 at 5:39 PM, Eric Eastman
>> <eric.eastman@xxxxxxxxxxxxxx> wrote:
>>> I am trying to figure out why my Ceph file system is not freeing
>>> space.  Using Ceph 9.1.0 I created a file system with snapshots
>>> enabled, filled up the file system over days while taking snapshots
>>> hourly.  I then deleted all files and all snapshots, but Ceph is not
>>> returning the space. I let the cluster sit for two days to see if the
>>> cleanup process was being done in the background and it still has not
>>> freed the space. I tried rebooting the cluster and clients and the
>>> space is still not returned.
>>
>> Preface: snapshots are disabled by default for a reason -- we don't
>> have the test coverage for this stuff yet.
>>
>> Things to try:
>>  * Looking at MDS statistics (ceph daemon mds.foo perf dump) with
>> "stray" in the name to see if your inodes are stuck in stray state
>>  * Dumping MDS cache to see what it thinks about it, if you can see
>> references to the files that should have been deleted
>>
>> John
>>
>
> Hi John
>
> I know that I am playing in an area of the code that is not well
> tested, but snapshots are really cool :)
>
> Thank you for the pointer on where to look. Dumping the statistics
> shows there are a bunch of strays:
>
>   "mds_cache": {
>         "num_strays": 16389,
>         "num_strays_purging": 0,
>         "num_strays_delayed": 0,
>         "num_purge_ops": 0,
>         "strays_created": 17066,
>         "strays_purged": 677,
>         "strays_reintegrated": 0,
>         "strays_migrated": 0,
>         "num_recovering_processing": 0,
>         "num_recovering_enqueued": 0,
>         "num_recovering_prioritized": 0,
>         "recovery_started": 0,
>         "recovery_completed": 0
>     },
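>
> A quick way to check whether the purge is making any progress is to
> poll those counters over time, e.g. with something like this (rough
> sketch; "mds.0" is just a placeholder for whatever daemon name the
> local admin socket answers to):
>
> import json, subprocess, sys, time
>
> # print the MDS stray counters from the admin socket once a minute
> mds = sys.argv[1] if len(sys.argv) > 1 else "mds.0"
> while True:
>     out = subprocess.check_output(["ceph", "daemon", mds, "perf", "dump"])
>     c = json.loads(out)["mds_cache"]
>     print("num_strays=%(num_strays)d purging=%(num_strays_purging)d "
>           "created=%(strays_created)d purged=%(strays_purged)d" % c)
>     time.sleep(60)
>
> Note that strays_created minus strays_purged above (17066 - 677) is
> exactly the 16389 in num_strays, so almost nothing has been purged.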
>
> The cache dump command:
>
> ceph mds tell \* dumpcache /tmp/dumpcache.txt
>
> Shows lots of strays listed.  The top of the file shows:
>
> [inode 100000003e9 [...9b,head] ~mds0/stray1/100000003e9/ auth
> v2878259 snaprealm=0x557a6fa15200 dirtyparent f(v0 m2015-11-11
> 12:16:04.602163) n(v1 rc2015-11-11 12:16:04.602163 1=0+1) (inest lock)
> (iversion lock) | request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1
> dirty=1 waiter=0 authpin=0 0x557a71c3a450]
>  [dir 100000003e9 ~mds0/stray1/100000003e9/ [2,head] auth v=12
> cv=10/10 state=1610612738|complete f(v0 m2015-11-11 12:16:04.602163)
> n(v1 rc2015-11-11 12:16:04.602163) hs=0+1,ss=0+0 dirty=1 | child=1
> dirty=1 waiter=0 authpin=0 0x557a6fc9bc70]
>   [dentry #100/stray1/100000003e9/.ctdb.lock [9b,head] auth NULL
> (dversion lock) v=11 inode=0 | request=0 lock=0 inodepin=0 dirty=1
> authpin=0 clientlease=0 0x557a6fca0190]
> [inode 100006c0df6 [...9b,head] ~mds0/stray0/100006c0df6/ auth
> v2878733 snaprealm=0x557a71e74880 dirtyparent f(v0 m2015-11-08
> 20:44:28.955469) n(v0 rc2015-11-08 20:44:28.955469 1=0+1) (iversion
> lock) | dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1
> 0x557a71507210]
>  [dir 100006c0df6 ~mds0/stray0/100006c0df6/ [2,head] auth v=90 cv=0/0
> state=1073741824 f(v0 m2015-11-08 20:44:28.955469) n(v0 rc2015-11-08
> 20:44:28.955469) hs=0+8,ss=0+0 dirty=8 | child=1 0x557a715a18e0]
>   [dentry #100/stray0/100006c0df6/data_file.43 [9b,head] auth NULL
> (dversion lock) v=75 inode=0 | dirty=1 0x557a72a25e00]
>   [dentry #100/stray0/100006c0df6/data_file.44 [9b,head] auth NULL
> (dversion lock) v=77 inode=0 | dirty=1 0x557a72a26120]
>   [dentry #100/stray0/100006c0df6/data_file.45 [9b,head] auth NULL
> (dversion lock) v=79 inode=0 | dirty=1 0x557a72a26440]
>   [dentry #100/stray0/100006c0df6/data_file.46 [9b,head] auth NULL
> (dversion lock) v=81 inode=0 | dirty=1 0x557a72a26760]
>   [dentry #100/stray0/100006c0df6/data_file.47 [9b,head] auth NULL
> (dversion lock) v=83 inode=0 | dirty=1 0x557a72a26a80]
>   [dentry #100/stray0/100006c0df6/data_file.48 [9b,head] auth NULL
> (dversion lock) v=85 inode=0 | dirty=1 0x557a72a26da0]
>   [dentry #100/stray0/100006c0df6/data_file.49 [9b,head] auth NULL
> (dversion lock) v=87 inode=0 | dirty=1 0x557a72a270c0]
>   [dentry #100/stray0/100006c0df6/data_file.50 [9b,head] auth NULL
> (dversion lock) v=89 inode=0 | dirty=1 0x557a72a273e0]
> [inode 100006c0dee [...9b,head] ~mds0/stray0/100006c0dee/ auth
> v2879339 snaprealm=0x557a7142e1c0 dirtyparent f(v0 m2015-11-08
> 20:44:28.956928) n(v5 rc2015-11-08 20:44:28.956928 1=0+1) (iversion
> lock) | dirfrag=1 dirtyparent=1 dirty=1 0x557a70c81518]
>  [dir 100006c0dee ~mds0/stray0/100006c0dee/ [2,head] auth v=93 cv=0/0
> state=1610612736 f(v0 m2015-11-08 20:44:28.956928) n(v5 rc2015-11-08
> 20:44:28.956928) hs=0+1,ss=0+0 dirty=1 | child=1 dirty=1
> 0x557a715a1608]
>
> If you or someone else is interested, the whole cache file can be
> downloaded at:
>
> wget ftp://ftp.keepertech.com/outgoing/eric/dumpcache.txt.bz2
>
> It is about 1.8 MB uncompressed.
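>
> To get a rough count of what is still parked in the stray directories
> in that dump, a quick script like this works (sketch; it just pattern
> matches the ~mds0/strayN paths shown above, and assumes each record is
> on its own line in the raw file, unlike the wrapping here):
>
> import sys
>
> inodes = dentries = 0
> # count cache records that live under a stray directory
> for line in open(sys.argv[1] if len(sys.argv) > 1 else "dumpcache.txt"):
>     rec = line.lstrip()
>     if "stray" not in rec:
>         continue
>     if rec.startswith("[inode"):
>         inodes += 1
>     elif rec.startswith("[dentry"):
>         dentries += 1
> print("stray inodes: %d, stray dentries: %d" % (inodes, dentries))
>
> The inode count should line up roughly with num_strays from the perf
> dump.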
>
> I know that snapshots are not being regularly tested, but do you want
> me to open a ticket on this issue and others I come across?

Yes please!

John, are you interested in digging into this (I see the dirtyparent
et al bits that are probably causing issues) or should we get Zheng to
do it?
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


