Re: [RFC] freeing unliked file indefinitely delayed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 08, 2015 at 02:42:38AM +0100, Al Viro wrote:
> 	Normally opening a file, unlinking it and then closing will have
> the inode freed upon close() (provided that it's not otherwise busy and
> has no remaining links, of course).  However, there's one case where that
> does *not* happen.  Namely, if you open it by fhandle with cold dcache,
> then unlink() and close().
> 
> 	In normal case you get d_delete() in unlink(2) notice that dentry
> is busy and unhash it; on the final dput() it will be forcibly evicted from
> dcache, triggering iput() and inode removal.  In this case, though, we end
> up with *two* dentries - disconnected (created by open-by-fhandle) and
> regular one (used by unlink()).  The latter will have its reference to inode
> dropped just fine, but the former will not - it's considered hashed (it
> is on the ->s_anon list), so it will stay around until the memory pressure
> will finally do it in.  As the result, we have the final iput() delayed
> indefinitely.  It's trivial to reproduce -
> 
> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> 
> void flush_dcache(void)
> {
>         system("mount -o remount,rw /");
> }
> 
> static char buf[20 * 1024 * 1024];
> 
> main()
> {
>         int fd;
>         union { 
>                 struct file_handle f;
>                 char buf[MAX_HANDLE_SZ];
>         } x;
>         int m;
> 
>         x.f.handle_bytes = sizeof(x);
>         chdir("/root");
>         mkdir("foo", 0700);
>         fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
>         close(fd);
>         name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
>         flush_dcache();
>         fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
>         unlink("foo/bar");
>         write(fd, buf, sizeof(buf));
>         system("df .");			/* 20Mb eaten */
>         close(fd);
>         system("df .");			/* should've freed those 20Mb */
>         flush_dcache();
>         system("df .");			/* should be the same as #2 */
> }
> 
> will spit out something like
> Filesystem     1K-blocks   Used Available Use% Mounted on
> /dev/root         322023 303843      1131 100% /
> Filesystem     1K-blocks   Used Available Use% Mounted on
> /dev/root         322023 303843      1131 100% /
> Filesystem     1K-blocks   Used Available Use% Mounted on
> /dev/root         322023 283282     21692  93% /
> - inode gets freed only when dentry is finally evicted (here we trigger
> than by remount; normally it would've happened in response to memory
> pressure hell knows when).
> 
> IMO it's a bug.

Definitely agreed.

> Between the close() and final flush_dcache() the file has
> no surviving links, is *not* busy, won't show up in fuser/lsof/whatnot
> output, and yet it's still not freed.  I'm not saying that it's realistically
> exploitable (albeit with nfsd it might be)

I'd be surprised if it doesn't happen.  This, for example, is a "space
on nfs export not freed when I expected it after an unlink" report that
could easily have the same cause:

	https://www.redhat.com/archives/linux-cluster/2015-May/msg00000.html

(Not sure, I'll get back to them and see if they can confirm.)

> but it's a very unpleasant self-LART.

> FWIW, my prefered fix would be simply to have the final dput() treat
> disconnected dentries as "kill on sight"; checking for i_nlink of the
> inode, as Bruce suggested several years ago, will *not* work, simply
> because having another link to that file and unlinking it after close
> will reproduce the situation - disconnected dentry sticks around in
> dcache past its final dput() and past the last unlink() of our file.
> Theoretically it might cause an overhead for nfsd (no_subtree_check v3
> export might see more d_alloc()/d_free(); icache retention will still
> prevent constant rereading the inode from disk).  _IF_ that proves to
> be noticable, we might need to come up with something more elaborate
> (e.g. have unlink() and rename() kick disconnected aliases out if the link
> count has reached 0), but it's more complex and needs careful ananlysis
> of correctness - we need to prove that there's no way to miss the link
> count reaching 0.  I would prefer to treat all disconnected as unhashed
> for dcache retention purposes - it's simpler and less brittle.  Comments?
> I mean something like this:

ACK from me.

--b.

> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 7a3f3e5..5c8ea15 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -642,7 +642,7 @@ static inline bool fast_dput(struct dentry *dentry)
>  
>  	/*
>  	 * If we have a d_op->d_delete() operation, we sould not
> -	 * let the dentry count go to zero, so use "put__or_lock".
> +	 * let the dentry count go to zero, so use "put_or_lock".
>  	 */
>  	if (unlikely(dentry->d_flags & DCACHE_OP_DELETE))
>  		return lockref_put_or_lock(&dentry->d_lockref);
> @@ -697,7 +697,7 @@ static inline bool fast_dput(struct dentry *dentry)
>  	 */
>  	smp_rmb();
>  	d_flags = ACCESS_ONCE(dentry->d_flags);
> -	d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST;
> +	d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED;
>  
>  	/* Nothing to do? Dropping the reference was all we needed? */
>  	if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && !d_unhashed(dentry))
> @@ -776,6 +776,9 @@ repeat:
>  	if (unlikely(d_unhashed(dentry)))
>  		goto kill_it;
>  
> +	if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
> +		goto kill_it;
> +
>  	if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
>  		if (dentry->d_op->d_delete(dentry))
>  			goto kill_it;
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux