Re: [PATCH] VFS: Unlink should revoke all outstanding leases on file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 20, 2010 at 05:46:06PM +0800, Mi Jinlong wrote:
> J. Bruce Fields :
> > I don't know of any existing lock that does exactly what we want.
> > 
> > Somebody at citi worked on a better lease implementation for a while,
> > but I don't think we ever really got it right; the last version I can
> > find is here:
> > 
> > 	git://linux-nfs.org/~bfields/linux-topics.git leases
> 
>   When reading the code of the git, i found a patch which try to fix
>   the lease's problem, but only for unlink.

It's 8 patches together:

>   commit id: d5a08e556116c66fb60a448f805a40bf54314634
>         msg: "leases: break file leases on unlink."
> 
>   In this patch, it seems only add break_lease() and some other functions
>   but seems don't avoid the problem of race.

Look again: break_lease() is there, but there's also a break_lease_end()
after the unlink.

> Or there is some different 
>   at break_lease() with the community's kernel.
> 
>   Can you give me some message about the new lease? Thanks.

So the 8 patches at that branch are:
	leases: introduce per-inode lease enabling/disabling
	VFS: clean up extra dereferences in do_unlinkat()
	leases: break file leases on unlink.
	leases: break file leases on rename.
	leases: break leases on chown.
	VFS: refactor sys_fchmod and sys_fchmodat
	leases: break leases on chmod.
	leases: break leases on link.

Like I say, I don't think they're correct or I'd just copy them all to
the list.  But maybe the comment on the first patch (appended) is
useful.

--b.

leases: introduce per-inode lease enabling/disabling

The current file lease implementation is inadequate (for the purposes of
nfs, and, we believe, for the purposes of Samba), in at least two ways:

	- Leases are broken only conflicting opens; but both nfsv4
	  delegations and (we're told) Windows op locks actually require
	  that leases be broken on any operation that changes file
	  metadata--including unlink, link, rename, chmod, and chown.

	- The internal kernel api used for lease-breaking is inherently
	  racy, consisting as it does of a single break_lease() call.
	  (Consider this scenario: a file is not currently open and is
	  about to be unlinked.  During unlink processing, a lookup is
	  done, and break_lease() is called.  After the break_lease(),
	  but before the unlink completes, another user opens the file
	  and gets a read lease.  The unlink then completes, but the
	  other user thinks their read lease is still valid.  This
	  situation would be avoided if lease-granting for the inode
	  were disabled for the duration of the unlink.)

We're primarily interested in the case of read leases for now.  (Write
leases, which also must be broken on *access* to a file, are more
difficult to get completely right, and aren't used by the current nfs
server.)

Fixing the second problem requires replacing break_lease() by a pair of
calls, here called break_lease() and break_lease_end(), between which
new leases are temporarily prohibited.

We want to implement that temporary prohibition in a simple way that has
low impact in common (uncontended) cases.

This patch adds a field, i_leasecount, which provides mutual exclusion
between inode-modifying operations and read leases in the same way the
i_writecount provides mutual exclusion between write opens and execs:
when i_leasecount is positive, it counts the number of leases on the
given inode, and when it's negative it counts the number of operations
which want leases temporarily disabled.  This allows selective
enabling/disabling of leases on a per-inode basis.

To that end, the functions leases_get_access() and leases_put_access()
are used when a lease is granted and returned, respectively.  The
functions leases_deny_access() and leases_allow_access() are used to
prevent races between breaking-with-FMODE_WRITE and write-lease-granting
for the entire duration of a file operation.  Currently, leases are
broken only when a file is opened or truncated; these functions will
allow leases to be broken on things like unlink and rename as well.
NFSv4 implements delegations using leases, and needs its delegations to
be revoked on unlinks, renames, chowns, etc.

Note that this patch changes break_lease() and __break_lease(), such
that when they are called with FMODE_WRITE and return successfully, they
will leave leases disabled on the inode in question, and the caller must
eventually call break_lease_end() to re-enable leasing.  As alluded to
in the scenario above, this behavior isn't necessary when breaking
without FMODE_WRITE: existing and new read-leases wouldn't need to be
revoked or blocked; and a write-lease-granting setlease won't race the
break_lease() because the latter is presumed to have been preceded by
something like a dget() on the dentry in question (where d_count or
i_count > 1 blocks write-lease-granting).

This patch also closes a small existing open/lease race: a lease-related
race exists between the time that outstanding leases are broken (by
may_open()) and the time that, e.g., O_RDWR or O_WRONLY are reflected in
the inode's i_writecount variable (which will prevent subsequent
lease-granting setlease calls).  Conceivably, a read lease could be
granted in the interim.

To deal with this, may_open() is modified so that, on success and when
called with FMODE_WRITE, it will return with lease-granting disabled for
the inode in question.  do_filp_open() is modified so that leasing is
re-enabled once everything is finished.  Analogous changes are made on
truncation.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux