Re: [PATCH v3 00/13] locks: saner method for managing file locks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2 Feb 2015 15:29:33 -0500
Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:

> I applied Jeff's patch to my orangefs-3.19-rc5 tree. Orangefs
> doesn't yet support the lock callout for file_operations, but
> we have been experimenting with some ideas that would allow
> Orangefs to honor locks in our distributed environment: basically
> posix locks for each kernel client plus meta data in our userspace
> servers.
> 
> Anyhow, the lock callout in the Orangefs patch I've posted
> just returns ENOSYS. I have added posix locks in a current
> test, so now I have:
> 
> int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
> {
>         int rc;
> 
>         rc = posix_lock_file(filp, fl, NULL);
> 
>         return rc;
> }
> 
> So... while this isn't safe for our real world distributed environment,
> it makes the kernel with this version of the Orangefs kernel client
> loaded on it think that locks work.
> 
> Besides observing that locks seem to work by running some programs
> that need locks (like svn/sqlite) I also ran xfstest generic/131.
> 
> Without Jeff's patch, xfstest generic/131 fails:
> 
>   29:Verify that F_GETLK for F_WRLCK doesn't require that file be
>      opened for write
> 
> Same with Jeff's patch.
> 
> Jeff's patch applied painlessly, and my simple tests didn't uncover
> any problems with Jeff's implementation of separate lists for different
> lock types, so that's good.
> 
> What surprised me, though, is that the posix lock code failed
> to get all the way through xfstest generic/131... maybe you
> all knew that already?
> 
> -Mike
> 

Hmm... I haven't seen 131 fail like this, but your code above looks
wrong. You're ignoring the "cmd" argument, so even F_GETLK requests are
setting a lock.

I think you might want to do something like:

int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
{
	if (cmd == F_GETLK)
		return posix_test_lock(filp, fl);
	return posix_lock_file(filp, fl, fl);
}

...untested of course, but you get the idea.


> On Thu, Jan 22, 2015 at 9:27 AM, Jeff Layton
> <jeff.layton@xxxxxxxxxxxxxxx> wrote:
> > v3:
> > - break out a ceph locking cleanup patch into a separate one earlier
> >   in the series
> > - switch back to using the i_lock to protect assignment of i_flctx.
> >   Using cmpxchg() was subject to races that I couldn't quite get a
> >   grip on. Eventually I may switch it back, but it probably doesn't
> >   provide much benefit.
> > - add a patch to clean up some comments that refer to i_flock
> > - use list_empty_careful in lockless checks rather than list_empty
> >
> > v2:
> > - bugfix to the flc_posix list ordering that broke the split/merge code
> > - don't use i_lock to manage the i_flctx pointer. Do so locklessly
> >   via cmpxchg().
> > - reordering of the patches to make the set bisectable. As a result
> >   the new spinlock is not introduced until near the end of the set
> > - some cleanup of the lm_change operation was added, made possible
> >   by the move to standard list_heads for tracking locks
> > - it takes greater pains to avoid spinlocking by checking when the
> >   lists are empty in addition to whether the i_flctx pointer is NULL
> > - a patch was added to keep count of the number of locks, so we can
> >   avoid having to do count/alloc/populate in ceph and cifs
> >
> > This is the third iteration of this patchset. The second was posted
> > on January 8th, under the cover letter entitled:
> >
> >     [PATCH v2 00/10] locks: saner method for managing file locks
> >
> > The big change here is that it once again uses the i_lock to protect the
> > i_flctx pointer assignment. In principle we should be able to use
> > cmpxchg instead, but that seems leave open a race that causes
> > locks_remove_file to miss recently-added locks. Given that is not a
> > super latency-sensitive codepath, an extra spinlock shouldn't make much
> > difference.
> >
> > Many thanks to Sasha Levin who helped me nail a race that would leave
> > leftover locks on the i_flock list, and David Howells who convinced
> > me to just go ahead back to using the i_lock to protect that field.
> >
> > There are some other minor changes, but overall it's pretty similar
> > to the last set. I still plan to merge this for v3.20.
> >
> > ------------------------[snip]-------------------------
> >
> > We currently manage all file_locks via a singly-linked list. This is
> > problematic for a number of reasons:
> >
> > - we have to protect all file locks with the same spinlock (or
> >   equivalent). Currently that uses the i_lock, but Christoph has voiced
> >   objections due to the potential for contention with other i_lock
> >   users. He'd like to see us move to using a different lock.
> >
> > - we have to walk through irrelevant file locks in order to get to the
> >   ones we're interested in. For instance, POSIX locks are at the end
> >   of the list, so we have to skip over all of the flock locks and
> >   leases before we can work with them.
> >
> > - the singly-linked list is primitive and difficult to work with. We
> >   have to keep track of the "before" pointer and it's easy to get that
> >   wrong.
> >
> > Cleaning all of this up is complicated by the fact that no one really
> > wants to grow struct inode in order to do so. We have a single pointer
> > in the inode now and I don't think we want to use any more.
> >
> > One possibility that Trond raised was to move this to an hlist, but
> > that doesn't do anything about the desire for a new spinlock.
> >
> > This patchset takes the approach of replacing the i_flock list with a
> > new struct file_lock_context that is allocated when we intend to add a
> > new file lock to an inode. The file_lock_context is only freed when we
> > destroy the inode.
> >
> > Within that, we have separate (and standard!) lists for each lock type,
> > and a dedicated spinlock for managing those lists. In principle we could
> > even consider adding separate locks for each, but I didn't bother with
> > that for now.
> >
> > For now, the code is still pretty "raw" and isn't bisectable. This is
> > just a RFC for the basic approach. This is probably v3.19 material at
> > best.
> >
> > Anyone have thoughts or comments on the basic approach?
> >
> > Jeff Layton (13):
> >   locks: add new struct list_head to struct file_lock
> >   locks: have locks_release_file use flock_lock_file to release generic
> >     flock locks
> >   locks: add a new struct file_locking_context pointer to struct inode
> >   ceph: move spinlocking into ceph_encode_locks_to_buffer and
> >     ceph_count_locks
> >   locks: move flock locks to file_lock_context
> >   locks: convert posix locks to file_lock_context
> >   locks: convert lease handling to file_lock_context
> >   locks: remove i_flock field from struct inode
> >   locks: add a dedicated spinlock to protect i_flctx lists
> >   locks: clean up the lm_change prototype
> >   locks: keep a count of locks on the flctx lists
> >   locks: consolidate NULL i_flctx checks in locks_remove_file
> >   locks: update comments that refer to inode->i_flock
> >
> >  fs/ceph/locks.c      |  64 +++---
> >  fs/ceph/mds_client.c |   4 -
> >  fs/cifs/file.c       |  34 +--
> >  fs/inode.c           |   3 +-
> >  fs/lockd/svcsubs.c   |  26 ++-
> >  fs/locks.c           | 569 +++++++++++++++++++++++++++------------------------
> >  fs/nfs/delegation.c  |  23 ++-
> >  fs/nfs/nfs4state.c   |  70 ++++---
> >  fs/nfs/pagelist.c    |   6 +-
> >  fs/nfs/write.c       |  41 +++-
> >  fs/nfsd/nfs4state.c  |  21 +-
> >  fs/read_write.c      |   2 +-
> >  include/linux/fs.h   |  52 +++--
> >  13 files changed, 510 insertions(+), 405 deletions(-)
> >
> > --
> > 2.1.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux