Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 02, 2016 at 04:01:22PM -0400, Jeff Layton wrote:
> On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote:
> > On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
> > > 
> > > > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes:
> > > 
> > > > 
> > > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
> > > > > 
> > > > > > > > > Nikolay Borisov <kernel@xxxxxxxx> writes:
> > > > > 
> > > > > > 
> > > > > > Currently when /proc/locks is read it will show all the file locks
> > > > > > which are currently created on the machine. On containers, hosted
> > > > > > on busy servers this means that doing lsof can be very slow. I
> > > > > > observed up to 5 seconds stalls reading 50k locks, while the container
> > > > > > itself had only a small number of relevant entries. Fix it by
> > > > > > filtering the locks listed by the pidns of the current process
> > > > > > and the process which created the lock.
> > > > > 
> > > > > The locks always confuse me so I am not 100% connecting locks
> > > > > to a pid namespace is appropriate.
> > > > > 
> > > > > That said if you are going to filter by pid namespace please use the pid
> > > > > namespace of proc, not the pid namespace of the process reading the
> > > > > file.
> > > > 
> > > > Oh, that makes sense, thanks.
> > > > 
> > > > What does /proc/mounts use, out of curiosity?  The mount namespace that
> > > > /proc was originally mounted in?
> > > 
> > > /proc/mounts -> /proc/self/mounts
> > 
> > D'oh, I knew that.
> > 
> > > 
> > > /proc/[pid]/mounts lists mounts from the mount namespace of the
> > > appropriate process.
> > > 
> > > That is another way to go but it is a tread carefully thing as changing
> > > things that way it is easy to surprise apparmor or selinux rules and be
> > > surprised you broke someones userspace in a way that prevents booting.
> > > Although I suspect /proc/locks isn't too bad.
> > 
> > OK, thanks.
> > 
> > /proc/[pid]/locks might be confusing.  I'd expect it to be "all the
> > locks owned by this task", rather than "all the locks owned by pid's in
> > the same pid namespace", or whatever criterion we choose.
> > 
> > Uh, I'm still trying to think of the Obviously Right solution here, and
> > it's not coming.
> > 
> > --b.
> 
> 
> I'm a little leery of changing how this works. It has always been
> maintained as a legacy interface, so do we run the risk of breaking
> something if we turn it into a per-namespace thing?

The namespace work is all about making interfaces per-namespace.  I
guess it works as long as it contributes to the illusion that each
container is its own machine.

Thinking about it, I might be sold on the per-pidns approach (with
Eric's modification to use the pidns of /proc not the reader).

My complaint about not being able to see conflicting locks would apply
just as well to conflicts from nfs locks held by other clients.  A disk
filesystem shared across multiple containers is a little like an nfs
filesystem shared between nfs clients.

That'd solve this immediate problem without requiring an lsof upgrade as
well.

> This also doesn't
> solve the problem of slow traversal in the init_pid_ns -- only in a
> container.
> 
> I also can't help but feel that /proc/locks is just showing its age. It
> was fine in the late 90's, but its limitations are just becoming more
> apparent as things get more complex. It was never designed for
> performance as you end up thrashing several spinlocks when reading it.
> 
> Maybe it's time to think about presenting this info in another way? A
> global view of all locks on the system is interesting but maybe it
> would be better to present it more granularly somehow?

But, yes, that might be a good idea.

--b.

> 
> I guess I should go look at what lsof actually does with this info...
> 
> -- 
> Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux