On Tue, Aug 02, 2016 at 04:01:22PM -0400, Jeff Layton wrote: > On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote: > > On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote: > > > > > > > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes: > > > > > > > > > > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote: > > > > > > > > > > > > > > Nikolay Borisov <kernel@xxxxxxxx> writes: > > > > > > > > > > > > > > > > > Currently when /proc/locks is read it will show all the file locks > > > > > > which are currently created on the machine. On containers, hosted > > > > > > on busy servers this means that doing lsof can be very slow. I > > > > > > observed up to 5 seconds stalls reading 50k locks, while the container > > > > > > itself had only a small number of relevant entries. Fix it by > > > > > > filtering the locks listed by the pidns of the current process > > > > > > and the process which created the lock. > > > > > > > > > > The locks always confuse me so I am not 100% connecting locks > > > > > to a pid namespace is appropriate. > > > > > > > > > > That said if you are going to filter by pid namespace please use the pid > > > > > namespace of proc, not the pid namespace of the process reading the > > > > > file. > > > > > > > > Oh, that makes sense, thanks. > > > > > > > > What does /proc/mounts use, out of curiosity? The mount namespace that > > > > /proc was originally mounted in? > > > > > > /proc/mounts -> /proc/self/mounts > > > > D'oh, I knew that. > > > > > > > > /proc/[pid]/mounts lists mounts from the mount namespace of the > > > appropriate process. > > > > > > That is another way to go but it is a tread carefully thing as changing > > > things that way it is easy to surprise apparmor or selinux rules and be > > > surprised you broke someones userspace in a way that prevents booting. > > > Although I suspect /proc/locks isn't too bad. > > > > OK, thanks. > > > > /proc/[pid]/locks might be confusing. I'd expect it to be "all the > > locks owned by this task", rather than "all the locks owned by pid's in > > the same pid namespace", or whatever criterion we choose. > > > > Uh, I'm still trying to think of the Obviously Right solution here, and > > it's not coming. > > > > --b. > > > I'm a little leery of changing how this works. It has always been > maintained as a legacy interface, so do we run the risk of breaking > something if we turn it into a per-namespace thing? The namespace work is all about making interfaces per-namespace. I guess it works as long as it contributes to the illusion that each container is its own machine. Thinking about it, I might be sold on the per-pidns approach (with Eric's modification to use the pidns of /proc not the reader). My complaint about not being able to see conflicting locks would apply just as well to conflicts from nfs locks held by other clients. A disk filesystem shared across multiple containers is a little like an nfs filesystem shared between nfs clients. That'd solve this immediate problem without requiring an lsof upgrade as well. > This also doesn't > solve the problem of slow traversal in the init_pid_ns -- only in a > container. > > I also can't help but feel that /proc/locks is just showing its age. It > was fine in the late 90's, but its limitations are just becoming more > apparent as things get more complex. It was never designed for > performance as you end up thrashing several spinlocks when reading it. > > Maybe it's time to think about presenting this info in another way? A > global view of all locks on the system is interesting but maybe it > would be better to present it more granularly somehow? But, yes, that might be a good idea. --b. > > I guess I should go look at what lsof actually does with this info... > > -- > Jeff Layton <jlayton@xxxxxxxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html