Re: Soft-lockup on vfsmount_lock with large numbers of mount namespaces in the cloud

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 25 Feb 2014 00:05:47 -0800

Dave Chiluk <chiluk@xxxxxxxxxxxxx> writes:

> An openstack neutron gateway uses network namespaces to partition
> machines within a cloud. In order to do so it creates lots of network
> namespaces, and as a result mount namespaces. This is accomplished
> through many calls to
>
> $ ip netns add/delete/exec
>
> After roughly 3k-4k namespaces the performance of these ip calls becomes
> very slow, on the order of many seconds.  After a few more the machine
> starts to report "BUGs" on the stuck ip processes (BUG output below).
>
> We think the problem is contention for the vfsmount_lock which gets held
> by do_umount while it walks the mounts in the following stack
>
> do_umount
>  -> umount_tree
>     -> propagate_umount
>        -> __propagate_umount
>           -> __lookup_mnt
>
> Where __lookup_mnt proceeds to spend significant time walking the
> mount_hashtable.
>
> How can we mitigate or fix this expensive operation while holding the
> lock?  If this has already been fixed please feel free to point me at
> the requisite git hashes.

Just looking, the expensive operation appears to be mount/umount
propagation.  I expect there is some mount propagating to all 4k mount
namespaces you have, and that is what is taking the time.

You should be able to dig into the set of mounts on your system, and
figure out which umount is propagating to understand what is going on.
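
As a rough starting point for that digging, here is a minimal sketch that
reads a mountinfo file and reports which mounts are shared or slave, and
how many mounts sit in each peer group.  It assumes the documented
/proc/<pid>/mountinfo layout (optional "shared:N" / "master:N" tags before
the "-" separator).  The function names are made up for illustration, and
a single mountinfo file only covers one mount namespace, so you would
have to repeat this for a process in each namespace of interest.

```python
import os
from collections import Counter


def parse_mountinfo(text):
    """Parse mountinfo text into a list of dicts, one per mount."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # not a well-formed mountinfo line
        sep = fields.index("-")
        if sep < 6 or len(fields) <= sep + 1:
            continue
        # Optional fields sit between the mount options and the "-".
        tags = dict(t.split(":", 1) for t in fields[6:sep] if ":" in t)
        mounts.append({
            "mount_id": int(fields[0]),
            "parent_id": int(fields[1]),
            "mount_point": fields[4],
            "fstype": fields[sep + 1],
            "shared": tags.get("shared"),   # peer group id, or None
            "master": tags.get("master"),   # master peer group id, or None
        })
    return mounts


def peer_group_sizes(mounts):
    """Count how many of the given mounts belong to each shared peer group."""
    return dict(Counter(m["shared"] for m in mounts if m["shared"]))


if __name__ == "__main__" and os.path.exists("/proc/self/mountinfo"):
    with open("/proc/self/mountinfo") as f:
        parsed = parse_mountinfo(f.read())
    for m in parsed:
        if m["shared"] or m["master"]:
            print(m["mount_point"], "shared:", m["shared"],
                  "master:", m["master"])
    print("peer group sizes:", peer_group_sizes(parsed))
```

A mount that shows up as shared (or as a slave of the same peer group) in
every one of the namespaces is a likely candidate for the propagation
that is taking the time.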

After that you can either modify userspace to remove the mount
propagation (perhaps just a patch to iproute) or we can figure out how
to improve the locking present when the kernel propagates mounts.
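
For the userspace option, a minimal sketch of changing the propagation
type of a mount tree via mount(2) might look like the following.  The
MS_* values are the ones from the kernel headers.  The helper names are
hypothetical, and this is not what iproute does today, only an
illustration of the call involved.  It needs CAP_SYS_ADMIN and would
normally run in a process that has already unshared its mount namespace.

```python
import ctypes
import ctypes.util
import os

# Mount flag values as defined in <linux/fs.h> / <sys/mount.h>.
MS_REC = 0x4000
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20


def propagation_flags(kind, recursive=True):
    """Return the mount(2) flags for a propagation type change."""
    base = {"private": MS_PRIVATE, "slave": MS_SLAVE, "shared": MS_SHARED}[kind]
    return base | (MS_REC if recursive else 0)


def set_propagation(path, kind, recursive=True):
    """Change the propagation type of the mount at path (hypothetical helper)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p)
    # Source, fstype and data are ignored for propagation changes.
    ret = libc.mount(b"none", os.fsencode(path), None,
                     propagation_flags(kind, recursive), None)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
```

Calling set_propagation("/", "private") in a freshly unshared mount
namespace would stop mount and umount events from propagating into or
out of it.  Whether that is appropriate depends on which mount turns out
to be propagating and on whether anything relies on that propagation.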

> Perhaps I'm looking in the wrong area of code, and I really just need
> aa7a574d0c54cc5a0aceb7357b5097342c0844ee.  Are there any others that
> immediately stand out or is this a new problem?

I think people actually using mount/umount propagation on a large scale
is new.

> Also we've tried reproducing with 3.5, 3.8, 3.11 which yielded similar
> results. 3.13 runs into similar results but has different issues related
> to the RCU locking.  When I have a better idea as to what's going on
> with 3.13 I will report back about that.

From an upstream perspective I am primarily interested in 3.13 and
3.14-rcX.

Eric