Giuseppe Scrivano <gscrivan@xxxxxxxxxx> writes:

> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>
>> Giuseppe Scrivano <gscrivan@xxxxxxxxxx> writes:
>>
>>> it avoids blocking on synchronize_rcu() in kern_unmount().
>>>
>>> the code:
>>>
>>> #define _GNU_SOURCE
>>> #include <sched.h>
>>> #include <error.h>
>>> #include <errno.h>
>>> #include <stdlib.h>
>>> int main()
>>> {
>>>   int i;
>>>   for (i = 0; i < 1000; i++)
>>>     if (unshare (CLONE_NEWIPC) < 0)
>>>       error (EXIT_FAILURE, errno, "unshare");
>>> }
>>>
>>> gets from:
>>>
>>>   Command being timed: "./ipc-namespace"
>>>   User time (seconds): 0.00
>>>   System time (seconds): 0.06
>>>   Percent of CPU this job got: 0%
>>>   Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.05
>>>
>>> to:
>>>
>>>   Command being timed: "./ipc-namespace"
>>>   User time (seconds): 0.00
>>>   System time (seconds): 0.02
>>>   Percent of CPU this job got: 96%
>>>   Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
>>
>> I have a question. You create 1000 namespaces in a single process
>> and then free them. So I expect that single process is busy waiting
>> for that kern_unmount 1000 times, and waiting for 1000 synchronize_rcu's.
>>
>> Does this ever show up in a real world work-load?
>>
>> Is the cost of a single synchronize_rcu a problem?
>
> yes exactly, creating 1000 namespaces is not a real world use case (at
> least in my experience) but I've used it only to show the impact of the
> patch.

I know running 1000 containers is a real use case, and I would not be
surprised if there are configurations that go higher.

> The cost of the single synchronize_rcu is the issue.
>
> Most containers run in their own IPC namespace, so this is a constant
> cost for each container.

Agreed.

>> The code you are working to avoid is this.
>>
>> void kern_unmount(struct vfsmount *mnt)
>> {
>> 	/* release long term mount so mount point can be released */
>> 	if (!IS_ERR_OR_NULL(mnt)) {
>> 		real_mount(mnt)->mnt_ns = NULL;
>> 		synchronize_rcu();	/* yecchhh... */
>> 		mntput(mnt);
>> 	}
>> }
>>
>> Which makes me wonder if perhaps there might be a simpler solution
>> involving just that code. But I do realize such a solution
>> would require analyzing all of the code after kern_unmount
>> to see if any of it depends upon the synchronize_rcu.
>>
>>
>> In summary, I see no correctness problems with your code.
>> Code that runs faster is always nice. In this case I just
>> see the cost being shifted somewhere else not eliminated.
>> I also see a slight increase in complexity.
>>
>> So I am wondering if this was an exercise to speed up a toy
>> benchmark or if this is an effort to speed up real world code.
>
> I've seen the issue while profiling real world work loads.

So the question is how to remove this delay.

>> At the very least some version of the motivation needs to be
>> recorded so that the next time someone comes in and reworks
>> the code they can look in the history and figure out what
>> they need to do to avoid introducing a regression.
>
> Is it enough in the git commit message or should it be an inline
> comment?

The git commit message should be enough to record the motivation.
A comment in the code about the work queue that says something like
"used to avoid the cost of synchronize_rcu in kern_unmount" would
also be nice.

Eric