Re: [PATCH v2] ipc: use a work queue to free_ipc

Giuseppe Scrivano <gscrivan@xxxxxxxxxx> · Sun, 23 Feb 2020 20:01:09 +0100

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:

> Giuseppe Scrivano <gscrivan@xxxxxxxxxx> writes:
>
>> it avoids blocking on synchronize_rcu() in kern_unmount().
>>
>> the code:
>>
>> #define _GNU_SOURCE
>> #include <sched.h>
>> #include <error.h>
>> #include <errno.h>
>> #include <stdlib.h>
>> int main()
>> {
>>   int i;
>>   for (i = 0; i < 1000; i++)
>>     if (unshare (CLONE_NEWIPC) < 0)
>>       error (EXIT_FAILURE, errno, "unshare");
>> }
>>
>> gets from:
>>
>> 	Command being timed: "./ipc-namespace"
>> 	User time (seconds): 0.00
>> 	System time (seconds): 0.06
>> 	Percent of CPU this job got: 0%
>> 	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.05
>>
>> to:
>>
>> 	Command being timed: "./ipc-namespace"
>> 	User time (seconds): 0.00
>> 	System time (seconds): 0.02
>> 	Percent of CPU this job got: 96%
>> 	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
>
> I have a question.  You create 1000 namespaces in a single process
> and then free them.  So I expect that single process is busy waiting
> for that kern_unmount 1000 times, and waiting for 1000 synchronize_rcu's.
>
> Does this ever show up in a real world work-load?
>
> Is the cost of a single synchronize_rcu a problem?

Yes, exactly.  Creating 1000 namespaces is not a real-world use case (at
least in my experience); I used it only to show the impact of the
patch.

The cost of a single synchronize_rcu() is the issue.

Most containers run in their own IPC namespace, so this is a constant
cost for each container.

> The code you are working to avoid is this.
>
> void kern_unmount(struct vfsmount *mnt)
> {
> 	/* release long term mount so mount point can be released */
> 	if (!IS_ERR_OR_NULL(mnt)) {
> 		real_mount(mnt)->mnt_ns = NULL;
> 		synchronize_rcu();	/* yecchhh... */
> 		mntput(mnt);
> 	}
> }
>
> Which makes me wonder if perhaps there might be a simpler solution
> involving just that code.  But I do realize such a solution
> would require analyzing all of the code after kern_unmount
> to see if any of it depends upon the synchronize_rcu.
>
>
> In summary, I see no correctness problems with your code.
> Code that runs faster is always nice.  In this case I just
> see the cost being shifted somewhere else not eliminated.
> I also see a slight increase in complexity.
>
> So I am wondering if this was an exercise to speed up a toy
> benchmark or if this is an effort to speed up real world code.

I've seen the issue while profiling real-world workloads.

> At the very least some version of the motivation needs to be
> recorded so that the next time someone comes in and reworks
> the code they can look in the history and figure out what
> they need to do to avoid introducing a regression.

Is it enough to record it in the git commit message, or should it be
an inline comment?

Thanks,
Giuseppe