Re: [PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Wed, 08 Nov 2023 18:17:53 -0600

Mateusz Guzik <mjguzik@xxxxxxxxx> writes:

> On 11/8/23, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> On Wed, Nov 08, 2023 at 01:03:33AM +0100, Mateusz Guzik wrote:
>>> I'm getting around 3.4k execs/s. However, if I "taskset -c 3
>>> ./static-doexec 1" the number goes up to about 9.5k and lock
>>> contention disappears from the profile. So off hand looks like the
>>> task is walking around the box when it perhaps could be avoided -- it
>>> is idle apart from running the test. Again this is going to require a
>>> serious look instead of ad hoc pokes.
>>
>> Peter, is this something you can speak to? It seems like execve() forces
>> a change in running CPU. Is this really something we want to be doing?
>> Or is there some better way to keep it on the same CPU unless there is
>> contention?
>>
>
> sched_exec causes migration for only a few % of execs in the bench,
> but when it does happen there is tons of overhead elsewhere.
>
> I expect real programs which get past execve will be prone to
> migrating anyway, regardless of what sched_exec is doing.
>
> That is to say, while sched_exec buggering off here would be nice, I
> think for real-world wins the thing to investigate is the overhead
> which comes from migration to begin with.

I have a vague memory that the idea is that there is a point during exec
when it should be much less expensive than normal to allow migration
between CPUs, because all of the old state has gone away.

Assuming that is the rationale, if we are getting lock contention
then either there is a global lock in there, or there is the potential
to pick a less expensive location within exec.
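
For anyone who wants to reproduce the pinned case from inside the test
program rather than via "taskset -c 3", here is a minimal userspace
sketch. It assumes Linux with glibc; pin_to_cpu() and pinned_exec() are
hypothetical helper names for illustration, not part of static-doexec
or of the patch under discussion.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/*
 * Restrict the calling thread to a single CPU, roughly what
 * "taskset -c <cpu>" arranges before running the command.
 * Returns 0 on success, -1 on failure.
 */
int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Pin first, then exec. The affinity mask is preserved across
 * execve(), so the new image has only one CPU it may run on.
 * Returns only on failure.
 */
int pinned_exec(int cpu, char *const argv[], char *const envp[])
{
	if (pin_to_cpu(cpu) != 0)
		return -1;

	execve(argv[0], argv, envp);
	return -1;	/* reached only if execve() failed */
}
```

Calling pinned_exec(3, argv, environ) from the benchmark loop should
behave much like the taskset run quoted above, since with a single
allowed CPU there is nowhere else for the task to be placed at exec
time.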

Just to confirm my memory I dug a little deeper and found the original
commit that added sched_exec (in tglx's git tree of the BitKeeper
history).

commit f01419fd6d4e5b32fef19d206bc3550cc04567a9
Author: Martin J. Bligh <mbligh@xxxxxxxxxxx>
Date:   Wed Jan 15 19:46:10 2003 -0800

    [PATCH] (2/3) Initial load balancing

    Patch from Michael Hohnbaum

    This adds a hook, sched_balance_exec(), to the exec code, to make it
    place the exec'ed task on the least loaded queue. We have less state
    to move at exec time than fork time, so this is the cheapest point
    to cross-node migrate. Experience in Dynix/PTX and testing on Linux
    has confirmed that this is the cheapest time to move tasks between nodes.

    It also macro-wraps changes to nr_running, to allow us to keep track of
    per-node nr_running as well. Again, no impact on non-NUMA machines.

Eric