Mateusz Guzik <mjguzik@xxxxxxxxx> writes:

> On 11/9/23, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> Mateusz Guzik <mjguzik@xxxxxxxxx> writes:
>>> sched_exec causes migration only for only few % of execs in the bench,
>>> but when it does happen there is tons of overhead elsewhere.
>>>
>>> I expect real programs which get past execve will be prone to
>>> migrating anyway, regardless of what sched_exec is doing.
>>>
>>> That is to say, while sched_exec buggering off here would be nice, I
>>> think for real-world wins the thing to investigate is the overhead
>>> which comes from migration to begin with.
>>
>> I have a vague memory that the idea is that there is a point during exec
>> when it should be much less expensive than normal to allow migration
>> between cpus because all of the old state has gone away.
>>
>> Assuming that is the rationale, if we are getting lock contention
>> then either there is a global lock in there, or there is the potential
>> to pick a less expensive location within exec.
>>
>
> Given the commit below I think the term "migration cost" is overloaded
> here.
>
> By migration cost in my previous mail I meant the immediate cost
> (stop_one_cpu and so on), but also the aftermath -- for example tlb
> flushes on another CPU when tearing down your now-defunct mm after you
> switched.
>
> For testing purposes I verified commenting out sched_exec and not
> using taskset still gives me about 9.5k ops/s.
>
> I 100% agree should the task be moved between NUMA domains, it makes
> sense to do it when it has the smallest footprint. I don't know what
> the original patch did, the current code just picks a CPU and migrates
> to it, regardless of NUMA considerations. I will note that the goal
> would still be achieved by comparing domains and doing nothing if they
> match.
>
> I think this would be nice to fix, but it is definitely not a big
> deal. I guess the question is to Peter Zijlstra if this sounds
> reasonable.

Perhaps I misread the trace.  My point was simply that the sched_exec
seemed to be causing lock contention because what was on one cpu is now
on another cpu, and we are now getting cross cpu lock ping-pongs.

If the sched_exec is causing cross cpu lock ping-pongs during exec,
then we can move sched_exec to a better place within exec.  It has
already happened once, shortly after it was introduced.

Ultimately we want the sched_exec to be in the cheapest place within
exec that we can find.

Eric
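
For concreteness, the "compare domains and do nothing if they match"
idea quoted above boils down to roughly the following.  This is only a
sketch, not the in-tree sched_exec(); pick_new_cpu_for_exec() and
migrate_task_to_cpu() are hypothetical placeholders for whatever CPU
selection and migration the scheduler actually performs:

/*
 * Sketch only, not the in-tree implementation of sched_exec().
 * pick_new_cpu_for_exec() and migrate_task_to_cpu() are hypothetical
 * stand-ins for the real CPU selection and migration paths.
 */
#include <linux/smp.h>        /* smp_processor_id() */
#include <linux/topology.h>   /* cpu_to_node() */

void sched_exec_sketch(void)
{
	int src_cpu = smp_processor_id();
	int dst_cpu = pick_new_cpu_for_exec();          /* hypothetical */

	/*
	 * "Compare domains and do nothing if they match": the exec-time
	 * migration only pays off when it moves the task to a different
	 * NUMA node, so skip it entirely for a same-node destination.
	 */
	if (cpu_to_node(src_cpu) == cpu_to_node(dst_cpu))
		return;

	migrate_task_to_cpu(dst_cpu);                   /* hypothetical */
}

With a check along those lines, the common same-node case would avoid
the stop_one_cpu() round trip entirely, while a genuine cross-node move
would still happen at the point in exec where the task's footprint is
smallest.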