Mateusz Guzik <mjguzik@xxxxxxxxx> writes:

> On 11/9/23, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> Mateusz Guzik <mjguzik@xxxxxxxxx> writes:
>>> sched_exec causes migration only for only few % of execs in the bench,
>>> but when it does happen there is tons of overhead elsewhere.
>>>
>>> I expect real programs which get past execve will be prone to
>>> migrating anyway, regardless of what sched_exec is doing.
>>>
>>> That is to say, while sched_exec buggering off here would be nice, I
>>> think for real-world wins the thing to investigate is the overhead
>>> which comes from migration to begin with.
>>
>> I have a vague memory that the idea is that there is a point during exec
>> when it should be much less expensive than normal to allow migration
>> between cpus because all of the old state has gone away.
>>
>> Assuming that is the rationale, if we are getting lock contention
>> then either there is a global lock in there, or there is the potential
>> to pick a less expensive location within exec.
>>
>
> Given the commit below I think the term "migration cost" is overloaded
> here.
>
> By migration cost in my previous mail I meant the immediate cost
> (stop_one_cpu and so on), but also the aftermath -- for example tlb
> flushes on another CPU when tearing down your now-defunct mm after you
> switched.
>
> For testing purposes I verified commenting out sched_exec and not
> using taskset still gives me about 9.5k ops/s.
>
> I 100% agree should the task be moved between NUMA domains, it makes
> sense to do it when it has the smallest footprint. I don't know what
> the original patch did, the current code just picks a CPU and migrates
> to it, regardless of NUMA considerations. I will note that the goal
> would still be achieved by comparing domains and doing nothing if they
> match.
>
> I think this would be nice to fix, but it is definitely not a big
> deal. I guess the question is to Peter Zijlstra if this sounds
> reasonable.

Perhaps I misread the trace.  My point was simply that the sched_exec
seemed to be causing lock contention because what was on one cpu is now
on another cpu, and we are now getting cross cpu lock ping-pongs.

If the sched_exec is causing cross cpu lock ping-pongs during exec,
then we can move sched_exec to a better place within exec.  It has
already happened once, shortly after it was introduced.

Ultimately we want the sched_exec to be in the cheapest place within
exec that we can find.

Eric
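
For concreteness, the "compare domains and do nothing if they match"
idea quoted above boils down to roughly the following.  This is only a
sketch, not the in-tree sched_exec(); pick_new_cpu_for_exec() and
migrate_task_to_cpu() are hypothetical placeholders for whatever CPU
selection and migration the scheduler actually performs:

/*
 * Sketch only, not the in-tree implementation of sched_exec().
 * pick_new_cpu_for_exec() and migrate_task_to_cpu() are hypothetical
 * stand-ins for the real CPU selection and migration paths.
 */
#include <linux/smp.h>        /* smp_processor_id() */
#include <linux/topology.h>   /* cpu_to_node() */

void sched_exec_sketch(void)
{
	int src_cpu = smp_processor_id();
	int dst_cpu = pick_new_cpu_for_exec();          /* hypothetical */

	/*
	 * "Compare domains and do nothing if they match": the exec-time
	 * migration only pays off when it moves the task to a different
	 * NUMA node, so skip it entirely for a same-node destination.
	 */
	if (cpu_to_node(src_cpu) == cpu_to_node(dst_cpu))
		return;

	migrate_task_to_cpu(dst_cpu);                   /* hypothetical */
}

With a check along those lines, the common same-node case would avoid
the stop_one_cpu() round trip entirely, while a genuine cross-node move
would still happen at the point in exec where the task's footprint is
smallest.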