Re: [RFC PATCH v2 09/11] sched: Introduce per memory space current virtual cpu id

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Fri, 25 Feb 2022 16:21:02 -0500 (EST)

----- On Feb 25, 2022, at 12:56 PM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:

> ----- On Feb 25, 2022, at 12:35 PM, Jonathan Corbet corbet@xxxxxxx wrote:
> 
>> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> writes:
>> 
>>> This feature allows the scheduler to expose a current virtual cpu id
>>> to user-space. This virtual cpu id is within the possible cpus range,
>>> and is temporarily (and uniquely) assigned while threads are actively
>>> running within a memory space. If a memory space has fewer threads than
>>> cores, or is limited to run on few cores concurrently through sched
>>> affinity or cgroup cpusets, the virtual cpu ids will be values close
>>> to 0, thus allowing efficient use of user-space memory for per-cpu
>>> data structures.
>> 
>> So I have one possibly (probably) dumb question: if I'm writing a
>> program to make use of virtual CPU IDs, how do I know what the maximum
>> ID will be?  It seems like one of the advantages of this mechanism would
>> be not having to be prepared for anything in the physical ID space, but
>> is there any guarantee that the virtual-ID space will be smaller?
>> Something like "no larger than the number of threads", say?
> 
> Hi Jonathan,
> 
> This is a very relevant question. Let me quote what I answered to Florian
> on the last round of review for this series:
> 
> Some effective upper bounds for the number of vcpu ids observable in a process:
> 
> - sysconf(3) _SC_NPROCESSORS_CONF,
> - the number of threads which exist concurrently in the process,

One small detail I forgot to mention: on a NUMA system, a single-threaded
process will typically observe vcpu_id=numa_node_id, so its vcpu_id can
change depending on which NUMA node it is running on at the moment.

So the vcpu_id is not strictly bounded by the number of concurrently running
threads.

Thanks,

Mathieu

> - the number of cpus in the cpu affinity mask applied by sched_setaffinity,
>  except in corner-case situations such as cpu hotplug removing all cpus from
>  the affinity set,
> - cgroup cpuset "partition" limits,
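
As an illustration of how user-space might combine the bounds listed above
into a sizing hint, here is a minimal sketch. The helper name vcpu_id_hint()
and its nr_threads parameter are hypothetical and not part of this series. It
only uses sysconf(3) and sched_getaffinity(2). The result is a hint, not a
guarantee: the thread count in particular is not a strict bound.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for cpu_set_t and CPU_COUNT() in glibc */
#endif
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch set): return a hint for how
 * many per-vcpu elements to pre-allocate. nr_threads is the number of
 * threads the application expects to run concurrently, or 0 if unknown.
 */
static long vcpu_id_hint(long nr_threads)
{
	long hint = sysconf(_SC_NPROCESSORS_CONF);
	cpu_set_t mask;

	if (hint < 1)
		hint = 1;	/* sysconf failed: fall back to one slot */

	/*
	 * Affinity mask of the calling thread. The call fails with EINVAL
	 * if the kernel mask does not fit in cpu_set_t, in which case the
	 * affinity bound is simply skipped.
	 */
	if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
		long nr_allowed = CPU_COUNT(&mask);

		if (nr_allowed > 0 && nr_allowed < hint)
			hint = nr_allowed;
	}

	if (nr_threads > 0 && nr_threads < hint)
		hint = nr_threads;

	assert(hint >= 1);
	return hint;
}
```

A caller would use the returned value only to decide how much to
pre-allocate, and still be prepared for any vcpu id up to the number of
configured processors - 1.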
> 
> Note that AFAIR non-partition cgroup cpusets allow a cgroup to "borrow"
> additional cores from the rest of the system if they are idle, therefore
> allowing the number of concurrent threads to go beyond the specified limit.
> 
> AFAIR the sched affinity mask is tweaked independently of the cgroup cpuset.
> Those are two mechanisms both affecting the scheduler task placement.
> 
> I would expect the user-space code to use some sensible upper bound as a
> hint about how many per-vcpu data structure elements to expect (and how many
> to pre-allocate), but have a "lazy initialization" fall-back in case the
> vcpu id goes up to the number of configured processors - 1. And I suspect
> that even the number of configured processors may change with CRIU.
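
The pre-allocation plus "lazy initialization" fall-back described above could
look roughly like the sketch below. All names (struct vcpu_slot, struct
vcpu_table, vcpu_table_init(), vcpu_table_get()) are hypothetical. The sketch
assumes the caller obtains vcpu_id by some other means, and that nr_conf (the
number of configured processors) stays fixed for the lifetime of the table,
which may not hold across CRIU. A compare-and-exchange publishes lazily
allocated slots because a thread can be preempted in the middle of the
initialization and another thread may then observe the same vcpu id.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct vcpu_slot {			/* hypothetical per-vcpu payload */
	long counter;
};

struct vcpu_table {
	long nr_slots;			/* hard bound: configured processors */
	_Atomic(struct vcpu_slot *) *slots;	/* NULL until initialized */
};

/*
 * Pre-allocate the first "hint" slots, leave the others NULL.
 * Cleanup on allocation failure is omitted for brevity.
 */
static int vcpu_table_init(struct vcpu_table *t, long nr_conf, long hint)
{
	t->nr_slots = nr_conf;
	t->slots = malloc(nr_conf * sizeof(*t->slots));
	if (!t->slots)
		return -1;
	for (long i = 0; i < nr_conf; i++) {
		struct vcpu_slot *s = NULL;

		if (i < hint) {
			s = calloc(1, sizeof(*s));
			if (!s)
				return -1;
		}
		atomic_init(&t->slots[i], s);
	}
	return 0;
}

/* Return the slot for vcpu_id, allocating it on first use. */
static struct vcpu_slot *vcpu_table_get(struct vcpu_table *t, long vcpu_id)
{
	assert(vcpu_id >= 0 && vcpu_id < t->nr_slots);

	struct vcpu_slot *s = atomic_load(&t->slots[vcpu_id]);
	if (s)
		return s;		/* fast path: already initialized */

	struct vcpu_slot *fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;

	struct vcpu_slot *expected = NULL;
	if (atomic_compare_exchange_strong(&t->slots[vcpu_id], &expected, fresh))
		return fresh;		/* this thread published the slot */

	free(fresh);			/* another thread won the race */
	return expected;
}
```

The pointer array is sized for the worst case, which costs one pointer per
configured processor, while the larger per-vcpu payloads are only allocated
for ids that are actually observed.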
> 
> If the above explanation makes sense (please let me know if I am wrong
> or missed something), I suspect I should add it to the commit message.
> 
> Thanks,
> 
> Mathieu
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com