----- On Feb 25, 2022, at 12:56 PM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:

> ----- On Feb 25, 2022, at 12:35 PM, Jonathan Corbet corbet@xxxxxxx wrote:
>
>> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> writes:
>>
>>> This feature allows the scheduler to expose a current virtual cpu id
>>> to user-space. This virtual cpu id is within the possible cpus range,
>>> and is temporarily (and uniquely) assigned while threads are actively
>>> running within a memory space. If a memory space has fewer threads than
>>> cores, or is limited to run on few cores concurrently through sched
>>> affinity or cgroup cpusets, the virtual cpu ids will be values close
>>> to 0, thus allowing efficient use of user-space memory for per-cpu
>>> data structures.
>>
>> So I have one possibly (probably) dumb question: if I'm writing a
>> program to make use of virtual CPU IDs, how do I know what the maximum
>> ID will be? It seems like one of the advantages of this mechanism would
>> be not having to be prepared for anything in the physical ID space, but
>> is there any guarantee that the virtual-ID space will be smaller?
>> Something like "no larger than the number of threads", say?
>
> Hi Jonathan,
>
> This is a very relevant question. Let me quote what I answered to Florian
> on the last round of review for this series:
>
> Some effective upper bounds for the number of vcpu ids observable in a process:
>
> - sysconf(3) _SC_NPROCESSORS_CONF,
> - the number of threads which exist concurrently in the process,

One small detail I forgot to mention: on a NUMA system, a single-threaded
process will observe (typically) vcpu_id=numa_node_id. So it can jump
around between vcpu_id values depending on which numa node it runs on at
the moment. So the vcpu_id is not strictly bound by the number of
concurrently running threads.

Thanks,

Mathieu

> - the number of cpus in the cpu affinity mask applied by sched_setaffinity,
>   except in corner-case situations such as cpu hotplug removing all cpus from
>   the affinity set,
> - cgroup cpuset "partition" limits,
>
> Note that AFAIR non-partition cgroup cpusets allow a cgroup to "borrow"
> additional cores from the rest of the system if they are idle, therefore
> allowing the number of concurrent threads to go beyond the specified limit.
>
> AFAIR the sched affinity mask is tweaked independently of the cgroup cpuset.
> Those are two mechanisms both affecting the scheduler task placement.
>
> I would expect the user-space code to use some sensible upper bound as a
> hint about how many per-vcpu data structure elements to expect (and how many
> to pre-allocate), but have a "lazy initialization" fall-back in case the
> vcpu id goes up to the number of configured processors - 1. And I suspect
> that even the number of configured processors may change with CRIU.
>
> If the above explanation makes sense (please let me know if I am wrong
> or missed something), I suspect I should add it to the commit message.
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
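
For illustration only, the "pre-allocate from a hint, lazily initialize the
rest" approach described in the quoted reply could be sketched roughly as
below. This is not code from the patch series. The names pervcpu_slot,
pervcpu_table, pervcpu_table_init and pervcpu_get are hypothetical, and the
sketch takes the vcpu id as a plain argument because how user-space reads it
is outside the scope of this example. The hint here is the affinity mask
size capped by _SC_NPROCESSORS_CONF; since a NUMA system can yield ids above
the thread count, any id below the configured processor count is accepted
and allocated on first use.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for sched_getaffinity() and CPU_COUNT() */
#endif
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-vcpu payload; a real user would put its own data here. */
struct pervcpu_slot {
	long counter;
};

/* Hypothetical table: one pointer per configured processor. */
struct pervcpu_table {
	long nr_slots;                          /* upper bound on vcpu id + 1 */
	_Atomic(struct pervcpu_slot *) *slots;  /* NULL until initialized */
};

/* Pre-allocate only the slots the hint suggests will be used. */
static int pervcpu_table_init(struct pervcpu_table *t)
{
	cpu_set_t mask;
	long hint, i;

	t->nr_slots = sysconf(_SC_NPROCESSORS_CONF);
	if (t->nr_slots <= 0)
		return -1;
	t->slots = calloc(t->nr_slots, sizeof(*t->slots));
	if (!t->slots)
		return -1;

	/* Hint: affinity mask size; fall back to all configured processors. */
	hint = t->nr_slots;
	if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
		long n = CPU_COUNT(&mask);

		if (n > 0 && n < hint)
			hint = n;
	}
	for (i = 0; i < hint; i++) {
		struct pervcpu_slot *s = calloc(1, sizeof(*s));

		if (!s)
			return -1;
		atomic_store(&t->slots[i], s);
	}
	return 0;
}

/* Return the slot for vcpu_id, allocating it on first use if needed. */
static struct pervcpu_slot *pervcpu_get(struct pervcpu_table *t, long vcpu_id)
{
	struct pervcpu_slot *s, *expected = NULL;

	assert(t && t->slots);
	if (vcpu_id < 0 || vcpu_id >= t->nr_slots)
		return NULL;    /* e.g. processor count changed after init */
	s = atomic_load(&t->slots[vcpu_id]);
	if (s)
		return s;
	s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	/* Another thread may have raced us; keep whichever slot was published. */
	if (!atomic_compare_exchange_strong(&t->slots[vcpu_id], &expected, s)) {
		free(s);
		s = expected;
	}
	return s;
}
```

A caller would run pervcpu_table_init() once and then call pervcpu_get()
with the current vcpu id on each access. The NULL return for an out-of-range
id is one possible way to cope with the configured processor count changing
(for instance across CRIU); a real implementation might prefer to grow the
table instead.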