Hi Daniel,

On Thu, Feb 08, 2024 at 04:52:33PM +0000, Daniel P. Berrangé wrote:
> Date: Thu, 8 Feb 2024 16:52:33 +0000
> From: "Daniel P. Berrangé" <berrange@xxxxxxxxxx>
> Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
>
> On Fri, Feb 02, 2024 at 12:10:58AM +0800, Zhao Liu wrote:
> > Hi Daniel,
> >
> > On Thu, Feb 01, 2024 at 09:21:48AM +0000, Daniel P. Berrangé wrote:
> > > Date: Thu, 1 Feb 2024 09:21:48 +0000
> > > From: "Daniel P. Berrangé" <berrange@xxxxxxxxxx>
> > > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > >
> > > On Thu, Feb 01, 2024 at 10:57:32AM +0800, Zhao Liu wrote:
> > > > Hi Daniel,
> > > >
> > > > On Wed, Jan 31, 2024 at 10:28:42AM +0000, Daniel P. Berrangé wrote:
> > > > > Date: Wed, 31 Jan 2024 10:28:42 +0000
> > > > > From: "Daniel P. Berrangé" <berrange@xxxxxxxxxx>
> > > > > Subject: Re: [PATCH v8 00/21] Introduce smp.modules for x86 in QEMU
> > > > >
> > > > > On Wed, Jan 31, 2024 at 06:13:29PM +0800, Zhao Liu wrote:
> > > > > > From: Zhao Liu <zhao1.liu@xxxxxxxxx>
> > > >
> > > > [snip]
> > > >
> > > > > > However, after digging deeper into the description and use cases of
> > > > > > cluster in the device tree [3], I realized that the essential
> > > > > > difference between clusters and modules is that cluster is an extremely
> > > > > > abstract concept:
> > > > > >  * Cluster supports nesting though currently QEMU doesn't support
> > > > > >    nested cluster topology. However, modules will not support nesting.
> > > > > >  * Also due to nesting, there is great flexibility in sharing resources
> > > > > >    on clusters, rather than narrowing cluster down to sharing L2 (and
> > > > > >    L3 tags) as the lowest topology level that contains cores.
> > > > > >  * Flexible nesting of cluster allows it to correspond to any level
> > > > > >    between the x86 package and core.
> > > > > >
> > > > > > Based on the above considerations, and in order to eliminate the naming
> > > > > > confusion caused by the mapping between general cluster and x86 module
> > > > > > in v7, we now formally introduce smp.modules as the new topology level.
> > > > > >
> > > > > What is the Linux kernel calling this topology level on x86 ?
> > > > > It will be pretty unfortunate if Linux and QEMU end up with
> > > > > different names for the same topology level.
> > > > >
> > > > Now Intel's engineers in the Linux kernel are starting to use "module"
> > > > to refer to this layer of topology [4] to avoid confusion, where
> > > > previously the scheduler developers referred to the shared L2 hierarchy
> > > > collectively as "cluster".
> > > >
> > > > Looking at it this way, it makes more sense for QEMU to use the
> > > > "module" for x86.
> > > >
> > > I was thinking specifically about what Linux calls this topology when
> > > exposing it in sysfs and /proc/cpuinfo. AFAICT, it looks like it is
> > > called 'clusters' in this context, and so this is the terminology that
> > > applications and users are going to expect.
> >
> > The cluster related topology information under "/sys/devices/system/cpu/
> > cpu*/topology" indicates the L2 cache topology (CPUID[0x4]), not module
> > level CPU topology (CPUID[0x1f]).
> >
> > So far, the kernel hasn't exposed module topology related sysfs. But we will
> > add new "module" related information in sysfs. The relevant patches are
> > ready internally, but not posted yet.
> >
> > In the future, we will use "module" in sysfs to indicate module level CPU
> > topology, and "cluster" will be only used to refer to the L2 cache domain
> > as it is now.
>
> So, if they're distinct concepts both relevant to x86 CPUs, then from
> the QEMU POV, should this patch series be changing the -smp arg to
> allowing configuration of both 'clusters' and 'modules' for x86 ?

Though the previous versions used the "clusters" parameter, they, like the
current "modules" version, just add a CPU topology level to the x86 CPU.

>
> An earlier version of this series just supported 'clusters', and this
> changed to 'modules', but your description of Linux reporting both
> suggests QEMU would need both.
>

As for the cluster support for x86, i.e. the L2 cache topology support, we
want to introduce a cache topology configuration interface that is separate
from the CPU topology one, and to avoid using "cluster" as a cache topology
name (this avoids confusion with -smp "clusters", which is a CPU topology
parameter, since ARM in QEMU also treats "cluster" as a CPU topology level
rather than a cache topology level).

BTW, for cache topology, may I ask for your advice? Currently, I can think
of 2 options:

1. Hack -smp as:

   -smp cpus=4,sockets=2,cores=2,threads=1, \
        l3-cache=socket,l2-cache=core,l1-i-cache=core,l1-d-cache=core

In this way, I just parse the extended -smp and store the cache topology in
a structure like:

typedef struct CacheTopology {
    CPUTopoLevel l1i;
    CPUTopoLevel l1d;
    CPUTopoLevel l2;
    CPUTopoLevel l3;
} CacheTopology;

This way only covers the smp cache topology. For the heterogeneous/hybrid
cache topology, I think it can be expanded based on the QOM CPU topology [4]
as:

   -accel kvm -cpu host \
   -device cpu-socket,id=sock0 \
   -device cpu-die,id=die0,parent=sock0 \
   -device cpu-module,id=module0,parent=die0 \
   -device cpu-module,id=module1,parent=die0 \
   -device cpu-core,id=core0,parent=module0,nr-threads=2 \
   -device cpu-core,id=core1,parent=module1,nr-threads=1 \
   -device cpu-core,id=core2,parent=module1,nr-threads=1 \
   -device cache,id=cache0,parent=die0,level=3,type=unified \
   -device cache,id=cache1,parent=core0,level=2,type=unified \
   -device cache,id=cache2,parent=core0,level=1,type=data \
   -device cache,id=cache3,parent=core0,level=1,type=inst \
   -device cache,id=cache4,parent=module1,level=2,type=unified \
   -device cache,id=cache5,parent=core1,level=1,type=data \
   -device cache,id=cache6,parent=core1,level=1,type=inst \
   -device cache,id=cache7,parent=core2,level=1,type=data \
   -device cache,id=cache8,parent=core2,level=1,type=inst \

In module0, the L2 (x86's cluster) is shared at the core level (core0 has
its own L2). In module1, the L2 is shared by core1 and core2, i.e. at the
module level.

[4]: https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1.liu@xxxxxxxxxxxxxxx/

2. But recently I realized there may be another option, which is just to
introduce a new option "-cache", like "-numa", to configure the cache
topology.

With "-cache", we could accept a CPU list as the parameter:

   -cache cache,cache-id=0,level=2,type=unified,cpus=0-1 \
   -cache cache,cache-id=1,level=2,type=unified,cpus=2-3 \

or CPU topology ids as the parameters:

   -cache cache,cache-id=0,level=2,type=unified \
   -cache cache,cache-id=1,level=2,type=unified \
   -cache cpu,cache-id=0,socket-id=0,die-id=0,module-id=0,core-id=0 \
   -cache cpu,cache-id=1,socket-id=0,die-id=0,module-id=1 \

Hmmm, Daniel, which of the above two options do you prefer?

Thanks,
Zhao
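
P.S. To make option 1 a little more concrete, below is a minimal,
self-contained sketch (illustrative only, not QEMU code: SMPTopology,
cpus_sharing_cache() and the l2-cache=module / l3-cache=die syntax are
assumptions made up for this example) of how the per-level mapping held in
CacheTopology could be reduced to the "number of logical CPUs sharing this
cache" count that the CPUID cache leaves eventually need:

   /* Illustrative sketch only -- not QEMU's actual implementation. */
   #include <stdio.h>

   typedef enum {
       TOPO_THREAD,    /* SMT thread     */
       TOPO_CORE,      /* core           */
       TOPO_MODULE,    /* x86 module     */
       TOPO_DIE,       /* die            */
       TOPO_SOCKET,    /* socket/package */
   } CPUTopoLevel;

   typedef struct CacheTopology {
       CPUTopoLevel l1i;
       CPUTopoLevel l1d;
       CPUTopoLevel l2;
       CPUTopoLevel l3;
   } CacheTopology;

   /* Hypothetical per-socket view of the -smp counts. */
   typedef struct SMPTopology {
       unsigned threads;   /* threads per core  */
       unsigned cores;     /* cores per module  */
       unsigned modules;   /* modules per die   */
       unsigned dies;      /* dies per socket   */
   } SMPTopology;

   /* How many logical CPUs share a cache attached at @level? */
   static unsigned cpus_sharing_cache(const SMPTopology *smp, CPUTopoLevel level)
   {
       unsigned n = 1;

       if (level >= TOPO_CORE) {
           n *= smp->threads;
       }
       if (level >= TOPO_MODULE) {
           n *= smp->cores;
       }
       if (level >= TOPO_DIE) {
           n *= smp->modules;
       }
       if (level >= TOPO_SOCKET) {
           n *= smp->dies;
       }
       return n;
   }

   int main(void)
   {
       /* e.g. a hypothetical
        * -smp dies=1,modules=2,cores=2,threads=2,l2-cache=module,l3-cache=die */
       SMPTopology smp = { .threads = 2, .cores = 2, .modules = 2, .dies = 1 };
       CacheTopology cache = {
           .l1i = TOPO_CORE, .l1d = TOPO_CORE,
           .l2 = TOPO_MODULE, .l3 = TOPO_DIE,
       };

       printf("CPUs sharing an L2: %u\n", cpus_sharing_cache(&smp, cache.l2)); /* 4 */
       printf("CPUs sharing an L3: %u\n", cpus_sharing_cache(&smp, cache.l3)); /* 8 */
       return 0;
   }

With 2 modules of 2 cores x 2 threads per die, an L2 placed at the module
level would be shared by 4 logical CPUs and an L3 at the die level by 8,
which is the kind of sharing count the cache topology enumeration has to
encode whichever of the two configuration interfaces is chosen.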