Hi Philippe, On Thu, Sep 14, 2023 at 09:38:46AM +0200, Philippe Mathieu-Daudé wrote: > Date: Thu, 14 Sep 2023 09:38:46 +0200 > From: Philippe Mathieu-Daudé <philmd@xxxxxxxxxx> > Subject: Re: [PATCH v4 10/21] i386: Introduce module-level cpu topology to > CPUX86State > > On 14/9/23 09:21, Zhao Liu wrote: > > From: Zhuocheng Ding <zhuocheng.ding@xxxxxxxxx> > > > > smp command has the "clusters" parameter but x86 hasn't supported that > > level. "cluster" is a CPU topology level concept above cores, in which > > the cores may share some resources (L2 cache or some others like L3 > > cache tags, depending on the Archs) [1][2]. For x86, the resource shared > > by cores at the cluster level is mainly the L2 cache. > > > > However, using cluster to define x86's L2 cache topology will cause the > > compatibility problem: > > > > Currently, x86 defaults that the L2 cache is shared in one core, which > > actually implies a default setting "cores per L2 cache is 1" and > > therefore implicitly defaults to having as many L2 caches as cores. > > > > For example (i386 PC machine): > > -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*) > > > > Considering the topology of the L2 cache, this (*) implicitly means "1 > > core per L2 cache" and "2 L2 caches per die". > > > > If we use cluster to configure L2 cache topology with the new default > > setting "clusters per L2 cache is 1", the above semantics will change > > to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2 > > cores per L2 cache". > > > > So the same command (*) will cause changes in the L2 cache topology, > > further affecting the performance of the virtual machine. > > > > Therefore, x86 should only treat cluster as a cpu topology level and > > avoid using it to change L2 cache by default for compatibility. > > > > "cluster" in smp is the CPU topology level which is between "core" and > > die. > > > > For x86, the "cluster" in smp is corresponding to the module level [2], > > which is above the core level. So use the "module" other than "cluster" > > in i386 code. > > > > And please note that x86 already has a cpu topology level also named > > "cluster" [3], this level is at the upper level of the package. Here, > > the cluster in x86 cpu topology is completely different from the > > "clusters" as the smp parameter. After the module level is introduced, > > the cluster as the smp parameter will actually refer to the module level > > of x86. > > > > [1]: 864c3b5c32f0 ("hw/core/machine: Introduce CPU cluster topology support") > > [2]: Yanan's comment about "cluster", > > https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html > > [3]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources. > > > > Signed-off-by: Zhuocheng Ding <zhuocheng.ding@xxxxxxxxx> > > Co-developed-by: Zhao Liu <zhao1.liu@xxxxxxxxx> > > Signed-off-by: Zhao Liu <zhao1.liu@xxxxxxxxx> > > Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx> > > --- > > Changes since v1: > > * The background of the introduction of the "cluster" parameter and its > > exact meaning were revised according to Yanan's explanation. (Yanan) > > --- > > hw/i386/x86.c | 1 + > > target/i386/cpu.c | 1 + > > target/i386/cpu.h | 5 +++++ > > 3 files changed, 7 insertions(+) > > > > diff --git a/target/i386/cpu.h b/target/i386/cpu.h > > index 470257b92240..556e80f29764 100644 > > --- a/target/i386/cpu.h > > +++ b/target/i386/cpu.h > > @@ -1903,6 +1903,11 @@ typedef struct CPUArchState { > > /* Number of dies within this CPU package. */ > > unsigned nr_dies; > > + /* > > + * Number of modules within this CPU package. > > + * Module level in x86 cpu topology is corresponding to smp.clusters. > > + */ > > + unsigned nr_modules; > > } CPUX86State; > > It would be really useful to have an ASCII art comment showing > the architecture topology. Good idea, I could consider how to show that. > Also for clarity the topo fields from > CPU[Arch]State could be moved into a 'topo' sub structure, or even > clearer would be to re-use the X86CPUTopoIDs structure? Yeah, I also have the plan to do further cleanup about these topology structures [1]. X86CPUTopoInfo is not suitable for hybrid topology case, (hybrid case needs another structure to store the max number elements for each level), so I will try to move that X86CPUTopoInfo into CPU[Arch]State. [1]: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg01032.html Thanks, Zhao