Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/11/2024 4:43 PM, Zhao Liu wrote:
Hi Xiaoyao,

On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
Date: Wed, 10 Jan 2024 17:31:28 +0800
From: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
  topo in CPUID[4]

On 1/8/2024 4:27 PM, Zhao Liu wrote:
From: Zhao Liu <zhao1.liu@xxxxxxxxx>

Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
nearest power-of-2 integer.

The nearest power-of-2 integer can be calculated by pow2ceil() or by
using APIC ID offset (like L3 topology using 1 << die_offset [3]).

But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
are associated with APIC ID. For example, in linux kernel, the field
"num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID.

And for
another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
matched with actual core numbers and it's calculated by:
"(1 << (pkg_offset - core_offset)) - 1".

could you elaborate it more? what is the value of actual core numbers on
Alder lake P? and what is the pkg_offset and core_offset?

For example, the following's the CPUID dump of an ADL-S machine:

CPUID.04H:

0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000


CPUID.1FH:

0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c

The CPUID.04H:EAX[bits 31:26] is 63.
 From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.

Thus we can verify that the above equation as:

1 << (0x7 - 0x1) - 1 = 63.

"Maximum number of addressable IDs" refers to the maximum number of IDs
that can be enumerated in the APIC ID's topology layout, which does not
necessarily correspond to the actual number of topology domains.


Therefore the offset of APIC ID should be preferred to calculate nearest
power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
31:26]:
1. d/i cache is shared in a core, 1 << core_offset should be used
     instand of "cs->nr_threads" in encode_cache_cpuid4() for

/s/instand/instead

Thanks!


     CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
2. L2 cache is supposed to be shared in a core as for now, thereby
     1 << core_offset should also be used instand of "cs->nr_threads" in

ditto

Okay.


     encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
     calculated with the bit width between the Package and SMT levels in
     the APIC ID (1 << (pkg_offset - core_offset) - 1).

In addition, use APIC ID offset to replace "pow2ceil()" for
cache_info_passthrough case.

[1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
[2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
[3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
Signed-off-by: Zhao Liu <zhao1.liu@xxxxxxxxx>
Tested-by: Babu Moger <babu.moger@xxxxxxx>
Tested-by: Yongwei Ma <yongwei.ma@xxxxxxxxx>
Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
---
Changes since v3:
   * Fix compile warnings. (Babu)
   * Fix spelling typo.

Changes since v1:
   * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
     case. (Yanan)
   * Split the L1 cache fix into a separate patch.
   * Rename the title of this patch (the original is "i386/cpu: Fix number
     of addressable IDs in CPUID.04H").
---
   target/i386/cpu.c | 30 +++++++++++++++++++++++-------
   1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5a3678a789cf..c8d2a585723a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
   {
       X86CPU *cpu = env_archcpu(env);
       CPUState *cs = env_cpu(env);
-    uint32_t die_offset;
       uint32_t limit;
       uint32_t signature[3];
       X86CPUTopoInfo topo_info;
@@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                   int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
                   int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
                   if (cs->nr_cores > 1) {
+                    int addressable_cores_offset =
+                                                apicid_pkg_offset(&topo_info) -
+                                                apicid_core_offset(&topo_info);
+
                       *eax &= ~0xFC000000;
-                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
+                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;

it should be ((1 << addressable_cores_offset) - 1) << 26

Good catch! The helper wrapped in a subsequent patch masks the error here.


I think naming it addressable_cores_width is better than
addressable_cores_offset. It's not offset because offset means the bit
position from bit 0.

I agree, "width" is better.


And we can get the width by another algorithm:

int addressable_cores_width = apicid_core_width(&topo_info) +
apicid_die_width(&topo_info);
*eax |= ((1 << addressable_cores_width) - 1)) << 26;

This algorithm lacks flexibility because there will be more topology
levels between package and core, such as the cluster being introduced...

Using "addressable_cores_width" is clear enough.

		
                   }
                   if (host_vcpus_per_cache > vcpus_per_socket) {
+                    int pkg_offset = apicid_pkg_offset(&topo_info);
+
                       *eax &= ~0x3FFC000;
-                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
+                    *eax |= (1 << (pkg_offset - 1)) << 14;

Ditto, ((1 << pkg_offset) - 1) << 14

Thanks!


For this one, I think pow2ceil(vcpus_per_socket) is better. Because it's
intuitive that when host_vcpus_per_cache > vcpus_per_socket, we expose
vcpus_per_cache (configured by users) to VM.

I tend to use a uniform calculation that is less confusing and easier to
maintain.

less confusing?

the original code is

	if (host_vcpus_per_cache > vcpus_per_socket) {
		*eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
	}

and this patch is going to change it to

	if (host_vcpus_per_cache > vcpus_per_socket) {
		int pkg_offset = apicid_pkg_offset(&topo_info);
		*eax |= (1 << pkg_offset - 1)) << 14;
	}

Apparently, the former is clearer that everyone knows what is wants to do is "when guest's total vcpus_per_socket is even smaller than host's vcpu_per_cache, using guest's configuration". While the latter is more confusing.

Since this field encodes "Maximum number of addressable IDs",
OS can't get the exact number of CPUs/vCPUs sharing L3 from here, it can
only know that L3 is shared at the package level.

It doesn't matter with L3. What the code want to fulfill is that,

host_vcpus_per_cache is the actual number of LPs that share this level of cache. While vcpus_per_socket is the maximum numbere of LPs that can share a cache (at any level) in guest. When guest's maximum number is even smaller than host's, use guest's value.

Thanks,
Zhao






[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux