Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2022-01-28 at 21:40 +0000, Sean Christopherson wrote:
> Nope.  You missed a spot.  This also reproduces on a sufficiently large Intel
> system (and Milan).  initial_gs gets overwritten by common_cpu_up(), which leads
> to a CPU getting the wrong MSR_GS_BASE and then the wrong raw_smp_processor_id(),
> resulting in cpu_init_exception_handling() stuffing the wrong GDT and leaving a
> NULL TR descriptor for itself.
> 
> You also have a lurking bug in the x2APIC ID handling.  Stripping the boot flags
> from the prescribed APICID needs to happen before retrieving the x2APIC ID from
> CPUID, otherwise bits 31:16 of the ID will be lost.
> 
> You owe me two beers ;-)
> 
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index dcdf49a137d6..23df88c86a0e 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -208,11 +208,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>          * in smpboot_control:
>          * Bit 0-15     APICID if STARTUP_USE_CPUID_0B is not set
>          * Bit 16       Secondary boot flag
> -        * Bit 17       Parallel boot flag
> +        * Bit 17       Parallel boot flag (STARTUP_USE_CPUID_0B)
>          */
>         testl   $STARTUP_USE_CPUID_0B, %eax
> -       jz      .Lsetup_AP
> +       jnz     .Luse_cpuid_0b
> +       andl    $0xFFFF, %eax
> +       jmp     .Lsetup_AP
> 
> +.Luse_cpuid_0b:
>         mov     $0x0B, %eax
>         xorl    %ecx, %ecx
>         cpuid

Looks like I had already fixed that one in a cleanup at
https://git.infradead.org/users/dwmw2/linux.git/commitdiff/191f08997577

I removed the mask entirely. We now use the APIC ID from the low 31
bits if bit 31 isn't set... and there's no need to mask it out because
by definition it isn't set.

+       /*
+        * Secondary CPUs find out the offsets via the APIC ID. For parallel
+        * boot the APIC ID is retrieved from CPUID, otherwise it's encoded
+        * in smpboot_control:
+        * Bit 0-30     APIC ID if STARTUP_PARALLEL is not set
+        * Bit 31       Parallel boot flag (use CPUID leaf 0x0b for APIC ID).
+        */
+       testl   $STARTUP_PARALLEL, %eax
+       jz      .Lsetup_AP
+
+       mov     $0x0B, %eax
+       xorl    %ecx, %ecx
+       cpuid
+       mov     %edx, %eax
+
+.Lsetup_AP:


I am, of course, still prepared to buy you as many beers as you desire.
Perhaps in Dublin in September, where we're (hopefully) going to be
doing Linux Plumbers Conference in person again at last!


(I actually think I'm going to rework that cleanup because it's given
us a hard-coded assumption that no AP has APIC ID 0. I'll put back the
explicit STARTUP_SECONDARY flag that Thomas had, and work your fix in
too to avoid re-introducing the bug.)

>  int common_cpu_up(unsigned int cpu, struct task_struct *idle)
>  {
>         int ret;
> @@ -1112,7 +1123,8 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
>         /* Stack for startup_32 can be just as for start_secondary onwards */
>         per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
>  #else
> -       initial_gs = per_cpu_offset(cpu);
> +       if (!do_parallel_bringup)
> +               initial_gs = per_cpu_offset(cpu);
>  #endif
>         return 0;
>  }

Hm, I think that can be removed completely, can't it? We don't need to
make it conditional, because even the non-parallel 64-bit bringup will
still take the same path in head_64.S to *find* the stack and other
per-CPU information; it just gets its APIC ID from the global variable
in order to do so.


Attachment: smime.p7s
Description: S/MIME cryptographic signature


[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux