[PATCH v2 2/2] x86, apic: Disable BSP if boot cpu is AP

d.hatayama@xxxxxxxxxxxxxx (HATAYAMA Daisuke) · Tue, 22 Oct 2013 20:02:38 +0900

(2013/10/19 2:36), Vivek Goyal wrote:
> On Wed, Oct 16, 2013 at 10:26:44AM +0900, HATAYAMA Daisuke wrote:
>
> [..]
>>> I am wondering if there is any attribute of cpu which we can pass to
>>> second kernel on command line. And tell second kernel not to bring up
>>> that specific cpu. (Say exclude_cpu=<cpu_attr>)? If this works, then
>>> if ACPI or other mechanism don't report BSP, we could possibly assume
>>> that cpu 0 is BSP and ask second kernel to not try to boot it.
>>>
>>
>> I've come up with a similar idea. If there's such a kernel option, the rest
>> of the processing can be implemented in user-land, i.e., get the apicid of
>> the BSP from /proc/cpuid and set it in the kernel command line of the 2nd
>> kernel. What kexec-tools should do on Fedora/RHEL? Also, this idea covers
>> SFI and device tree.
>>
>> The first reason I didn't choose this idea was that passing the value
>> via the command line seems rather ad hoc.
>
> We do so many things using command line. So telling kernel not to boot
> certain cpus seems ok to me.
>
>> The second reason is that in any
>> case it's a compromised design. Strictly speaking, we cannot get a correct
>> mapping of apicid to {BSP, AP} in the 1st kernel. That is, there's a class
>> of bugs that affect the BSP flag of each processor. For example, in a
>> catastrophic state, all the cpus can have the BSP flag set in the 2nd kernel
>> due to wrmsr instructions executed by the bug causing the crash. In this
>> sense, the current implementation is less reliable than the maxcpus=1 case.
>>
>> To address this rigorously, we would need, for example, to check the status
>> of the BSP flag between the 1st kernel and the 2nd kernel to keep the
>> processor with the BSP flag unique, exclude cpus in a catastrophic state
>> that are not checked, and tell the 2nd kernel which cpus can be woken up.
>
> Ok, for the time being let us not do any comparison with maxcpus=1 or
> nr_cpus=1 because we know that's the most robust thing to do.
>
> For the case where we want to bring up more than one cpu in second kernel,
> there seem to be two problems.
>
> - ACPI tables or other tables might not report which is BSP. In that
>    case we might try to bring up BSP and crash the system.
>
> - Due to malicious wrmsr, more than one cpu might claim being BSP. In that
>    case the cpu we are crashing on will think it is BSP and it can safely
>    bring up other cpus.
>
> If we start sending a mask of cpus which should not be brought up in
> second kernel, then it would not matter whether BSP flag in MSR is set
> or not. Isn't it? And that will solve the second issue.
>

No. As long as the mask is created in the 1st kernel, the mapping between CPUs
and {BSP, AP} could change at crash time, leaving the mask stale. So the "mask"
idea by itself does not improve reliability.

To obtain complete reliability without any hardware support for getting the
mapping between all the CPUs and {BSP, AP}, we must create such a mask after
the crash, i.e., between the 1st and 2nd kernels, in purgatory or in some new
phase. The idea is, for example:

- The crashing AP waits in purgatory until a specified number of CPUs arrive
  there, or until a time limit expires in case some CPUs are in a catastrophic
  state and never arrive.

- The other CPUs also enter purgatory instead of halting as in the current
  implementation. There they mark in the mask whether they can be safely
  woken up in the 2nd kernel, and then halt in purgatory until all or some of
  them are woken up from the 2nd kernel.
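
As a rough sketch of the purgatory idea above (not existing purgatory code),
the mask could be built as below. The names struct cpu_report,
rendezvous_done() and build_wakeable_mask() are made up for illustration, and
I assume each CPU records its own IA32_APIC_BASE value, whose bit 8 is the BSP
flag, when it enters purgatory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 8 of the IA32_APIC_BASE MSR is the BSP flag. */
#define APIC_BASE_BSP (1ULL << 8)

/* Hypothetical record that each CPU fills in when it enters purgatory. */
struct cpu_report {
	bool arrived;		/* reached purgatory before the deadline */
	uint64_t apic_base;	/* IA32_APIC_BASE as read after the crash */
};

/* The crashing CPU stops waiting once enough CPUs arrived or time is up. */
bool rendezvous_done(unsigned arrived, unsigned expected,
		     uint64_t elapsed, uint64_t limit)
{
	return arrived >= expected || elapsed >= limit;
}

/*
 * Build the mask of CPUs that the 2nd kernel may wake up: CPUs that
 * reached purgatory and whose BSP flag is clear. The crashing CPU is
 * left out because it is already running and boots the 2nd kernel.
 */
uint64_t build_wakeable_mask(const struct cpu_report *r, unsigned ncpus,
			     unsigned crashing_cpu)
{
	uint64_t mask = 0;
	unsigned i;

	assert(ncpus <= 64);	/* one bit per CPU in this sketch */
	for (i = 0; i < ncpus; i++) {
		if (i == crashing_cpu)
			continue;	/* already running */
		if (!r[i].arrived)
			continue;	/* state unknown, do not wake */
		if (r[i].apic_base & APIC_BASE_BSP)
			continue;	/* claims to be BSP, do not wake */
		mask |= 1ULL << i;
	}
	return mask;
}
```

A CPU that never reaches purgatory is treated the same as a CPU with the BSP
flag set: it stays out of the mask, so the 2nd kernel never tries to wake it.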

> And if ACPI tables don't report which one is BSP, user space can first
> try to look at BSP flags of processors (may be this can be reported
> in /proc/cpuinfo?) and if no one has BSP flag set, then assume cpu 0
> is BSP.
>
> So to me it looks like passing which cpus to not bring up to second kernel
> is more resilient approach. Isn't it?
>

Yes. Though its reliability is similar to the current approach, the user-space
approach is better in that it doesn't depend on what kind of BIOS tables are
present in the system. Also, the idea is more general and could be applied to
other purposes. I don't know exactly what they would be, but disabling some
subset of CPUs might be useful for some kinds of debugging.
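
For illustration only, the user-space side could look roughly like the sketch
below. It assumes the tool already has each CPU's apicid and IA32_APIC_BASE
value from some source, and it uses the exclude_cpu= name proposed in this
thread, which is not an existing kernel option. struct cpu_info, pick_bsp()
and format_exclude_option() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 8 of the IA32_APIC_BASE MSR is the BSP flag. */
#define APIC_BASE_BSP (1ULL << 8)

/* Hypothetical per-CPU data collected in the 1st kernel's user space. */
struct cpu_info {
	unsigned apicid;
	uint64_t apic_base;
};

/*
 * Return the index of the CPU to treat as BSP: the first CPU with the
 * BSP flag set, or cpu 0 if no CPU has the flag set.
 */
unsigned pick_bsp(const struct cpu_info *cpus, unsigned ncpus)
{
	unsigned i;

	assert(ncpus > 0);
	for (i = 0; i < ncpus; i++)
		if (cpus[i].apic_base & APIC_BASE_BSP)
			return i;
	return 0;	/* fallback: assume cpu 0 is BSP */
}

/* Format the proposed (hypothetical) option for the 2nd kernel. */
int format_exclude_option(char *buf, size_t len, unsigned apicid)
{
	return snprintf(buf, len, "exclude_cpu=%u", apicid);
}
```

The resulting string would be appended to the 2nd kernel's command line by the
tool that loads the crash kernel.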

I'll post new version later.

-- 
Thanks.
HATAYAMA, Daisuke