Re: [kvm-unit-tests PATCH v3 06/18] arm/arm64: psci: Don't run C code without stack or vectors

Alexandru Elisei <alexandru.elisei@xxxxxxx> · Mon, 6 Jan 2020 14:03:25 +0000

Hi,

On 1/6/20 1:03 PM, Andrew Jones wrote:
> On Mon, Jan 06, 2020 at 11:02:16AM +0000, Alexandru Elisei wrote:
>> Hi,
>>
>> On 1/3/20 3:31 PM, Andrew Jones wrote:
>>> On Thu, Jan 02, 2020 at 06:11:21PM +0000, Andre Przywara wrote:
>>>> On Tue, 31 Dec 2019 16:09:37 +0000
>>>> Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
>>>>
>>>> Hi,
>>>>
>>>>> The psci test performs a series of CPU_ON/CPU_OFF cycles for CPU 1. This is
>>>>> done by setting the entry point for the CPU_ON call to the physical address
>>>>> of the C function cpu_psci_cpu_die.
>>>>>
>>>>> The compiler is well within its rights to use the stack when generating
>>>>> code for cpu_psci_cpu_die.
>>>> I am a bit puzzled: Is this an actual test failure at the moment? Or just a potential problem? Because I see it using the stack pointer in the generated code in lib/arm/psci.o. But the psci test seems to pass. Or is that just because the SP is somehow not cleared, because of some KVM implementation specifics?
>>> I think the test just doesn't care that the CPU is in an infinite
>>> exception loop. Indeed we should probably put the CPU into an
>>> infinite loop instead of attempting to call PSCI die with it, as
>>> the status of a dying or dead CPU may conflict with our expected
>>> status of the test.
>> I don't like this idea, I'll explain below why.
>>
>>>> One more thing below ...
>>>>
>>>>>  However, because no stack initialization has
>>>>> been done, the stack pointer is zero, as set by KVM when creating the VCPU.
>>>>> This causes a data abort without a change in exception level. The VBAR_EL1
>>>>> register is also zero (the KVM reset value for VBAR_EL1), the MMU is off,
>>>>> and we end up trying to fetch instructions from address 0x200.
>>>>>
>>>>> At this point, a stage 2 instruction abort is generated which is taken to
>>>>> KVM. KVM interprets this as an instruction fetch from an I/O region, and
>>>>> injects a prefetch abort into the guest. Prefetch abort is a synchronous
>>>>> exception, and on guest return the VCPU PC will be set to VBAR_EL1 + 0x200,
>>>>> which is...  0x200. The VCPU ends up in an infinite loop causing a prefetch
>>>>> abort while fetching the instruction to service the said abort.
>>>>>
>>>>> cpu_psci_cpu_die is basically a wrapper over the HVC instruction, so
>>>>> provide an assembly implementation for the function which will serve as the
>>>>> entry point for CPU_ON.
>>>>>
>>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>
>>>>> ---
>>>>>  arm/cstart.S   | 7 +++++++
>>>>>  arm/cstart64.S | 7 +++++++
>>>>>  arm/psci.c     | 5 +++--
>>>>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arm/cstart.S b/arm/cstart.S
>>>>> index 2c81d39a666b..dfef48e4dbb2 100644
>>>>> --- a/arm/cstart.S
>>>>> +++ b/arm/cstart.S
>>>>> @@ -7,6 +7,7 @@
>>>>>   */
>>>>>  #define __ASSEMBLY__
>>>>>  #include <auxinfo.h>
>>>>> +#include <linux/psci.h>
>>>>>  #include <asm/thread_info.h>
>>>>>  #include <asm/asm-offsets.h>
>>>>>  #include <asm/pgtable-hwdef.h>
>>>>> @@ -139,6 +140,12 @@ secondary_entry:
>>>>>  	blx	r0
>>>>>  	b	do_idle
>>>>>  
>>>>> +.global asm_cpu_psci_cpu_die
>>>>> +asm_cpu_psci_cpu_die:
>>>>> +	ldr	r0, =PSCI_0_2_FN_CPU_OFF
>>>>> +	hvc	#0
>>>>> +	b	.
>>>> I am wondering if this implementation is actually too simple. Both the current implementation and the kernel clear at least the first three arguments to 0.
>>>> I failed to find a requirement for doing this (nothing in the SMCCC or the PSCI spec), but I guess it would make sense when looking at forward compatibility.
>>>>
>>>> At the very least it's a change in behaviour (ignoring the missing printf).
>>>> So shall we just clear r1, r2 and r3 here? (Same for arm64 below)
>>> If we were to keep this function, then I agree we should zero the
>>> registers, but as I said above, I think the proper fix for this issue is
>> Regarding zero'ing unused arguments, I've explained my point of view in my reply
>> to Andre.
>>
>>> to just not call PSCI die. Rather we should drop into an infinite loop,
>>> which also doesn't use the stack. Maybe something like this will work
>>>
>>> diff --git a/arm/psci.c b/arm/psci.c
>>> index 5c1accb6cea4..74c179d4976c 100644
>>> --- a/arm/psci.c
>>> +++ b/arm/psci.c
>>> @@ -79,10 +79,14 @@ static void cpu_on_secondary_entry(void)
>>>  	cpumask_set_cpu(cpu, &cpu_on_ready);
>>>  	while (!cpu_on_start)
>>>  		cpu_relax();
>>> -	cpu_on_ret[cpu] = psci_cpu_on(cpus[1], __pa(cpu_psci_cpu_die));
>>> +	cpu_on_ret[cpu] = psci_cpu_on(cpus[1], __pa(halt));
>>>  	cpumask_set_cpu(cpu, &cpu_on_done);
>>>  }
>>>  
>>> +/*
>>> + * This test expects CPU1 to not have previously been boot,
>>> + * and when this test completes CPU1 will be stuck in halt.
>>> + */
>>>  static bool psci_cpu_on_test(void)
>>>  {
>>>  	bool failed = false;
>>> @@ -104,7 +108,7 @@ static bool psci_cpu_on_test(void)
>>>  	cpu_on_start = 1;
>>>  	smp_mb();
>>>  
>>> -	cpu_on_ret[0] = psci_cpu_on(cpus[1], __pa(cpu_psci_cpu_die));
>>> +	cpu_on_ret[0] = psci_cpu_on(cpus[1], __pa(halt));
>>>  	cpumask_set_cpu(0, &cpu_on_done);
>>>  
>>>  	while (!cpumask_full(&cpu_on_done))
>> Trying to turn a CPU on and off concurrently from separate threads seems like a
>> nifty test to me, and good for catching possible races. With your version, 2
>> threads are enough to get all possible results: success and CPU already on.
>>
>> Besides that, if my memory serves me right, this exact test was the reason for a
>> refactoring of the VCPU reset code, in commit 03fdfb269009 ("KVM: arm64: Don't
>> write junk to sysregs on reset").
> It wasn't because there was a CPU_OFF involved. Two simultaneous CPU_ON's
> was enough. The first, successful CPU_ON reset the target VCPU causing the
> second CPU_ON to fail to look it up by MPIDR, resulting in
> PSCI_RET_INVALID_PARAMS.

My bad.

>> My preference would be to keep calling CPU_ON and CPU_OFF repeatedly.
>>
> Your analysis showed we never called CPU_OFF, because CPU1 got stuck
> in an exception loop. So we haven't actually done a repeated ON/OFF

True.

> test yet. That's not a bad idea, but I also like a test that ensures
> a single successful CPU_ON, like the patch below would ensure if we
> halted instead of CPU_OFF'ed.
>
> It should be possible to have both tests if we do the CPU_OFF test first.
>
> Thanks,
> drew
>
>
> diff --git a/arm/psci.c b/arm/psci.c
> index 5c1accb6cea4..8a24860bde28 100644
> --- a/arm/psci.c
> +++ b/arm/psci.c
> @@ -86,7 +86,7 @@ static void cpu_on_secondary_entry(void)
>  static bool psci_cpu_on_test(void)
>  {
>  	bool failed = false;
> -	int cpu;
> +	int cpu, ret_success = 0;
>  
>  	cpumask_set_cpu(1, &cpu_on_ready);
>  	cpumask_set_cpu(1, &cpu_on_done);
> @@ -113,7 +113,12 @@ static bool psci_cpu_on_test(void)
>  	for_each_present_cpu(cpu) {
>  		if (cpu == 1)
>  			continue;
> -		if (cpu_on_ret[cpu] != PSCI_RET_SUCCESS && cpu_on_ret[cpu] != PSCI_RET_ALREADY_ON) {
> +		if (cpu_on_ret[cpu] == PSCI_RET_SUCCESS) {
> +			if (++ret_success > 1) {
> +				report_info("more than one cpu_on success");
> +				failed = true;
> +			}
> +		} else if (cpu_on_ret[cpu] != PSCI_RET_ALREADY_ON) {
>  			report_info("unexpected cpu_on return value: caller=CPU%d, ret=%d", cpu, cpu_on_ret[cpu]);
>  			failed = true;
>  		}
>
Your patch looks fine to me. Tested it together with changing the CPU_ON address
to halt and running it 1000 times on Ampere EMAG:

Tested-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>
Reviewed-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>

I'm fine with dropping my patch from the series and sending it as an enhancement
after the other fixes get merged.

Thanks,
Alex