On 02.04.20 17:20, Christian Borntraeger wrote: > > > On 02.04.20 17:12, David Hildenbrand wrote: >> On 02.04.20 13:02, Christian Borntraeger wrote: >>> make sure that sigp sense running status returns a sane value for >>> stopped CPUs. To avoid potential races with the stop being processed we >>> wait until sense running status is first 0. >>> >>> Signed-off-by: Christian Borntraeger <borntraeger@xxxxxxxxxx> >>> --- >>> lib/s390x/smp.c | 2 +- >>> lib/s390x/smp.h | 2 +- >>> s390x/smp.c | 13 +++++++++++++ >>> 3 files changed, 15 insertions(+), 2 deletions(-) >>> >>> diff --git a/lib/s390x/smp.c b/lib/s390x/smp.c >>> index 5ed8b7b..492cb05 100644 >>> --- a/lib/s390x/smp.c >>> +++ b/lib/s390x/smp.c >>> @@ -58,7 +58,7 @@ bool smp_cpu_stopped(uint16_t addr) >>> return !!(status & (SIGP_STATUS_CHECK_STOP|SIGP_STATUS_STOPPED)); >>> } >>> >>> -bool smp_cpu_running(uint16_t addr) >>> +bool smp_sense_running_status(uint16_t addr) >>> { >>> if (sigp(addr, SIGP_SENSE_RUNNING, 0, NULL) != SIGP_CC_STATUS_STORED) >>> return true; >>> diff --git a/lib/s390x/smp.h b/lib/s390x/smp.h >>> index a8b98c0..639ec92 100644 >>> --- a/lib/s390x/smp.h >>> +++ b/lib/s390x/smp.h >>> @@ -40,7 +40,7 @@ struct cpu_status { >>> int smp_query_num_cpus(void); >>> struct cpu *smp_cpu_from_addr(uint16_t addr); >>> bool smp_cpu_stopped(uint16_t addr); >>> -bool smp_cpu_running(uint16_t addr); >>> +bool smp_sense_running_status(uint16_t addr); >>> int smp_cpu_restart(uint16_t addr); >>> int smp_cpu_start(uint16_t addr, struct psw psw); >>> int smp_cpu_stop(uint16_t addr); >>> diff --git a/s390x/smp.c b/s390x/smp.c >>> index 79cdc1f..b4b1ff2 100644 >>> --- a/s390x/smp.c >>> +++ b/s390x/smp.c >>> @@ -210,6 +210,18 @@ static void test_emcall(void) >>> report_prefix_pop(); >>> } >>> >>> +static void test_sense_running(void) >>> +{ >>> + report_prefix_push("sense_running"); >>> + /* make sure CPU is stopped */ >>> + smp_cpu_stop(1); >>> + /* wait for stop to succeed. */ >>> + while(smp_sense_running_status(1)); >>> + report(!smp_sense_running_status(1), "CPU1 sense claims not running"); >>> + report_prefix_pop(); >>> +} >>> + >>> + >>> /* Used to dirty registers of cpu #1 before it is reset */ >>> static void test_func_initial(void) >>> { >>> @@ -319,6 +331,7 @@ int main(void) >>> test_store_status(); >>> test_ecall(); >>> test_emcall(); >>> + test_sense_running(); >>> test_reset(); >>> test_reset_initial(); >>> smp_cpu_destroy(1); >>> >> >> TBH, I am still not sure if this is completely free of races. >> >> Assume CPU 1 is in handle_stop() >> >> if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm)) >> kvm_s390_vcpu_stop(vcpu); >> // CPU 1: gets scheduled out. >> // CPU 0: while(smp_sense_running_status(1)); finishes >> // CPU 1: gets scheduled in to return to user space >> return -EOPNOTSUPP; >> // CPU 0: report(!smp_sense_running_status(1), "CPU1 sense claims not >> running"); fails >> >> SIGP SENSE RUNNING is simply racy as hell and doesn't give you any >> guarantees. Which is good enough for some performance improvements >> (e.g., spinlocks). >> >> Now, I can queue this, but I wouldn't be surprised if we see random >> failures at one point. > > Which would speak for Janoschs variant. Loop until non running at least once > and then report success? As long as the other CPU isn't always scheduled (unlikely) and always in the kernel (unlikely), this test would even pass without the smp_cpu_stop(). So the test doesn't say much except "sometimes, smp_sense_running_status(1) reports false". Agreed that the smp_cpu_stop() will make that appear faster. If we agree about these semantics, let's add them as a comment to the test. -- Thanks, David / dhildenb