On 02.04.20 17:38, Christian Borntraeger wrote: > > > On 02.04.20 17:25, David Hildenbrand wrote: >> On 02.04.20 17:20, Christian Borntraeger wrote: >>> >>> >>> On 02.04.20 17:12, David Hildenbrand wrote: >>>> On 02.04.20 13:02, Christian Borntraeger wrote: >>>>> make sure that sigp sense running status returns a sane value for >>>>> stopped CPUs. To avoid potential races with the stop being processed we >>>>> wait until sense running status is first 0. >>>>> >>>>> Signed-off-by: Christian Borntraeger <borntraeger@xxxxxxxxxx> >>>>> --- >>>>> lib/s390x/smp.c | 2 +- >>>>> lib/s390x/smp.h | 2 +- >>>>> s390x/smp.c | 13 +++++++++++++ >>>>> 3 files changed, 15 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/lib/s390x/smp.c b/lib/s390x/smp.c >>>>> index 5ed8b7b..492cb05 100644 >>>>> --- a/lib/s390x/smp.c >>>>> +++ b/lib/s390x/smp.c >>>>> @@ -58,7 +58,7 @@ bool smp_cpu_stopped(uint16_t addr) >>>>> return !!(status & (SIGP_STATUS_CHECK_STOP|SIGP_STATUS_STOPPED)); >>>>> } >>>>> >>>>> -bool smp_cpu_running(uint16_t addr) >>>>> +bool smp_sense_running_status(uint16_t addr) >>>>> { >>>>> if (sigp(addr, SIGP_SENSE_RUNNING, 0, NULL) != SIGP_CC_STATUS_STORED) >>>>> return true; >>>>> diff --git a/lib/s390x/smp.h b/lib/s390x/smp.h >>>>> index a8b98c0..639ec92 100644 >>>>> --- a/lib/s390x/smp.h >>>>> +++ b/lib/s390x/smp.h >>>>> @@ -40,7 +40,7 @@ struct cpu_status { >>>>> int smp_query_num_cpus(void); >>>>> struct cpu *smp_cpu_from_addr(uint16_t addr); >>>>> bool smp_cpu_stopped(uint16_t addr); >>>>> -bool smp_cpu_running(uint16_t addr); >>>>> +bool smp_sense_running_status(uint16_t addr); >>>>> int smp_cpu_restart(uint16_t addr); >>>>> int smp_cpu_start(uint16_t addr, struct psw psw); >>>>> int smp_cpu_stop(uint16_t addr); >>>>> diff --git a/s390x/smp.c b/s390x/smp.c >>>>> index 79cdc1f..b4b1ff2 100644 >>>>> --- a/s390x/smp.c >>>>> +++ b/s390x/smp.c >>>>> @@ -210,6 +210,18 @@ static void test_emcall(void) >>>>> report_prefix_pop(); >>>>> } >>>>> >>>>> +static void test_sense_running(void) >>>>> +{ >>>>> + report_prefix_push("sense_running"); >>>>> + /* make sure CPU is stopped */ >>>>> + smp_cpu_stop(1); >>>>> + /* wait for stop to succeed. */ >>>>> + while(smp_sense_running_status(1)); >>>>> + report(!smp_sense_running_status(1), "CPU1 sense claims not running"); >>>>> + report_prefix_pop(); >>>>> +} >>>>> + >>>>> + >>>>> /* Used to dirty registers of cpu #1 before it is reset */ >>>>> static void test_func_initial(void) >>>>> { >>>>> @@ -319,6 +331,7 @@ int main(void) >>>>> test_store_status(); >>>>> test_ecall(); >>>>> test_emcall(); >>>>> + test_sense_running(); >>>>> test_reset(); >>>>> test_reset_initial(); >>>>> smp_cpu_destroy(1); >>>>> >>>> >>>> TBH, I am still not sure if this is completely free of races. >>>> >>>> Assume CPU 1 is in handle_stop() >>>> >>>> if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm)) >>>> kvm_s390_vcpu_stop(vcpu); >>>> // CPU 1: gets scheduled out. >>>> // CPU 0: while(smp_sense_running_status(1)); finishes >>>> // CPU 1: gets scheduled in to return to user space >>>> return -EOPNOTSUPP; >>>> // CPU 0: report(!smp_sense_running_status(1), "CPU1 sense claims not >>>> running"); fails >>>> >>>> SIGP SENSE RUNNING is simply racy as hell and doesn't give you any >>>> guarantees. Which is good enough for some performance improvements >>>> (e.g., spinlocks). >>>> >>>> Now, I can queue this, but I wouldn't be surprised if we see random >>>> failures at one point. >>> >>> Which would speak for Janoschs variant. Loop until non running at least once >>> and then report success? >> >> As long as the other CPU isn't always scheduled (unlikely) and always in >> the kernel (unlikely), this test would even pass without the >> smp_cpu_stop(). So the test doesn't say much except "sometimes, >> smp_sense_running_status(1) reports false". Agreed that the >> smp_cpu_stop() will make that appear faster. >> >> If we agree about these semantics, let's add them as a comment to the test. > > > Something like this: (I also added a test for running = true) > > static void test_sense_running(void) > { > report_prefix_push("sense_running"); > /* we are running */ > report(smp_sense_running_status(0), "CPU0 sense claims running"); > /* make sure CPU is stopped to speed up the not running case */ > smp_cpu_stop(1); > /* Make sure to have at least one time with a not running indication */ > while(smp_sense_running_status(1)); > report(true, "CPU1 sense claims not running"); > report_prefix_pop(); > } Yeah, looks better IMHO. Thanks Thanks, David / dhildenb