Re: [PATCH] KVM: selftests: Double check on the current CPU in rseq_test

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/14/22 10:06, Gavin Shan wrote:
In rseq_test, there are two threads created. Those two threads are
'main' and 'migration_thread' separately. We also have the assumption
that non-migration status on 'migration-worker' thread guarantees the
same non-migration status on 'main' thread. Unfortunately, the assumption
isn't true. The 'main' thread can be migrated from one CPU to another
one between the calls to sched_getcpu() and READ_ONCE(__rseq.cpu_id).
The following assert is raised eventually because of the mismatched
CPU numbers.

The issue can be reproduced on arm64 system occasionally.

Hmm, this does not seem a correct patch - the threads are already synchronizing using seq_cnt, like this:

	migration			main
	----------------------		--------------------------------
	seq_cnt = 1
	smp_wmb()
					snapshot = 0
					smp_rmb()
					cpu = sched_getcpu() reads 23
	sched_setaffinity()
					rseq_cpu = __rseq.cpuid reads 35
					smp_rmb()
					snapshot != seq_cnt -> retry
	smp_wmb()
	seq_cnt = 2

sched_setaffinity() is guaranteed to block until the task is enqueued on an allowed CPU.

Can you check that smp_rmb() and smp_wmb() generate correct instructions on arm64?

Paolo

   host# uname -r
   5.19.0-rc6-gavin+
   host# # cat /proc/cpuinfo | grep processor | tail -n 1
   processor    : 223
   host# pwd
   /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
   host# for i in `seq 1 100`;   \
         do echo "--------> $i"; \
         ./rseq_test; sleep 3;   \
         done
   --------> 1
   --------> 2
   --------> 3
   --------> 4
   --------> 5
   --------> 6
   ==== Test Assertion Failure ====
     rseq_test.c:265: rseq_cpu == cpu
     pid=3925 tid=3925 errno=4 - Interrupted system call
        1  0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
        2  0x0000ffffb044affb: ?? ??:0
        3  0x0000ffffb044b0c7: ?? ??:0
        4  0x0000000000401a6f: _start at ??:?
     rseq CPU = 4, sched CPU = 27

This fixes the issue by double-checking on the current CPU after
call to READ_ONCE(__rseq.cpu_id) and restarting the test if the
two consecutive CPU numbers aren't euqal.

Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
---
  tools/testing/selftests/kvm/rseq_test.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 4158da0da2bb..74709dd9f5b2 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -207,7 +207,7 @@ int main(int argc, char *argv[])
  {
  	int r, i, snapshot;
  	struct kvm_vm *vm;
-	u32 cpu, rseq_cpu;
+	u32 cpu, rseq_cpu, last_cpu;
/* Tell stdout not to buffer its content */
  	setbuf(stdout, NULL);
@@ -259,8 +259,9 @@ int main(int argc, char *argv[])
  			smp_rmb();
  			cpu = sched_getcpu();
  			rseq_cpu = READ_ONCE(__rseq.cpu_id);
+			last_cpu = sched_getcpu();
  			smp_rmb();
-		} while (snapshot != atomic_read(&seq_cnt));
+		} while (snapshot != atomic_read(&seq_cnt) || cpu != last_cpu);
TEST_ASSERT(rseq_cpu == cpu,
  			    "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);




[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux