On Sat, Feb 08, 2025, Yan Zhao wrote:
> In the read-only mprotect() phase of mmu_stress_test, ensure that
> mprotect(PROT_READ) has completed before the guest starts writing to the
> read-only mprotect() memory.
>
> Without waiting for mprotect_ro_done before the guest starts writing in
> stage 3 (the stage for read-only mprotect()), the host's assertion of
> stage 3 could fail if mprotect_ro_done is set to true in the window
> between the guest finishing writes to all GPAs and executing GUEST_SYNC(3).
>
> This scenario occurs easily, especially when there are hundreds of vCPUs.
>
> CPU 0                  CPU 1 guest                CPU 1 host
>                        enter stage 3's 1st loop
>                        //in stage 3
>                        write all GPAs
>                        @rip 0x4025f0
>
> mprotect(PROT_READ)
> mprotect_ro_done=true
>                        GUEST_SYNC(3)
>                                                   r=0, continue stage 3's
>                                                   1st loop
>
>                        //in stage 4
>                        write GPA
>                        @rip 0x402635
>
>                                                   -EFAULT, jump out of
>                                                   stage 3's 1st loop
>                                                   enter stage 3's 2nd loop
>                        write GPA
>                        @rip 0x402635
>                                                   -EFAULT, continue stage
>                                                   3's 2nd loop
>                                                   guest rip += 3
>
> The test then fails and reports "Unhandled exception '0xe' at guest RIP
> '0x402638'", since the next valid guest rip address is 0x402639, i.e. the
> "(mem) = val" in vcpu_arch_put_guest() is compiled into a mov instruction
> of length 4.

This shouldn't happen.  On x86, stage 3 is a hand-coded "mov %rax, (%rax)",
not vcpu_arch_put_guest().  Either something else is going on, or __x86_64__
isn't defined?

	do {
		for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
#ifdef __x86_64__
			asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
#elif defined(__aarch64__)
			asm volatile("str %0, [%0]" :: "r" (gpa) : "memory");
#else
			vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
#endif
	} while (!READ_ONCE(mprotect_ro_done));

	/*
	 * Only architectures that write the entire range can explicitly sync,
	 * as other architectures will be stuck on the write fault.
	 */
#if defined(__x86_64__) || defined(__aarch64__)
	GUEST_SYNC(3);
#endif

	for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
		vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);

	GUEST_SYNC(4);

> Even if it were compiled into a mov instruction of length 3, the
> subsequent execution of GUEST_SYNC(4) in the guest could still cause the
> host's stage 3 assertion to fail.

Sorry, I don't follow.  The guest doesn't get "released" from GUEST_SYNC(3)
until the host runs the vCPU again, and that happens after asserting on
stage 3 and synchronizing with the main thread.

		assert_sync_stage(vcpu, 3);
#endif /* __x86_64__ || __aarch64__ */
		rendezvous_with_boss();

		/*
		 * Stage 4.  Run to completion, waiting for mprotect(PROT_WRITE)
		 * to make the memory writable again.
		 */
		do {
			r = _vcpu_run(vcpu);
		} while (r && errno == EFAULT);
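
To spell out where the diagram's "guest rip += 3" comes from: the host side
of stage 3 re-runs the vCPU and manually skips each faulting write until the
guest reaches GUEST_SYNC(3).  A rough sketch of that loop, pieced together
from the snippets quoted above rather than copied from mmu_stress_test.c;
vcpu_regs_get()/vcpu_regs_set() are stand-ins for however the test actually
advances RIP:

	struct kvm_regs regs;

	/*
	 * Keep skipping the guest's faulting write until the guest reaches
	 * GUEST_SYNC(3).  The "+ 3" hard-codes the length of the hand-coded
	 * 3-byte "mov %rax, (%rax)" (0x48 0x89 0x00); if the faulting
	 * instruction were instead a compiler-emitted 4-byte mov, RIP would
	 * land mid-instruction at 0x402638, matching the "Unhandled
	 * exception '0xe'" report above.
	 */
	for (;;) {
		r = _vcpu_run(vcpu);
		if (!r)
			break;	/* guest made it to GUEST_SYNC(3) */
		TEST_ASSERT(errno == EFAULT,
			    "Expected EFAULT on write to read-only memory");

		vcpu_regs_get(vcpu, &regs);
		regs.rip += 3;	/* skip the hand-coded 3-byte mov */
		vcpu_regs_set(vcpu, &regs);
	}
	assert_sync_stage(vcpu, 3);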