On Wed, Feb 12, 2025 at 02:29:26AM +0800, Edgecombe, Rick P wrote: > On Tue, 2025-02-11 at 09:42 +0800, Yan Zhao wrote: > > > Is this a fix for the intermittent failure we saw on the v6.13-rc3 based kvm > > > branch? Funnily, I can't seem to reproduce it anymore, with or without this > > > fix. > > Hmm, it can be reproduced in my SPR (non TDX) almost every time. > > It depends on the timing when mprotect(PROT_READ) is completed done. > > > > Attached the detailed error log in my machine at the bottom. > > I must be getting lucky on timing. BTW, in the above I actually meant on either It's also not easy for me to reproduce it in my TDX platform. While the reproducing rate is around 100% in my non-TDX SPR machine, if I add a minor delay in the guest (e.g. printing the value of mprotect_ro_done in the beginning of stage 3), the test can also get passed. > the new or old *kernel*. Hmm, the test fails in my platform as soon as b6c304aec648 ("KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)")" is introduced, which is from 6.13.0-rc2. > > > > > On the fix though, doesn't this remove the coverage of writing to a region > > > that > > > is in the process of being made RO? I'm thinking about warnings, etc that > > > may > > > trigger intermittently based on bugs with a race component. I don't know if > > > we > > > could fix the test and still leave the write while the "mprotect(PROT_READ) > > > is > > > underway". It seems to be deliberate. > > Write before "mprotect(PROT_READ)" has been tested in stage 0. > > Not sure it's deliberate to test write in the process of being made RO. > > If it is, maybe we could make the fix by writing to RO memory a second time > > after mprotect_ro_done is true: > > That could work if it's desirable to maintain the testing. I would mention the > reduced scope in the log at least. Maybe Sean will chime in. Not really a reduced scope. Before this patch, it's also not guaranteed for the memory writes to occur during the mprotect(PROT_READ) transition. i.e. there are 3 possibilities: 1. all writes occur before mprotect(PROT_READ) takes effect. 2. all writes occur after mprotect(PROT_READ) takes effect. 3. some writes occur before mprotect(PROT_READ) takes effect and some occur after. case 3 is not guaranteed without introducing another synchronization flag. That said, I'm not sure if invoking writes before mprotect_ro_done being read as true is indeed necessary. But I'm fine with either way: 1. make sure all writes are after mprotect_ro_done is true (as in this patch). 2. call the do-while loop twice to ensure some writes must happen after mprotect_ro_done is true > Also, I think it needs: > > Fixes: b6c304aec648 ("KVM: selftests: Verify KVM correctly handles > mprotect(PROT_READ)") Will add it in v2!