On Tue, 2021-05-04 at 09:45 -0700, Ben Gardon wrote: > On Mon, May 3, 2021 at 4:32 PM Kai Huang <kai.huang@xxxxxxxxx> wrote: > > > > On Mon, 2021-05-03 at 10:07 -0700, Ben Gardon wrote: > > > On Fri, Apr 30, 2021 at 9:03 AM Kai Huang <kai.huang@xxxxxxxxx> wrote: > > > > > > > > There are couple of issues in current tdp_mmu_map_handle_target_level() > > > > regarding to return value which reflects page fault handler's behavior > > > > -- whether it truely fixed page fault, or fault was suprious, or fault > > > > requires emulation, etc: > > > > > > > > 1) Currently tdp_mmu_map_handle_target_level() return 0, which is > > > > RET_PF_RETRY, when page fault is actually fixed. This makes > > > > kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of > > > > RET_PF_FIXED. > > > > > > Ooof that was an oversight. Thank you for catching that. > > > > Thanks for reviewing. > > > > > > > > > > > > > 2) When page fault is spurious, tdp_mmu_map_handle_target_level() > > > > currently doesn't return immediately. This is not correct, since it > > > > may, for instance, lead to double emulation for a single instruction. > > > > > > Could you please add an example of what would be required for this to > > > happen? What effect would it have? > > > I don't doubt you're correct on this point, just having a hard time > > > pinpointing where the issue is. > > > > Hmm.. After reading your reply, I think I wasn't thinking correctly about the emulation > > part :) > > > > I was thinking the case that two threads simultaneously write to video ram (which is write > > protected and requires emulation) which has been swapped out, in which case one thread > > will succeed with setting up the mapping, and the other will get atomic exchange failure. > > Since both threads are trying to setup the same mapping, I thought in this case for the > > second thread (that gets atomic exchange failure) should just give up. But reading code > > again, and with your reply, I think the right behavior is, actually both threads need to > > do the emulation, because this is the correct behavior.' > > > > That being said, I still believe that for spurious fault, we should return immediately > > (otherwise it is not spurious fault). But I now also believe the spurious fault check in > > existing code happens too early -- it has to be after write protection emulation check. > > And I just checked the mmu_set_spte() code, if I read correctly, it exactly puts spurious > > fault check after write protection emulation check. > > > > Does this make sense? > > Yeah, that makes sense. Having to move the emulation check after the > cmpxchg always felt a little weird to me Though I still think it makes > sense since the cmpxchg can fail. I guess my brain was dominated by the idea that for spurious fault we should return immediately :) But I guess we can also fix the pf_fixed count issue by simply: - if (!prefault) + if (!prefault && ret == RET_PF_FIXED) vcpu->stat.pf_fixed++; Which way do you prefer? > > > > > If this looks good to you, I guess it would be better to split this patch into smaller > > patches (for instance, one patch to handle case 1), and one to handle spurious fault > > change)? > > That sounds good to me. That would definitely make it easier to review. I'll post a new patch series I guess. Thanks!