On Tue, Jun 30, 2020 at 06:12:49PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
>
> > On Tue, Jun 30, 2020 at 05:43:54PM +0200, Vitaly Kuznetsov wrote:
> >> Vivek Goyal <vgoyal@xxxxxxxxxx> writes:
> >>
> >> > On Tue, Jun 30, 2020 at 05:13:54PM +0200, Vitaly Kuznetsov wrote:
> >> >>
> >> >> > - If you retry in the kernel, we completely change the context of
> >> >> >   who was trying to access the gfn in question. We want to retain
> >> >> >   the real context and keep the information about who was trying
> >> >> >   to access the gfn in question.
> >> >>
> >> >> (Just so I understand the idea better) does the guest context matter
> >> >> to the host? Or, more specifically, are we going to do anything
> >> >> besides get_user_pages() which will actually analyze who triggered
> >> >> the access *in the guest*?
> >> >
> >> > When we exit to user space, qemu prints a bunch of register state. I
> >> > am wondering what that state represents. Does some of it trace back
> >> > to the process which was trying to access that hva? I don't know.
> >>
> >> We can get the full CPU state when the fault happens if we need to, but
> >> generally we are not analyzing it. I can imagine looking at CPL, for
> >> example, but trying to distinguish guest's 'process A' from 'process B'
> >> may not be simple.
> >>
> >> >
> >> > I think keeping a cache of error gfns might not be too bad from an
> >> > implementation point of view. I will give it a try and see how bad
> >> > it looks.
> >>
> >> Right; I'm only worried about the fact that every cache (or hash) has a
> >> limited size and under certain circumstances we may overflow it. When
> >> an overflow happens, we will follow the APF path again and this can go
> >> on over and over. Maybe we can punch a hole in EPT/NPT making the PFN
> >> reserved/not-present, so when the guest tries to access it again we
> >> trap the access in KVM and, if the error persists, don't follow the APF
> >> path?
> >
> > Just to make sure I'm somewhat keeping track, is the problem we're
> > trying to solve that the guest may not immediately retry the "bad" GPA,
> > and so KVM may not detect that the async #PF already came back as
> > -EFAULT or whatever?
>
> Yes. In Vivek's patch there's a single 'error_gfn' per vCPU which serves
> as an indicator whether to follow the APF path or not.

A thought along the lines of your "punch a hole in the page tables" idea
would be to invalidate the SPTE (in the unlikely case it's present but not
writable) and tag it as being invalid for async #PF.  E.g. for !EPT, there
are 63 bits available for metadata.  For EPT, there's a measly 60, assuming
we want to avoid using SUPPRESS_VE.

The fully !present case would be straightforward, but the !writable case
would require extra work, especially for shadow paging.

With the SPTE tagged, it'd "just" be a matter of hooking into the page
fault paths to detect the flag and disable async #PF.  For TDP that's not
too bad, e.g. pass in a flag to fast_page_fault() and propagate it to
try_async_pf().  Not sure how to handle shadow paging; that code makes my
head hurt just looking at it.  It'd require tweaking
is_shadow_present_pte() to be more precise, but that's probably a good
thing, and peanuts compared to handling the faults.
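
Very roughly, the tagging idea could look something like the toy sketch
below.  To be clear, this is NOT actual KVM code: the bit position, the
SPTE_NO_ASYNC_PF name, and the helpers are all made up for illustration,
and a real implementation would live in the MMU code and have to coexist
with the existing SPTE masks.  It's only meant to show the shape of
"remember the error in a spare SPTE bit, check it on the next fault".

/*
 * Standalone toy model of tagging an SPTE so that a later fault on the
 * same GFN skips the async #PF path.  Bit 61 and all names are
 * hypothetical, chosen just for this example.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SPTE_PRESENT		(1ULL << 0)	/* stand-in for the present bit */
#define SPTE_NO_ASYNC_PF	(1ULL << 61)	/* hypothetical spare metadata bit */

/* Host failed to fault in the page: zap the SPTE and remember the error. */
static void spte_mark_no_async_pf(uint64_t *sptep)
{
	*sptep = (*sptep & ~SPTE_PRESENT) | SPTE_NO_ASYNC_PF;
}

/* Fault path: is this fault still allowed to take the async #PF route? */
static bool spte_allows_async_pf(uint64_t spte)
{
	return !(spte & SPTE_NO_ASYNC_PF);
}

int main(void)
{
	uint64_t spte = SPTE_PRESENT;	/* pretend this maps the "bad" GFN */

	/* get_user_pages() (or similar) came back with -EFAULT for this GFN. */
	spte_mark_no_async_pf(&spte);

	/* Guest faults on the GFN again: skip async #PF, handle it synchronously. */
	printf("async #PF allowed: %s\n",
	       spte_allows_async_pf(spte) ? "yes" : "no");
	return 0;
}

The appeal over a per-vCPU error_gfn, or a small cache of them, is that
the information lives in the page tables themselves and so can't be
evicted by unrelated errors; the cost is the fault-path plumbing described
above.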