On Mon, 2018-10-08 at 16:11 +0000, kvm-owner@xxxxxxxxxxxxxxx wrote: > On Mon, Oct 08, 2018 at 08:21:53AM -0700, Sean Christopherson wrote: > > > > On Mon, 2018-10-08 at 22:41 +0800, kvm-owner@xxxxxxxxxxxxxxx wrote: > > > > > > Case (!write_fault && writable) has been handled in hva_to_pfn_fast(), > > > it is not necessary to try again if hva_to_pfn_fast() already failed. > > > > > > This patch removes this case in hva_to_pfn_slow(). > > > > > > Signed-off-by: Wei Yang <richard.weiyang@xxxxxxxxx> > > > > > > --- > > > > > > Hope my understanding is correct. > > > > > > --- > > > ??virt/kvm/kvm_main.c | 10 ---------- > > > ??1 file changed, 10 deletions(-) > > > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > > index 1f42f1d474b5..c8fb3a9d81fa 100644 > > > --- a/virt/kvm/kvm_main.c > > > +++ b/virt/kvm/kvm_main.c > > > @@ -1403,16 +1403,6 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, > > > ?? if (npages != 1) > > > ?? return npages; > > > ?? > > > - /* map read fault as writable if possible */ > > > - if (unlikely(!write_fault) && writable) { > > > - struct page *wpage; > > > - > > > - if (__get_user_pages_fast(addr, 1, 1, &wpage) == 1) { > > > - *writable = true; > > > - put_page(page); > > > - page = wpage; > > > - } > > > - } > Hi, Sean > > Thanks for your comments, let me try to understand it. > > > > > This handles the scenario where __get_user_pages_fast() failed and > > get_user_pages_unlocked() succeeded, obviously with !write_fault && > > writable. ??On a read fault, get_user_pages_unlocked() requests the > > page with read permissions to ensure it doesn't incorrectly fail due > > to invalid access permissions, i.e. doesn't speculatively request > > write permissions for a read-only VMA. ??If that succeeds, we then > I think I can understand at this point. > > Based on my understanding, we set *write* to 1 for __get_user_pages_fast() > which means we want a writable page. While in hva_to_pfn_slow(), > FOLL_WRITE is not set for get_user_pages_unlocked(), this means a > readable page would meet our need. > > In case a read fault, if __get_user_pages_fast() failed, we still could > try get_user_pages_unlocked() to get a readable page. > > > > > use __get_user_pages_fast() to test if the VMA is writable. ??If the > > VMA is indeed writable, we can create the shadow PTE with write > > permissions even though we're handling a read fault. ??Marking the > I can't follow up with you from this point. > > __get_user_pages_fast() iterate the page table and check those > requirements. We pass the same parameter both in hva_to_pfn_slow() and > hva_to_pfn_fast(). Curious about why we would get different result if no > one else change the PTE? hva_to_pfn_slow() uses the exact access type (read vs write) when calling get_user_pages_unlocked(), i.e. FOLL_WRITE is only set if @write_fault is true, whereas the calls to __get_user_pages_fast() always try to pin the page as writable. > BTW, you mentioned __get_user_pages_fast() would test the VMA? Would you > mind pointing which part does this? I am lost at this point. That was poorly worded. Let me try again: If get_user_pages_unlocked() succeeds and we're handling a read-fault, use __get_user_pages_fast() to check if the page was mapped writable. Even though we only request read permissions, the PTE *may* have write permissions if the corresponding VMA is writable, e.g. a writable VMA will generate a read-only PTE when the page is backed by the zero page using copy-on-write semantics. If the PTE is indeed writable, we can also create the shadow PTE with write permissions even though we're handling a read fault. Marking the SPTE as writable means we don't take another page fault if/when the guest writes the page, e.g. in a RMW scenario we take one page fault instead of two. > > > > SPTE as writable means we don't take another page fault if/when the > > guest writes the page, e.g. in a RMW scenario we take one page fault > > instead of two. > > > Ah, thanks. I guess this is the benefits for this behavior. This reduces > one page fault by giving a writable page for a read fault. > > > > > > > > > ?? *pfn = page_to_pfn(page); > > > ?? return npages; > > > ??}