Re: [RFC PATCH] KVM: remove the writable page for read fault case in hva_to_pfn_slow()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 09, 2018 at 06:57:33AM -0700, Sean Christopherson wrote:
>On Mon, 2018-10-08 at 16:11 +0000, kvm-owner@xxxxxxxxxxxxxxx wrote:
>> Hi, Sean
>> 
>> Thanks for your comments, let me try to understand it.
>> 
>> > 
>> > This handles the scenario where __get_user_pages_fast() failed and
>> > get_user_pages_unlocked() succeeded, obviously with !write_fault &&
>> > writable. ??On a read fault, get_user_pages_unlocked() requests the
>> > page with read permissions to ensure it doesn't incorrectly fail due
>> > to invalid access permissions, i.e. doesn't speculatively request
>> > write permissions for a read-only VMA. ??If that succeeds, we then
>> I think I can understand at this point.
>> 
>> Based on my understanding, we set *write* to 1 for __get_user_pages_fast()
>> which means we want a writable page. While in hva_to_pfn_slow(),
>> FOLL_WRITE is not set for get_user_pages_unlocked(), this means a
>> readable page would meet our need.
>> 
>> In case a read fault, if __get_user_pages_fast() failed, we still could
>> try get_user_pages_unlocked() to get a readable page.
>> 
>> > 
>> > use __get_user_pages_fast() to test if the VMA is writable. ??If the
>> > VMA is indeed writable, we can create the shadow PTE with write
>> > permissions even though we're handling a read fault. ??Marking the
>> I can't follow up with you from this point.
>> 
>> __get_user_pages_fast() iterate the page table and check those
>> requirements. We pass the same parameter both in hva_to_pfn_slow() and
>> hva_to_pfn_fast(). Curious about why we would get different result if no
>> one else change the PTE?
>
>hva_to_pfn_slow() uses the exact access type (read vs write) when calling
>get_user_pages_unlocked(), i.e. FOLL_WRITE is only set if @write_fault is
>true, whereas the calls to __get_user_pages_fast() always try to pin the
>page as writable.
>
>> BTW, you mentioned __get_user_pages_fast() would test the VMA? Would you
>> mind pointing which part does this? I am lost at this point.

Sean,

Thanks a lot for your patient explanation.

>
>That was poorly worded. ??Let me try again:
>
>If get_user_pages_unlocked() succeeds and we're handling a read-fault,
>use __get_user_pages_fast() to check if the page was mapped writable.
>Even though we only request read permissions, the PTE *may* have write
>permissions if the corresponding VMA is writable, e.g. a writable VMA
>will generate a read-only PTE when the page is backed by the zero page
>using copy-on-write semantics.

To be honest, I got some difficulty to understand this part. The reason
is some background knowledge I missed.

Below is my guess on the background knowledge:
----------------------------------------------

There are two level/kind RW permission marking mechanism

   * VMA mapping: encoded in vma->vm_page_prot
   * PTE mapping: encoded in pteval, e.g. _PAGE_RW

And these two permission for the same addr/page could be different.

Hmm, if I am right (not understand the detail in the code, and not
convienced by myself),

   * get_user_pages_unlocked() checks VMA mapping
   * __get_user_pages_fast() checks PTE mapping

Based on your last sentence, VMA mapping has a higher priority. A
writable VMA could be created through mmap, while its corresponding PTE
could be read-only since a copy-on-write semantics applies here.

This is the reason why we could retrive a read-only page from PTE
mapping point of view, but writable from VMA point of view. And then
based on our requirement, give the page write permission.

>
>If the PTE is indeed writable, we can also create the shadow PTE with
>write permissions even though we're handling a read fault.????Marking
>the SPTE as writable means we don't take another page fault if/when
>the guest writes the page, e.g. in a RMW scenario we take one page
>fault instead of two.
>

This makes sence to me.

As you mentioned in the last sentense of previous paragraph, a read-only
PTE with writable VMA means the PTE is indeed writalbe. In this case we
could set the SPTE as write to reduce another page fault in a RMW
scenario.

After all I still have one confusion:

  My confusion is why the 2nd __get_user_pages_fast() would succeed
  while the 1st failed with same parameters passed. Do we touch the PTE
  mapping during get_user_pages_unlocked()?
  
  Hope this is not too silly.

Thanks for your patience a lot :-)

>> > 
>> > SPTE as writable means we don't take another page fault if/when the
>> > guest writes the page, e.g. in a RMW scenario we take one page fault
>> > instead of two.
>> > 
>> Ah, thanks. I guess this is the benefits for this behavior. This reduces
>> one page fault by giving a writable page for a read fault.
>> 
>> > 
>> > > 
>> > > ??	*pfn = page_to_pfn(page);
>> > > ??	return npages;
>> > > ??}

-- 
Wei Yang
Help you, Help me



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux