On Tue, Oct 09, 2018 at 06:57:33AM -0700, Sean Christopherson wrote: >On Mon, 2018-10-08 at 16:11 +0000, kvm-owner@xxxxxxxxxxxxxxx wrote: >> Hi, Sean >> >> Thanks for your comments, let me try to understand it. >> >> > >> > This handles the scenario where __get_user_pages_fast() failed and >> > get_user_pages_unlocked() succeeded, obviously with !write_fault && >> > writable. ??On a read fault, get_user_pages_unlocked() requests the >> > page with read permissions to ensure it doesn't incorrectly fail due >> > to invalid access permissions, i.e. doesn't speculatively request >> > write permissions for a read-only VMA. ??If that succeeds, we then >> I think I can understand at this point. >> >> Based on my understanding, we set *write* to 1 for __get_user_pages_fast() >> which means we want a writable page. While in hva_to_pfn_slow(), >> FOLL_WRITE is not set for get_user_pages_unlocked(), this means a >> readable page would meet our need. >> >> In case a read fault, if __get_user_pages_fast() failed, we still could >> try get_user_pages_unlocked() to get a readable page. >> >> > >> > use __get_user_pages_fast() to test if the VMA is writable. ??If the >> > VMA is indeed writable, we can create the shadow PTE with write >> > permissions even though we're handling a read fault. ??Marking the >> I can't follow up with you from this point. >> >> __get_user_pages_fast() iterate the page table and check those >> requirements. We pass the same parameter both in hva_to_pfn_slow() and >> hva_to_pfn_fast(). Curious about why we would get different result if no >> one else change the PTE? > >hva_to_pfn_slow() uses the exact access type (read vs write) when calling >get_user_pages_unlocked(), i.e. FOLL_WRITE is only set if @write_fault is >true, whereas the calls to __get_user_pages_fast() always try to pin the >page as writable. > >> BTW, you mentioned __get_user_pages_fast() would test the VMA? Would you >> mind pointing which part does this? I am lost at this point. Sean, Thanks a lot for your patient explanation. > >That was poorly worded. ??Let me try again: > >If get_user_pages_unlocked() succeeds and we're handling a read-fault, >use __get_user_pages_fast() to check if the page was mapped writable. >Even though we only request read permissions, the PTE *may* have write >permissions if the corresponding VMA is writable, e.g. a writable VMA >will generate a read-only PTE when the page is backed by the zero page >using copy-on-write semantics. To be honest, I got some difficulty to understand this part. The reason is some background knowledge I missed. Below is my guess on the background knowledge: ---------------------------------------------- There are two level/kind RW permission marking mechanism * VMA mapping: encoded in vma->vm_page_prot * PTE mapping: encoded in pteval, e.g. _PAGE_RW And these two permission for the same addr/page could be different. Hmm, if I am right (not understand the detail in the code, and not convienced by myself), * get_user_pages_unlocked() checks VMA mapping * __get_user_pages_fast() checks PTE mapping Based on your last sentence, VMA mapping has a higher priority. A writable VMA could be created through mmap, while its corresponding PTE could be read-only since a copy-on-write semantics applies here. This is the reason why we could retrive a read-only page from PTE mapping point of view, but writable from VMA point of view. And then based on our requirement, give the page write permission. > >If the PTE is indeed writable, we can also create the shadow PTE with >write permissions even though we're handling a read fault.????Marking >the SPTE as writable means we don't take another page fault if/when >the guest writes the page, e.g. in a RMW scenario we take one page >fault instead of two. > This makes sence to me. As you mentioned in the last sentense of previous paragraph, a read-only PTE with writable VMA means the PTE is indeed writalbe. In this case we could set the SPTE as write to reduce another page fault in a RMW scenario. After all I still have one confusion: My confusion is why the 2nd __get_user_pages_fast() would succeed while the 1st failed with same parameters passed. Do we touch the PTE mapping during get_user_pages_unlocked()? Hope this is not too silly. Thanks for your patience a lot :-) >> > >> > SPTE as writable means we don't take another page fault if/when the >> > guest writes the page, e.g. in a RMW scenario we take one page fault >> > instead of two. >> > >> Ah, thanks. I guess this is the benefits for this behavior. This reduces >> one page fault by giving a writable page for a read fault. >> >> > >> > > >> > > ?? *pfn = page_to_pfn(page); >> > > ?? return npages; >> > > ??} -- Wei Yang Help you, Help me