Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 10, 2024 at 08:16:10AM -0400, Peter Xu wrote:
> On Tue, Sep 10, 2024 at 10:52:01AM +0800, Yan Zhao wrote:
> > Hi Peter,
> 
> Hi, Yan,
> 
> > 
> > Not sure if I missed anything.
> > 
> > It looks that before this patch, pmd/pud are alawys write protected without
> > checking "is_cow_mapping(vma->vm_flags) && pud_write(pud)". pud_wrprotect()
> > clears dirty bit by moving the dirty value to the software bit.
> > 
> > And I have a question that why previously pmd/pud are always write protected.
> 
> IIUC this is a separate question - the move of dirty bit in pud_wrprotect()
> is to avoid wrongly creating shadow stack mappings.  In our discussion I
> think that's an extra complexity and can be put aside; the dirty bit will
> get recovered in pud_clear_saveddirty() later, so it's not the same as
> pud_mkclean().
But pud_clear_saveddirty() will only set dirty bit when write bit is 1.

> 
> AFAIU pmd/pud paths don't consider is_cow_mapping() because normally we
> will not duplicate pgtables in fork() for most of shared file mappings
> (!CoW).  Please refer to vma_needs_copy(), and the comment before returning
> false at last.  I think it's not strictly is_cow_mapping(), as we're
> checking anon_vma there, however it's mostly it, just to also cover
> MAP_PRIVATE on file mappings too when there's no CoW happened (as if CoW
> happened then anon_vma will appear already).
> 
> There're some outliers, e.g. userfault protected, or pfnmaps/mixedmaps.
> Userfault & mixedmap are not involved in this series at all, so let's
> discuss pfnmaps.
> 
> It means, fork() can still copy pgtable for pfnmap vmas, and it's relevant
> to this series, because before this series pfnmap only exists in pte level,
> hence IMO the is_cow_mapping() must exist for pte level as you described,
> because it needs to properly take care of those.  Note that in the pte
> processing it also checks pte_write() to make sure it's a COWed page, not a
> RO page cache / pfnmap / ..., for example.
> 
> Meanwhile, since pfnmap won't appear in pmd/pud, I think it's fair that
> pmd/pud assumes when seeing a huge mapping it must be MAP_PRIVATE otherwise
> the whole copy_page_range() could be already skipped.  IOW I think they
> only need to process COWed pages here, and those pages require write bit
> removed in both parent and child when fork().
Is it also based on that there's no MAP_SHARED huge DEVMAP pages up to now?

> 
> After this series, pfnmaps can appear in the form of pmd/pud, then the
> previous assumption will stop holding true, as we'll still copy pfnmaps
> during fork() always. My guessing of the reason is because most of the
> drivers map pfnmap vmas only during mmap(), it means there can normally
> have no fault() handler at all for those pfns.
> 
> In this case, we'll need to also identify whether the page is COWed, using
> the newly added "is_cow_mapping() && pxx_write()" in this series (added
> to pud path, while for pmd path I used a WARN_ON_ONCE instead).
> 
> If we don't do that, it means e.g. for a VM_SHARED pfnmap vma, after fork()
> we'll wrongly observe write protected entries.  Here the change will make
> sure VM_SHARED can properly persist the write bits on pmds/puds.
> 
> Hope that explains.
> 
Thanks a lot for such detailed explanation!
> 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux