On Wed, Sep 11, 2024 at 10:16:55AM +0800, Yan Zhao wrote: > On Tue, Sep 10, 2024 at 08:16:10AM -0400, Peter Xu wrote: > > On Tue, Sep 10, 2024 at 10:52:01AM +0800, Yan Zhao wrote: > > > Hi Peter, > > > > Hi, Yan, > > > > > > > > Not sure if I missed anything. > > > > > > It looks that before this patch, pmd/pud are alawys write protected without > > > checking "is_cow_mapping(vma->vm_flags) && pud_write(pud)". pud_wrprotect() > > > clears dirty bit by moving the dirty value to the software bit. > > > > > > And I have a question that why previously pmd/pud are always write protected. > > > > IIUC this is a separate question - the move of dirty bit in pud_wrprotect() > > is to avoid wrongly creating shadow stack mappings. In our discussion I > > think that's an extra complexity and can be put aside; the dirty bit will > > get recovered in pud_clear_saveddirty() later, so it's not the same as > > pud_mkclean(). > But pud_clear_saveddirty() will only set dirty bit when write bit is 1. Yes, it's because x86 wants to avoid unexpected write=0 && dirty=1 entries, because it can wrongly reflect a shadow stack mapping. Here we cannot recover the dirty bit if set only if write bit is 1 first. > > > > > AFAIU pmd/pud paths don't consider is_cow_mapping() because normally we > > will not duplicate pgtables in fork() for most of shared file mappings > > (!CoW). Please refer to vma_needs_copy(), and the comment before returning > > false at last. I think it's not strictly is_cow_mapping(), as we're > > checking anon_vma there, however it's mostly it, just to also cover > > MAP_PRIVATE on file mappings too when there's no CoW happened (as if CoW > > happened then anon_vma will appear already). > > > > There're some outliers, e.g. userfault protected, or pfnmaps/mixedmaps. > > Userfault & mixedmap are not involved in this series at all, so let's > > discuss pfnmaps. > > > > It means, fork() can still copy pgtable for pfnmap vmas, and it's relevant > > to this series, because before this series pfnmap only exists in pte level, > > hence IMO the is_cow_mapping() must exist for pte level as you described, > > because it needs to properly take care of those. Note that in the pte > > processing it also checks pte_write() to make sure it's a COWed page, not a > > RO page cache / pfnmap / ..., for example. > > > > Meanwhile, since pfnmap won't appear in pmd/pud, I think it's fair that > > pmd/pud assumes when seeing a huge mapping it must be MAP_PRIVATE otherwise > > the whole copy_page_range() could be already skipped. IOW I think they > > only need to process COWed pages here, and those pages require write bit > > removed in both parent and child when fork(). > Is it also based on that there's no MAP_SHARED huge DEVMAP pages up to now? Correct. Thanks, -- Peter Xu