Re: [PATCH] s390: Remove PageDirty check inside mk_pte()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 03, 2025 at 02:03:05PM +0100, Alexander Gordeev wrote:
> > So the question becomes whether to:
> > 
> > (a) Get rid of the conditional pte_mkdirty() entirely (this trial
> >     balloon)
> > (b) Put it in folio_mk_pte() for everybody, not just s390
> > (c) Put it in set_pte_range() as David suggested.
> > 
> > It's feeling like (c) is the best idea.
> 
> I will check option (c)

These call stacks end up in set_pte_range():

mk_pte()
set_pte_range()
finish_fault()
do_read_fault()
do_pte_missing()
__handle_mm_fault()
handle_mm_fault()

mk_pte()
set_pte_range()
finish_fault()
do_shared_fault()
do_pte_missing()
__handle_mm_fault()
handle_mm_fault()

mk_pte()
set_pte_range()
filemap_map_pages()
do_read_fault()
do_pte_missing()
__handle_mm_fault()
handle_mm_fault()

Moving the PageDirty() check to generic code works (in a sense page fault
volume does not notably increase):

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 3ca5af4cfe43..b5d88f2b5214 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1454,8 +1454,6 @@ static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
 	unsigned long physpage = page_to_phys(page);
 	pte_t __pte = mk_pte_phys(physpage, pgprot);
 
-	if (pte_write(__pte) && PageDirty(page))
-		__pte = pte_mkdirty(__pte);
 	return __pte;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d54..4b04325db2ee 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5116,6 +5116,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
 
 	flush_icache_pages(vma, page, nr);
 	entry = mk_pte(page, vma->vm_page_prot);
+	if (pte_write(entry) && PageDirty(page))
+		entry = pte_mkdirty(entry);
 
 	if (prefault && arch_wants_old_prefaulted_pte())
 		entry = pte_mkold(entry);

The above is however not exactly the same, since set_pte_range() -> set_ptes()
dirtyfies all PTEs in a folio - unlike the current s390 implementation, which
dirtyfies a single PTE based on its struct page flag.

remove_migration_ptes() probably needs to be updated as well, unless we are
fine with a claim that a PTE is allowed not always be pre-dirtyfied. That
could also be true for mk_pte() call paths I did not manage to find or will
get added in the future.

mk_pte()
remove_migration_ptes()
migrate_pages_batch()
migrate_pages_sync()
migrate_pages()
compact_zone()
compact_node()
kcompactd()
kthread()
__ret_from_fork()
ret_from_fork()

Also, with the above change to set_pte_range() hugetlb PTEs are affected:

mk_pte()
mk_huge_pte()
make_huge_pte()
hugetlb_no_page()
hugetlb_fault()
handle_mm_fault()

Thus, we need to consider pre-dirtying of hugetlb PTEs as well,
which I think is:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 65068671e460..1c890b3c9453 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5168,7 +5168,7 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
 	pte_t entry;
 	unsigned int shift = huge_page_shift(hstate_vma(vma));
 
-	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
+	if ((vma->vm_flags & VM_WRITE) && (try_mkwrite || PageDirty(page))) {
 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
 					 vma->vm_page_prot)));
 	} else {


Thanks!




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux