On Mon, Jan 20, 2025 at 05:39:43PM +0100, Alexander Gordeev wrote:
> Despite the commit 75edd345e8ed claim above I still observed that a read
> access to shmem page could end up in creation of a dirty PTE:
>
> handle_pte_fault() -> do_pte_missing() -> do_fault() ->
>   do_read_fault() -> finish_fault() -> set_pte_range() -> mk_pte()
>
> Our internal discussion (among other things) ended up in a need to really
> understand where the faults are coming from.
>
> Further, a cursory LTP test was showing ~18K page faults increase, which
> I did not confirm. That is the first thing I will re-do.
>
> Whether this change is a pre-requisite for something or what is your aim
> wrt to this patch?

One of the things that needs to happen for the splitting of struct page
and struct folio is the removal of all '&folio->page'.  Most are in
compatibility code or code that is being transitioned and can safely be
ignored.  One of the places that needs to be changed is:

	entry = mk_pte(&folio->page, vma->vm_page_prot);

(there are a few places like this.)  I believe we need a folio_mk_pte(),
and I don't want to define it for each architecture.  I don't even want
to have a default implementation and allow arches to override.

So the question becomes whether to:

(a) Get rid of the conditional pte_mkdirty() entirely (this trial balloon)
(b) Put it in folio_mk_pte() for everybody, not just s390
(c) Put it in set_pte_range() as David suggested.

It's feeling like (c) is the best idea.
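
For concreteness, here is a minimal sketch of what such a generic
folio_mk_pte() could look like, assuming mk_pte(page, pgprot) is
equivalent to pfn_pte(page_to_pfn(page), pgprot) as it is on most
architectures; this is illustrative, not a patch from this thread:

	/*
	 * Hypothetical generic folio_mk_pte(): built on pfn_pte() so
	 * no per-architecture definition is needed.  Deliberately does
	 * not reproduce s390's conditional pte_mkdirty().
	 */
	static inline pte_t folio_mk_pte(struct folio *folio, pgprot_t pgprot)
	{
		return pfn_pte(folio_pfn(folio), pgprot);
	}

Callers would then do, e.g.:

	entry = folio_mk_pte(folio, vma->vm_page_prot);

and under option (c) the write-fault path in set_pte_range() would be
the place to apply pte_mkdirty() for writable mappings, rather than
hiding that inside an architecture's mk_pte().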