On Thu, Jan 30, 2025 at 10:12:48PM +0000, Matthew Wilcox wrote: > On Mon, Jan 20, 2025 at 05:39:43PM +0100, Alexander Gordeev wrote: > > Despite the commit 75edd345e8ed claim above I still observed that a read > > access to shmem page could end up in creation of a dirty PTE: > > > > handle_pte_fault() -> do_pte_missing() -> do_fault() -> > > do_read_fault() -> finish_fault() -> set_pte_range() -> mk_pte() > > > > Our internal discussion (among other things) ended up in a need to really > > understand where the faults are coming from. > > > > Further, a cursory LTP test was showing ~18K page faults increase, which > > I did not confirm. That is the first thing I will re-do. > > > > Whether this change is a pre-requisite for something or what is your aim > > wrt to this patch? > > One of the things that needs to happen for the splitting of struct page > and struct folio is the removal of all '&folio->page'. Most are in > compatibility code or code that is being transitioned and can safely be > ignored. One of the places that needs to be changed is: > > entry = mk_pte(&folio->page, vma->vm_page_prot); > (there's a few places like this). > > I believe we need a folio_mk_pte(), and I don't want to define it for > each architecture. I don't even want to have a default implementation > and allow arches to override. I guess, folio_mk_pte() would then still call arch-specific mk_pte()? > So the question becomes whether to: > > (a) Get rid of the conditional pte_mkdirty() entirely (this trial > balloon) > (b) Put it in folio_mk_pte() for everybody, not just s390 > (c) Put it in set_pte_range() as David suggested. > > It's feeling like (c) is the best idea. I will check option (c) Thanks!