On Thu, Nov 09, 2023 at 10:16:57AM +0000, Nadav Amit wrote:
>
> On Nov 8, 2023, at 6:12 AM, Byungchul Park <byungchul@xxxxxx> wrote:
> >
> > On Mon, Oct 30, 2023 at 09:51:30PM +0900, Byungchul Park wrote:
> >>>> diff --git a/mm/memory.c b/mm/memory.c
> >>>> index 6c264d2f969c..75dc48b6e15f 100644
> >>>> --- a/mm/memory.c
> >>>> +++ b/mm/memory.c
> >>>> @@ -3359,6 +3359,19 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> >>>>  	if (vmf->page)
> >>>>  		folio = page_folio(vmf->page);
> >>>>
> >>>> +	/*
> >>>> +	 * This folio has its read copy to prevent inconsistency while
> >>>> +	 * deferring TLB flushes. However, the problem might arise if
> >>>> +	 * it's going to become writable.
> >>>> +	 *
> >>>> +	 * To prevent it, give up the deferring TLB flushes and perform
> >>>> +	 * TLB flush right away.
> >>>> +	 */
> >>>> +	if (folio && migrc_pending_folio(folio)) {
> >>>> +		migrc_unpend_folio(folio);
> >>>> +		migrc_try_flush_free_folios(NULL);
> >>>
> >>> So many potential function calls… Probably they should have been combined
> >>> into one and at least migrc_pending_folio() should have been an inline
> >>> function in the header.
> >>
> >> I will try to change it as you mention.
> >>
> >>>> +	}
> >>>> +
> >>>
> >>> What about mprotect? I thought David has changed it so it can set writable
> >>> PTEs.
> >>
> >> I will check it out.
> >
> > I found mprotect stuff is already performing TLB flushes needed for it.
> > So some redundant TLB flushes might happen by migrc but it's not that
> > harmful I think. Thanks.
>
> Let me explain the scenario I am concerned with. Assume page P is RO, and
> moves from Psrc to Pdst. Pointer “p” points to P. Initially (*p == 0).
>
> Let’s also assume we also have an atomic variable “a”. Initially (a == 0).
>
> I hope I got the migration function names right, but I hope the problem
> itself can be clear regardless.
>
> CPU0 CPU1 CPU2 CPU3
> ---- ---- ---- ----
> (user-mode) (user-mode)
>
> Access *p
> [Psrc cached in TLB]
>
> migrate_pages_batch()
> -> migrate_folio_unmap()
>
> [ PTE updated,
>   still no flush ]
>
> mprotect(p,
>   RW)

Here,

mprotect()
   do_mprotect_pkey()
      tlb_finish_mmu()
         tlb_flush_mmu()

I thought the TLB flush for mprotect() is performed by tlb_flush_mmu(), so
any cached TLB entries on other CPUs get a chance to be updated. Could you
correct me if I got this wrong?

Thanks.
	Byungchul

>
> [ Psrc is
>   RW ]
>
> [ flush
>   deferred]
>
>
> *p = 1 # Pdst
>
> xchg(&a, 1)
> mfence
> if (a == 1)
>   assert(*p == 1);
>
>
>
> Now at this point the assertion might fail. CPU2 wrote into Pdst, whereas
> CPU1 reads from Psrc. But based on x86 memory model, userspace might not
> expect this scenario to be possible, hence leading to bugs.
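
For reference, the userspace half of the quoted scenario can be written as a
small C11 message-passing test. This is only a sketch under stated
assumptions: page_word, writer(), reader() and run_litmus() are hypothetical
names, plain pthreads stand in for CPU1 and CPU2, the reader spins on "a"
instead of checking it once, and neither the migration nor the deferred TLB
flush is modelled. It only shows the ordering guarantee that userspace would
normally rely on.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: page_word plays the word at *p on page P. */
static int page_word;
static int *p = &page_word;
static atomic_int a;            /* the atomic variable "a" from the scenario */

/* Writer side: store through p, then publish with an atomic exchange. */
static void *writer(void *arg)
{
    (void)arg;
    *p = 1;
    atomic_exchange(&a, 1);     /* sequentially consistent, like xchg on x86 */
    return NULL;
}

/* Reader side: once a == 1 is observed, the store to *p must be visible. */
static void *reader(void *arg)
{
    int *observed = arg;

    while (atomic_load(&a) != 1)
        ;                       /* assumption: spin instead of a single check */
    *observed = *p;
    return NULL;
}

/* Runs one round and returns the value the reader saw at *p. */
int run_litmus(void)
{
    pthread_t w, r;
    int observed = -1;

    page_word = 0;
    atomic_store(&a, 0);
    pthread_create(&r, NULL, reader, &observed);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return observed;
}
```

When both threads reach the same physical page, run_litmus() returns 1 on
every round (build with -pthread). The concern quoted above is that the reader
could still be going through a stale TLB entry for Psrc while the writer's
store lands in Pdst, in which case the value read back would be 0.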