On Fri, Nov 10, 2023 at 10:02:01AM +0900, Byungchul Park wrote: > On Thu, Nov 09, 2023 at 10:16:57AM +0000, Nadav Amit wrote: > > > > > > > On Nov 8, 2023, at 6:12 AM, Byungchul Park <byungchul@xxxxxx> wrote: > > > > > > !! External Email > > > > > > On Mon, Oct 30, 2023 at 09:51:30PM +0900, Byungchul Park wrote: > > >>>> diff --git a/mm/memory.c b/mm/memory.c > > >>>> index 6c264d2f969c..75dc48b6e15f 100644 > > >>>> --- a/mm/memory.c > > >>>> +++ b/mm/memory.c > > >>>> @@ -3359,6 +3359,19 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) > > >>>> if (vmf->page) > > >>>> folio = page_folio(vmf->page); > > >>>> > > >>>> + /* > > >>>> + * This folio has its read copy to prevent inconsistency while > > >>>> + * deferring TLB flushes. However, the problem might arise if > > >>>> + * it's going to become writable. > > >>>> + * > > >>>> + * To prevent it, give up the deferring TLB flushes and perform > > >>>> + * TLB flush right away. > > >>>> + */ > > >>>> + if (folio && migrc_pending_folio(folio)) { > > >>>> + migrc_unpend_folio(folio); > > >>>> + migrc_try_flush_free_folios(NULL); > > >>> > > >>> So many potential function calls… Probably they should have been combined > > >>> into one and at least migrc_pending_folio() should have been an inline > > >>> function in the header. > > >> > > >> I will try to change it as you mention. > > >> > > >>>> + } > > >>>> + > > >>> > > >>> What about mprotect? I thought David has changed it so it can set writable > > >>> PTEs. > > >> > > >> I will check it out. > > > > > > I found mprotect stuff is already performing TLB flushes needed for it. > > > So some redundant TLB flushes might happen by migrc but it's not that > > > harmful I think. Thanks. > > > > Let me explain the scenario I am concerned with. Assume page P is RO, and > > moves from Psrc to Pdst. Pointer “p” points to P. Initially (*p == 0). > > > > Let’s also assume we also have an atomic variable “a”. Initially (a == 0). > > > > I hope I got the migration function names right, but I hope the problem > > itself can be clear regardless. > > > > CPU0 CPU1 CPU2 CPU3 > > ---- ---- ---- ---- > > (user-mode) (user-mode) > > > > Access *p > > [Psrc cached in TLB] > > > > migrate_pages_batch() > > -> migrate_folio_unmap() > > > > [ PTE updated, > > still no flush ] > > > > mprotect(p, > > RW) > > Here, > > mprotect() > do_mprotect_pkey() > tlb_finish_mmu() > tlb_flush_mmu() > > I thought TLB flush for mprotect() is performed by tlb_flush_mmu() so > any cached TLB entries on other CPUs can have chance to update. Could > you correct me if I get it wrong? Thanks. I guess you tried to inform me that x86 mmu automatically keeps the consistancy based on cached TLB entries. Right? If yes, I should do something on that path. If not, it's not problematic. Thoughts? Byungchul > Byungchul > > > > > [ Psrc is > > RW ] > > > > [ flush > > deferred] > > > > > > *p = 1 # Pdst > > > > xchg(&a, 1) > > mfence > > if (a == 1) > > assert(*p == 1); > > > > > > > > Now at this point the assertion might fail. CPU2 wrote into Pdst, whereas > > CPU1 reads from Psrc. But based on x86 memory model, userspace might not > > expect this scenario to be possible, hence leading to bugs.