Minchan Kim <minchan@xxxxxxxxxx> wrote:

> On Sun, Aug 13, 2017 at 02:50:19PM +0200, Peter Zijlstra wrote:
>> On Sun, Aug 13, 2017 at 06:06:32AM +0000, Nadav Amit wrote:
>>>> however mm_tlb_flush_nested() is a mystery, it appears to care about
>>>> anything inside the range. For now rely on it doing at least _a_ PTL
>>>> lock instead of taking _the_ PTL lock.
>>>
>>> It does not care about “anything” inside the range, but only on situations
>>> in which there is at least one (same) PT that was modified by one core and
>>> then read by the other. So, yes, it will always be _the_ same PTL, and not
>>> _a_ PTL - in the cases that flush is really needed.
>>>
>>> The issue that might require additional barriers is that
>>> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL
>>> is not held. IIUC, since the release-acquire might not behave as a full
>>> memory barrier, this requires an explicit memory barrier.
>>
>> So I'm not entirely clear about this yet.
>>
>> How about:
>>
>>      CPU0                            CPU1
>>
>>                                      tlb_gather_mmu()
>>
>>                                      lock PTLn
>>                                      no mod
>>                                      unlock PTLn
>>
>>      tlb_gather_mmu()
>>
>>                                      lock PTLm
>>                                      mod
>>                                      include in tlb range
>>                                      unlock PTLm
>>
>>      lock PTLn
>>      mod
>>      unlock PTLn
>>
>>                                      tlb_finish_mmu()
>>                                        force = mm_tlb_flush_nested(tlb->mm);
>>                                        arch_tlb_finish_mmu(force);
>>
>>      ... more ...
>>
>>      tlb_finish_mmu()
>>
>> In this case you also want CPU1's mm_tlb_flush_nested() call to return
>> true, right?
>
> No, because CPU 1 modified the pte and added it into the tlb range, so
> regardless of nesting it will flush the TLB and there is no stale TLB
> problem.
>
>> But even with an smp_mb__after_atomic() at CPU0's tlb_gather_mmu()
>> you're not guaranteed CPU1 sees the increment. The only way to do that
>> is to make the PTL locks RCsc and that is a much more expensive
>> proposition.
>>
>> What about:
>>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>                                      tlb_gather_mmu()
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>                                      tlb_finish_mmu()
>>                                        force = mm_tlb_flush_nested(tlb->mm);
>>                                        arch_tlb_finish_mmu(force);
>>
>>      ... more ...
>>
>>      tlb_finish_mmu()
>>
>> Do we want CPU1 to see it here? If so, where does it end?
>
> Ditto. Since CPU 1 has added a range, it will flush the TLB regardless
> of the nested condition.
>
>>      CPU0                            CPU1
>>
>>      tlb_gather_mmu()
>>
>>      lock PTLn
>>      no mod
>>      unlock PTLn
>>
>>      lock PTLm
>>      mod
>>      include in tlb range
>>      unlock PTLm
>>
>>      tlb_finish_mmu()
>>        force = mm_tlb_flush_nested(tlb->mm);
>>
>>                                      tlb_gather_mmu()
>>
>>                                      lock PTLn
>>                                      mod
>>                                      unlock PTLn
>>
>>        arch_tlb_finish_mmu(force);
>>
>>                                      ... more ...
>>
>>                                      tlb_finish_mmu()
>>
>> This?
>>
>> Could you clarify under what exact condition mm_tlb_flush_nested() must
>> return true?
>
> mm_tlb_flush_nested aims for the CPU side where there is no pte update
> but a TLB flush is still needed.
> As I wrote in https://marc.info/?l=linux-mm&m=150267398226529&w=2 ,
> we have a stale TLB problem if we do not flush the TLB even though there
> is no pte modification.

To clarify: the main problem that these patches address is when the first
CPU updates the PTE, and the second CPU sees the updated value and thinks:
“the PTE is already what I wanted - no flush is needed”.
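To make that concrete, here is a rough sketch of the pattern that goes
wrong. This is not actual kernel code - the two function names are made up
for illustration - but ptep_get_and_clear(), pte_none() and
flush_tlb_page() are the real primitives:

/* CPU0: clears a PTE and flushes afterwards, as usual. */
static void cpu0_zap_pte(struct vm_area_struct *vma, pte_t *ptep,
                         unsigned long addr)
{
        ptep_get_and_clear(vma->vm_mm, addr, ptep); /* PTE now clear */

        /*
         * Window: until the flush below completes, other CPUs may
         * still hold the old translation in their TLBs.
         */
        flush_tlb_page(vma, addr);
}

/* CPU1: runs inside the window above. */
static void cpu1_concurrent_unmap(pte_t *ptep)
{
        if (pte_none(*ptep)) {
                /*
                 * “The PTE is already what I wanted - no flush is
                 * needed.” Wrong: CPU0's flush has not run yet, so
                 * userspace on CPU1 can keep using the stale TLB
                 * entry. This is what the tlb_flush_pending machinery
                 * is there to catch.
                 */
                return;
        }
        /* ... otherwise clear it and queue a flush of our own ... */
}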
For some reason (intentionally, I assume), all the examples here first “do
not modify” the PTE and only then modify it - which is not an “interesting”
case. However, based on my understanding of the memory barriers, I think
there is indeed a missing barrier in mm_tlb_flush_nested(), before
tlb_flush_pending is read. IIUC, using smp_mb__after_unlock_lock() in this
case, before the read, would solve the problem with the least impact on
systems with strong memory ordering.

Minchan, as for the solution you proposed, it seems to reopen the race,
since the “pending” indication is removed before the actual TLB flush is
performed.
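Something along these lines is what I have in mind - just a sketch,
assuming the atomic_t tlb_flush_pending and the “> 1” nested test from
this series, and whether smp_mb__after_unlock_lock() is strong enough
here is exactly what needs to be checked:

static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
{
        /*
         * The PTL is not held here, but in every case where the flush
         * actually matters, both sides have taken and released the
         * same PTL. Upgrade that UNLOCK+LOCK ordering to a full
         * barrier before tlb_flush_pending is read; on architectures
         * where UNLOCK+LOCK is already a full barrier this compiles
         * away, so strongly ordered systems pay nothing.
         */
        smp_mb__after_unlock_lock();
        return atomic_read(&mm->tlb_flush_pending) > 1;
}

Nadav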