Hi Mel,

On Wed, Jul 26, 2017 at 10:22:28AM +0100, Mel Gorman wrote:
> On Wed, Jul 26, 2017 at 02:43:06PM +0900, Minchan Kim wrote:
> > > I'm relying on the fact you are the madv_free author to determine if
> > > it's really necessary. The race in question is CPU 0 running madv_free
> > > and updating some PTEs while CPU 1 is also running madv_free and looking
> > > at the same PTEs. CPU 1 may have writable TLB entries for a page but fail
> > > the pte_dirty check (because CPU 0 has updated it already) and potentially
> > > fail to flush. Hence, when madv_free on CPU 1 returns, there are still
> > > potentially writable TLB entries and the underlying PTE is still present
> > > so that a subsequent write does not necessarily propagate the dirty bit
> > > to the underlying PTE any more. Reclaim at some unknown time in the future
> > > may then see that the PTE is still clean and discard the page even though
> > > a write has happened in the meantime. I think this is possible but I could
> > > have missed some protection in madv_free that prevents it happening.
> >
> > Thanks for the detail. You didn't miss anything. It can happen and then
> > it's really a bug. IOW, if an application writes something after madv_free,
> > it must see the written value, not zero.
> >
> > How about adding [set|clear]_tlb_flush_pending in the tlb batching interface?
> > With it, when tlb_finish_mmu is called, we can know we skipped the flush
> > but there is a pending flush, so flush forcefully to avoid the madv_dontneed
> > as well as the madv_free scenario.
> >
>
> I *think* this is ok as it's simply more expensive on the KSM side in
> the event of a race but no other harmful change is made assuming that
> KSM is the only race-prone. The check for mm_tlb_flush_pending also
> happens under the PTL so there should be sufficient protection from the
> mm struct update being visible at the right time.
>
> Check using the test program from "mm: Always flush VMA ranges affected
> by zap_page_range v2" if it handles the madvise case as well as that
> would give some degree of safety. Make sure it's tested against 4.13-rc2
> instead of mmotm which already includes the madv_dontneed fix. If yours
> works for both then it supersedes the mmotm patch.

Okay, I will test it on 4.13-rc2 + Nadav's atomic tlb_flush_pending + my
patch, with the partial flush problem pointed out by Nadav fixed.
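
For illustration only, here is a rough user-space model of the idea
discussed above: mark a flush as pending while a batch is open, and at
finish time flush anyway if another batch is in flight, even when this
batch saw only clean PTEs. It is not the kernel patch. Every name
prefixed with model_ is hypothetical, the pending marker is modelled as
a counter, and the two CPUs are simulated by interleaving calls on a
single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mm struct: only what the sketch needs. */
struct model_mm {
    atomic_int tlb_flush_pending;  /* number of batches currently open */
    int flushes;                   /* counts simulated TLB flushes */
};

/* Hypothetical stand-in for the TLB batching state of one caller. */
struct model_gather {
    struct model_mm *mm;
    bool need_flush;               /* set when this batch cleaned a dirty PTE */
};

/* Simulated TLB flush: just count it. */
static void model_flush_tlb(struct model_mm *mm)
{
    mm->flushes++;
}

/* Open a batch and record that a flush may be pending for this mm. */
static void model_gather_begin(struct model_gather *tlb, struct model_mm *mm)
{
    tlb->mm = mm;
    tlb->need_flush = false;
    atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* True when some other batch on the same mm is still open. */
static bool model_flush_nested(struct model_mm *mm)
{
    return atomic_load(&mm->tlb_flush_pending) > 1;
}

/*
 * Close a batch. Flush if this batch needs it, or if another batch is
 * open: that other batch may already have cleaned the PTEs we looked at,
 * so our "nothing dirty" observation cannot be trusted.
 */
static void model_gather_finish(struct model_gather *tlb)
{
    struct model_mm *mm = tlb->mm;

    assert(atomic_load(&mm->tlb_flush_pending) > 0);
    if (tlb->need_flush || model_flush_nested(mm))
        model_flush_tlb(mm);
    atomic_fetch_sub(&mm->tlb_flush_pending, 1);
}
```

In this model, if "CPU 0" opens a batch and sets need_flush, and "CPU 1"
opens and finishes its own batch in the meantime without setting
need_flush, CPU 1 still flushes because it sees the other open batch. A
batch that runs alone and finds nothing dirty skips the flush as before.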