Re: Potential race in TLB flush batching?

On Tue, Jul 25, 2017 at 04:37:48PM +0900, Minchan Kim wrote:
> > Ok, as you say you have reproduced this with corruption, I would suggest
> > one path for dealing with it although you'll need to pass it by the
> > original authors.
> > 
> > When unmapping ranges, there is a check for dirty PTEs in
> > zap_pte_range() that forces a flush for dirty PTEs which aims to avoid
> > writable stale PTEs from CPU0 in a scenario like you laid out above.
> > 
> > madvise_free misses a similar class of check so I'm adding Minchan Kim
> > to the cc as the original author of much of that code. Minchan Kim will
> > need to confirm but it appears that two modifications would be required.
> > The first should pass in the mmu_gather structure to
> > madvise_free_pte_range (at minimum) and force flush the TLB under the
> > PTL if a dirty PTE is encountered. The second is that it should consider
> 
> OTL: I couldn't read this lengthy discussion so I may be missing something.
> 
> About MADV_FREE, I do not understand why it should flush TLB in MADV_FREE
> context. MADV_FREE's semantic allows "write(ie, dirty)" so if other thread
> in parallel which has stale pte does "store" to make the pte dirty,
> it's okay since try_to_unmap_one in shrink_page_list catches the dirty.
> 

In try_to_unmap_one it's fine. It's not necessarily fine in KSM. Given
that the key is avoiding data corruption, you could argue, with a comment
explaining why, that madv_free doesn't necessarily have to flush as long
as KSM flushes even when the PTE is clean, since batching can leave a
stale TLB entry behind.
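
To make the concern concrete, here is a toy userspace model, not kernel
code. Every name in it is hypothetical, and it assumes hardware that caches
the dirty bit in the TLB entry and skips the page table update when the
cached entry is already dirty. It only sketches why a scanner that trusts
the PTE dirty bit can miss a store while a flush is still batched.

```c
#include <stdbool.h>

/* Toy userspace model, not kernel code. All names are hypothetical. */
struct toy_pte { bool dirty; };              /* PTE as stored in memory */
struct toy_tlb { bool valid; bool dirty; };  /* one CPU's cached copy */

/*
 * A CPU stores to the page. The in-memory dirty bit is only updated when
 * the cached translation is missing or still marked clean.
 */
void toy_store(struct toy_tlb *tlb, struct toy_pte *pte)
{
    if (!tlb->valid) {              /* TLB miss: walk the page table */
        tlb->valid = true;
        tlb->dirty = pte->dirty;
    }
    if (!tlb->dirty) {              /* first write through this entry */
        pte->dirty = true;
        tlb->dirty = true;
    }
    /* Otherwise the cached entry is already dirty: PTE is not touched. */
}

/* Clear the dirty bit; flush_now == false models a batched flush. */
void toy_clear_dirty(struct toy_tlb *tlb, struct toy_pte *pte, bool flush_now)
{
    pte->dirty = false;
    if (flush_now)
        tlb->valid = false;
}

/* Returns true if a scanner that trusts pte->dirty would miss the store. */
bool toy_store_is_missed(bool flush_now)
{
    struct toy_pte pte = { .dirty = false };
    struct toy_tlb tlb = { .valid = false, .dirty = false };

    toy_store(&tlb, &pte);                   /* CPU0 dirties the page */
    toy_clear_dirty(&tlb, &pte, flush_now);  /* CPU1 marks the PTE clean */
    toy_store(&tlb, &pte);                   /* CPU0 stores again */
    return !pte.dirty;                       /* PTE still looks clean? */
}
```

In this model toy_store_is_missed(false) returns true and
toy_store_is_missed(true) returns false, which is the difference between
leaving the flush batched and flushing before anyone trusts the dirty bit.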

> In the above example, I think KSM should flush the TLB, not MADV_FREE and
> the soft dirty page handler.
> 

That would also be acceptable.

> > flushing the full affected range as madvise_free holds mmap_sem for
> > read-only to avoid problems with two parallel madv_free operations. The
> > second is optional because there are other ways it could also be handled
> > that may have lower overhead.
> 
> Ditto. I cannot understand. Why do two parallel MADV_FREE calls have a problem?
> 

Like madvise(), madv_free can potentially return while a stale PTE is
still visible to a caller that observed a pte_none at the time of
madv_free, and using that stale PTE potentially allows a lost write. It's
debatable whether
this matters considering that madv_free to a region means that parallel
writers can lose their update anyway. It's less of a concern than the
KSM angle outlined in Nadav's example which he was able to artificially
reproduce by slowing operations to increase the race window.
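
As a rough sketch of the parallel case, the toy model below (userspace
only, all names hypothetical, not the real mmu_gather API) has two callers
walk the same PTE. The second caller sees pte_none, queues nothing, and
returns before the first caller's batched flush has happened. The
flush_full_range parameter is an assumption standing in for "flush the
full affected range regardless of what was found".

```c
#include <stdbool.h>

/* Toy userspace model, not kernel code. All names are hypothetical. */
struct toy_mm {
    bool pte_present;   /* PTE as stored in memory */
    bool tlb_valid;     /* translation still cached on some other CPU */
};

struct toy_gather { bool need_flush; };  /* per-caller batched flush state */

/* Clear the PTE and defer the flush; a pte_none entry queues nothing. */
void toy_unmap_pte(struct toy_mm *mm, struct toy_gather *g)
{
    if (!mm->pte_present)
        return;
    mm->pte_present = false;
    g->need_flush = true;
}

/* End of the operation: flush if something was queued or if forced. */
void toy_finish(struct toy_mm *mm, struct toy_gather *g, bool flush_full_range)
{
    if (g->need_flush || flush_full_range)
        mm->tlb_valid = false;
}

/* Returns true if the stale entry survives the second caller's return. */
bool toy_stale_after_second_caller(bool flush_full_range)
{
    struct toy_mm mm = { .pte_present = true, .tlb_valid = true };
    struct toy_gather a = { false }, b = { false };
    bool stale;

    toy_unmap_pte(&mm, &a);                 /* A clears the PTE, flush pending */
    toy_unmap_pte(&mm, &b);                 /* B observes pte_none */
    toy_finish(&mm, &b, flush_full_range);  /* B returns before A flushes */
    stale = mm.tlb_valid;
    toy_finish(&mm, &a, flush_full_range);  /* A's batched flush arrives */
    return stale;
}
```

Here toy_stale_after_second_caller(false) returns true, and passing true
for the full-range flush makes it return false. Other ways of closing the
window with lower overhead are not modelled.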

-- 
Mel Gorman
SUSE Labs
