Re: Potential race in TLB flush batching?

Mel Gorman <mgorman@xxxxxxx> wrote:

> On Wed, Jul 19, 2017 at 03:19:00PM -0700, Nadav Amit wrote:
>>>> Yes, of course, since KSM does not batch TLB flushes. I regarded the other
>>>> direction - first try_to_unmap() removes the PTE (but still does not flush),
>>>> unlocks the page, and then KSM acquires the page lock and calls
>>>> write_protect_page(). It finds out the PTE is not present and does not flush
>>>> the TLB.
>>> 
>>> When KSM acquires the page lock, it then acquires the PTL where the
>>> cleared PTE is observed directly and skipped.
>> 
>> I don't see why. Let's try again - CPU0 reclaims while CPU1 deduplicates:
>> 
>> CPU0				CPU1
>> ----				----
>> shrink_page_list()
>> 
>> => try_to_unmap()
>> ==> try_to_unmap_one()
>> [ unmaps from some page-tables ]
>> 
>> [ try_to_unmap returns false;
>>  page not reclaimed ]
>> 
>> => keep_locked: unlock_page()
>> 
>> [ TLB flush deferred ]
>> 				try_to_merge_one_page()
>> 				=> trylock_page()
>> 				=> write_protect_page()
>> 				==> acquire ptl
>> 				  [ PTE non-present -> no PTE change
>> 				    and no flush ]
>> 				==> release ptl
>> 				==> replace_page()
>> 
>> 
>> At this point, while replace_page() is running, CPU0 may still not have
>> flushed the TLBs. Another CPU (CPU2) may hold a stale PTE, which is not
>> write-protected. It can therefore write to that page while replace_page() is
>> running, resulting in memory corruption.
>> 
>> No?
> 
> KSM is not my strong point so it's reaching the point where others more
> familiar with that code need to be involved.

Do not assume for a second that I really know what is going on over there.

> If try_to_unmap returns false on CPU0 then at least one unmap attempt
> failed and the page is not reclaimed.

Actually, try_to_unmap() may even return true, and the page would still not
be reclaimed - for example if page_has_private() and freeing the buffers
fails. In this case, the page would be unlocked as well.

> For those that were unmapped, they
> will get flushed in the near future. When KSM operates on CPU1, it'll skip
> the unmapped pages under the PTL so stale TLB entries are not relevant as
> the mapped entries are still pointing to a valid page and ksm misses a merge
> opportunity.

This is the case I had in mind, but I do not understand your point. The whole
problem is that CPU1 skips the unmapped pages under the PTL. Because it
skips them, it does not flush them from the TLB either. As a result,
replace_page() may run before CPU0 has flushed the TLB.

> If it write protects a page, ksm unconditionally flushes the PTE
> on clearing the PTE so again, there is no stale entry anywhere. For CPU2,
> it'll either reference a PTE that was unmapped in which case it'll fault
> once CPU0 flushes the TLB and until then it's safe to read and write as
> long as the TLB is flushed before the page is freed or IO is initiated which
> reclaim already handles.

In my scenario the page is not freed and there is no I/O in the reclaim
path. CPU0's TLB flush is simply deferred, and the page-table lock is not
held in the meantime. As I mentioned before, this window can potentially be
very long in a virtual machine. CPU2 referenced a PTE that was unmapped by
CPU0 (reclaim path) but not CPU1 (ksm path).

ksm, IIUC, does not expect the page to be modified during replace_page().
Eventually it would flush the TLB (after changing the PTE to point to the
deduplicated page). But in the meantime, another CPU may use stale PTEs for
writes, and those writes would be lost after the page is deduplicated.
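
To make the interleaving above concrete, here is a toy user-space model of
it. This is only a sketch of the scenario as I describe it in this mail, not
kernel code: the model_* helpers, struct model and lost_write_demo() are
made-up names, there is a single PTE and one cached translation per CPU, and
the "flush" is just a loop. The flush_before_merge flag models whether CPU0's
deferred flush happens to complete before CPU1 starts merging.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 3

/* Hypothetical toy types: one PTE, one cached translation per CPU. */
struct pte { bool present, writable; int frame; };
struct tlb_entry { bool valid, writable; int frame; };

struct model {
	struct pte pte;
	struct tlb_entry tlb[NR_CPUS];
	char frame[2];		/* one byte of content per physical frame */
	bool flush_pending;	/* a batched flush has been deferred */
};

/* TLB fill: a CPU caches the translation if it has none and the PTE is present. */
static void model_touch(struct model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!m->tlb[cpu].valid && m->pte.present)
		m->tlb[cpu] = (struct tlb_entry){ true, m->pte.writable, m->pte.frame };
}

/* Returns true if the write went through without a fault. */
static bool model_write(struct model *m, int cpu, char val)
{
	model_touch(m, cpu);
	if (!m->tlb[cpu].valid || !m->tlb[cpu].writable)
		return false;	/* would fault */
	m->frame[m->tlb[cpu].frame] = val;
	return true;
}

static void model_flush_all(struct model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		m->tlb[cpu].valid = false;
	m->flush_pending = false;
}

/* CPU0, reclaim path: clear the PTE but defer the TLB flush. */
static void model_unmap_batched(struct model *m)
{
	m->pte.present = false;
	m->flush_pending = true;
}

/* CPU1, ksm path: a non-present PTE means no PTE change and no flush. */
static void model_write_protect(struct model *m)
{
	if (!m->pte.present)
		return;
	m->pte.writable = false;
	model_flush_all(m);
}

/* CPU1, ksm path: point the PTE at the deduplicated frame, then flush. */
static void model_replace_page(struct model *m, int ksm_frame)
{
	m->pte = (struct pte){ true, false, ksm_frame };
	model_flush_all(m);
}

/* Returns true if CPU2's write was accepted and then lost by the merge. */
bool lost_write_demo(bool flush_before_merge)
{
	struct model m = { .pte = { true, true, 0 }, .frame = { 'A', 'A' } };

	model_touch(&m, 2);			/* CPU2 caches a writable entry */
	model_unmap_batched(&m);		/* CPU0: unmap, flush deferred */
	if (flush_before_merge)
		model_flush_all(&m);		/* CPU0's flush arrives in time */
	model_write_protect(&m);		/* CPU1: PTE non-present, no flush */
	bool wrote = model_write(&m, 2, 'B');	/* CPU2 writes via its TLB */
	model_replace_page(&m, 1);		/* CPU1: merge into frame 1 */

	return wrote && m.frame[m.pte.frame] != 'B';
}
```

In this model, calling lost_write_demo(false) from a small main() reports a
lost write, because CPU2 still holds a writable translation when
replace_page() runs. lost_write_demo(true) does not, because CPU2's write
faults instead. The real question is of course whether the kernel paths
allow the first ordering.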
