Re: [RFC 20/20] mm/rmap: avoid potential races

Nadav Amit <namit@xxxxxxxxxx> · Mon, 23 Aug 2021 15:50:22 +0000

> On Aug 23, 2021, at 1:05 AM, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> 
> Hi, Nadav,
> 
> Nadav Amit <nadav.amit@xxxxxxxxx> writes:
> 
>> From: Nadav Amit <namit@xxxxxxxxxx>
>> 
>> flush_tlb_batched_pending() appears to have a theoretical race:
>> tlb_flush_batched is being cleared after the TLB flush, and if in
>> between another core calls set_tlb_ubc_flush_pending() and sets the
>> pending TLB flush indication, this indication might be lost. Holding the
>> page-table lock when SPLIT_LOCK is set cannot eliminate this race.
> 
> Recently, while reading the corresponding code, I found the exact same
> race.  Do you still think the race is possible, at least in theory?  If
> so, why hasn't your fix been merged?

I think the race is possible. IIRC, the fix did not get merged because
of some addressable criticism, a lack of enthusiasm from other people,
and my own laziness/busyness.
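
To make the interleaving concrete, here is a minimal single-threaded C
model of it. It is only a sketch: struct mm_model and the model_*()
helpers are made up for illustration and are not kernel code. The only
thing taken from the description of the race is the ordering, i.e. the
flush happens first and tlb_flush_batched is cleared afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant per-mm state (not the kernel struct). */
struct mm_model {
	bool tlb_flush_batched;	/* "a batched flush is pending" indication */
	int stale_entries;	/* stale TLB entries that still need a flush */
};

/* Models set_tlb_ubc_flush_pending(): defer a flush and mark it pending. */
static void model_set_pending(struct mm_model *mm)
{
	mm->stale_entries++;
	mm->tlb_flush_batched = true;
}

/* First half of flush_tlb_batched_pending(): check the flag and flush. */
static bool model_flush_begin(struct mm_model *mm)
{
	if (!mm->tlb_flush_batched)
		return false;
	mm->stale_entries = 0;	/* stands in for the actual TLB flush */
	return true;
}

/* Second half: the indication is cleared only after the flush. */
static void model_flush_end(struct mm_model *mm)
{
	mm->tlb_flush_batched = false;
}

/*
 * Replays the problematic interleaving in a fixed order and returns the
 * number of stale entries left behind with no pending indication set.
 */
static int model_lost_flush(void)
{
	struct mm_model mm = { false, 0 };
	bool flushed;

	model_set_pending(&mm);			/* CPU 1: first deferred flush */
	flushed = model_flush_begin(&mm);	/* CPU 0: sees the flag, flushes */
	assert(flushed);
	(void)flushed;
	model_set_pending(&mm);			/* CPU 1: lands between flush and clear */
	model_flush_end(&mm);			/* CPU 0: clears CPU 1's indication too */

	return mm.tlb_flush_batched ? 0 : mm.stale_entries;
}
```

In this model, model_lost_flush() returns 1: one stale entry remains
while tlb_flush_batched is false, so a later call to the flush path
would skip it.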

> 
>> The current batched TLB invalidation scheme therefore does not seem
>> viable or easily repairable.
> 
> I have an idea for fixing this without too much code.  If necessary, I
> will send it out.

Arguably, it would be preferable to have a small, back-portable fix for
this issue specifically. Just try to ensure that you do not introduce
performance overheads. Any solution should be clear about how many
additional TLB flushes it causes in the worst-case scenario and how
many additional atomic operations it requires.