Re: [RFC 20/20] mm/rmap: avoid potential races

"Huang, Ying" <ying.huang@xxxxxxxxx> · Tue, 24 Aug 2021 08:36:18 +0800

Nadav Amit <namit@xxxxxxxxxx> writes:

>> On Aug 23, 2021, at 1:05 AM, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>> 
>> Hi, Nadav,
>> 
>> Nadav Amit <nadav.amit@xxxxxxxxx> writes:
>> 
>>> From: Nadav Amit <namit@xxxxxxxxxx>
>>> 
>>> flush_tlb_batched_pending() appears to have a theoretical race:
>>> tlb_flush_batched is being cleared after the TLB flush, and if in
>>> between another core calls set_tlb_ubc_flush_pending() and sets the
>>> pending TLB flush indication, this indication might be lost. Holding the
>>> page-table lock when SPLIT_LOCK is set cannot eliminate this race.
>> 
>> Recently, when I read the corresponding code, I found the exact same
>> race.  Do you still think the race is possible, at least in theory?  If
>> so, why hasn't your fix been merged?
>
> I think the race is possible. It didn’t get merged, IIRC, due to some
> addressable criticism and lack of enthusiasm from other people, and
> my laziness/busy-ness.

Got it!  Thanks for the information!
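
To make sure we are talking about the same interleaving, here is a small
user-space model of the race as I understand it from the description
above.  It is only a sketch, not kernel code: struct mm_model and the
model_* helpers are hypothetical names made up for illustration, and the
two CPUs are replayed sequentially in one thread so that the ordering is
deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant mm_struct state; not kernel code. */
struct mm_model {
	bool tlb_flush_batched;	/* "a batched flush is pending" indication */
	int stale_entries;	/* unmapped PTEs that a TLB may still cache */
};

/* Models set_tlb_ubc_flush_pending(): unmap a PTE and defer the flush. */
static void model_set_pending(struct mm_model *mm)
{
	mm->stale_entries++;
	mm->tlb_flush_batched = true;
}

/* First half of flush_tlb_batched_pending(): test the flag, then flush. */
static bool model_flush_if_pending(struct mm_model *mm)
{
	if (!mm->tlb_flush_batched)
		return false;
	mm->stale_entries = 0;	/* stands in for flush_tlb_mm() */
	return true;
}

/* Second half: clear the indication only after the flush has finished. */
static void model_clear_pending(struct mm_model *mm)
{
	mm->tlb_flush_batched = false;
}

/*
 * Replays the problematic interleaving and returns the number of stale
 * entries that are left behind with no pending indication set.
 */
static int replay_lost_indication(void)
{
	struct mm_model mm = { false, 0 };
	bool flushed;

	model_set_pending(&mm);			/* CPU1: defers a flush */
	flushed = model_flush_if_pending(&mm);	/* CPU0: sees flag, flushes */
	assert(flushed);
	model_set_pending(&mm);			/* CPU1: defers another flush */
	model_clear_pending(&mm);		/* CPU0: clears CPU1's indication */

	/* A later flush_tlb_batched_pending() now skips the flush. */
	flushed = model_flush_if_pending(&mm);
	assert(!flushed);
	return mm.stale_entries;
}
```

In this model replay_lost_indication() ends with one stale entry and the
flag cleared, which is the lost indication described in the patch.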

>>> The current batched TLB invalidation scheme therefore does not seem
>>> viable or easily repairable.
>> 
>> I have an idea for fixing this without too much code.  If necessary, I
>> will send it out.
>
> Arguably, it would be preferable to have a small back-portable fix for
> this issue specifically. Just try to ensure that you do not introduce
> performance overheads. Any solution should be clear about its impact
> on additional TLB flushes in the worst-case scenario and the number
> of additional atomic operations that would be required.

Sure.  Will do that.

Best Regards,
Huang, Ying