Re: Potential race in TLB flush batching?

Andy Lutomirski <luto@xxxxxxxxxx> · Fri, 14 Jul 2017 15:21:03 -0700

On Fri, Jul 14, 2017 at 2:27 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
> On Fri, Jul 14, 2017 at 07:02:57PM +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2017-07-14 at 09:31 +0100, Mel Gorman wrote:
>> > It may also only be a gain on a limited number of architectures, depending
>> > on exactly how an architecture handles flushing. At the time, batching
>> > this for x86 in the worst-case scenario, where all pages being reclaimed
>> > were mapped from multiple threads, knocked 24.4% off elapsed run time and
>> > 29% off system CPU, but only on multi-socket NUMA machines. On UMA, it was
>> > barely noticeable. For some workloads where only a few pages are mapped or
>> > the mapped pages on the LRU are relatively sparse, it'll make no difference.
>> >
>> > The worst-case situation is extremely IPI-intensive on x86, where many
>> > IPIs were being sent for each unmap. It's only worth considering if
>> > you see that the time spent sending IPIs for flushes is a large portion
>> > of reclaim.
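A rough way to see where the IPI savings Mel describes come from is to count messages. This is a userspace toy model, not kernel code: the 64-bit mask stands in for struct cpumask, and the "one IPI per CPU per unmapped page" cost of the unbatched path is an assumption for illustration. It counts IPIs only; the 24.4% and 29% figures above are timings and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not kernel code: bit n set means "CPU n may cache a
 * translation for this mm". A 64-bit mask stands in for struct cpumask. */
typedef uint64_t cpumask_model_t;

static cpumask_model_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64); /* the toy mask only has room for 64 CPUs */
	return (cpumask_model_t)1 << cpu;
}

/* Number of CPUs in a mask (population count). */
static unsigned int cpus_in(cpumask_model_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Unbatched (assumed cost): each unmapped page sends a flush IPI to every
 * CPU in the owning mm's mask. masks[i] is that mask for page i. */
static unsigned int ipis_unbatched(const cpumask_model_t *masks, unsigned int npages)
{
	unsigned int total = 0;

	for (unsigned int i = 0; i < npages; i++)
		total += cpus_in(masks[i]);
	return total;
}

/* Batched: OR the masks together while unmapping, then flush each CPU in
 * the union once when the batch is flushed. */
static unsigned int ipis_batched(const cpumask_model_t *masks, unsigned int npages)
{
	cpumask_model_t pending = 0;

	for (unsigned int i = 0; i < npages; i++)
		pending |= masks[i];
	return cpus_in(pending);
}
```

With three reclaimed pages all mapped by one process whose threads ran on four CPUs, the unbatched path costs 12 IPIs and the batched path 4. If the pages' masks are disjoint, the union is as large as the sum and batching saves nothing, which is consistent with the point that sparse or single-threaded cases see little difference.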
>>
>> OK, it would be interesting to see how that compares to powerpc with
>> its hardware TLB invalidation broadcasts. We tend to hate them and prefer
>> IPIs in most cases, but maybe not in *this* case (mostly we find that
>> IPI + local invalidation is better for large-scale invalidations, such as
>> a full mm on exit/fork, etc.).
>>
>> In the meantime I found the original commits, we'll dig and see if it's
>> useful for us.
>>
>
> I would suggest basing it on top of Andy's work that is currently in
> Linus' tree for 4.13-rc1, as the core/arch boundary is a lot clearer. While
> there is other work pending on top related to mm and generation counters,
> that is primarily important for addressing the race, which ppc64 may not
> need if you always flush when clearing the accessed bit (or equivalent). The
> main thing to watch for is that, if an accessed or young bit is being set
> for the first time, the arch checks the underlying PTE and traps if it's
> invalid. If that holds and there is a flush when the young bit is cleared,
> then you probably do not need the arch hook that closes the race.
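The condition Mel describes can be illustrated with a toy state model of one PTE and one CPU's cached translation for it. Everything here is hypothetical and deliberately simplified: the types and functions are invented, "young" stands for the accessed bit, and only the narrow sequence "clear young, unmap with the flush deferred, access again" is modeled. It does not describe the real ppc64 or x86 MMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one page's PTE and one CPU's cached translation for it.
 * Hypothetical types; no real architecture's MMU is described here. */
struct pte_model {
	bool valid;
	bool young;
};

struct tlb_model {
	bool present; /* a translation for the page is cached */
	bool young;   /* the cached copy already has the young bit set */
};

enum access_result {
	ACCESS_OK,    /* access went through a valid translation */
	ACCESS_STALE, /* access used a cached entry whose PTE is already gone */
	ACCESS_FAULT, /* the arch rechecked the PTE, found it invalid, trapped */
};

/* Assumed rule from the thread: setting young for the first time re-reads
 * the in-memory PTE and traps if it is invalid. */
static enum access_result cpu_access(struct pte_model *pte, struct tlb_model *tlb)
{
	assert(pte && tlb);
	if (tlb->present && tlb->young)
		/* No recheck: the cached entry is used as-is. */
		return pte->valid ? ACCESS_OK : ACCESS_STALE;

	/* TLB miss, or young being set for the first time: consult memory. */
	if (!pte->valid) {
		tlb->present = false;
		return ACCESS_FAULT;
	}
	pte->young = true;
	tlb->present = true;
	tlb->young = true;
	return ACCESS_OK;
}

/* Reclaim clears young; flush == true models an arch that flushes here. */
static void clear_young(struct pte_model *pte, struct tlb_model *tlb, bool flush)
{
	assert(pte && tlb);
	pte->young = false;
	if (flush)
		tlb->present = false; /* the flush drops the cached entry */
}

/* Unmap with the TLB flush deferred to a later batched flush. */
static void unmap_deferred_flush(struct pte_model *pte)
{
	assert(pte);
	pte->valid = false;
}
```

In the model, an entry that still caches young=1 is used without any recheck, so an access after the deferred unmap goes through a stale translation. Flushing when young is cleared drops that entry, so the next access has to set young again and, because the arch rechecks the PTE at that point, it faults instead.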
>

Ben, if you could read the API in tip:x86/mm + Mel's patch, it would
be fantastic.  I'd like to know whether a non-x86 non-mm person can
understand the API (arch_tlbbatch_add_mm, arch_tlbbatch_flush, and
arch_tlbbatch_flush_one_mm) well enough to implement it.  I'd also
like to know for real that it makes sense outside of x86.
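For anyone trying to map the three hooks onto another architecture, here is a userspace sketch of their general shape. It is hypothetical: the `model_` names, the 64-bit mask standing in for struct cpumask, the `flush_pending` flag, and especially the semantics given to the "flush one mm" hook are assumptions, since Mel's patch is not quoted here. The add/flush pair follows the x86 approach as I understand it: OR the mm's CPU mask into the batch, then flush every CPU in the accumulated mask once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's types. */
struct mm_model {
	uint64_t cpus;      /* CPUs that may cache this mm's translations */
	bool flush_pending; /* assumed: set while a batch may hold a flush for this mm */
};

struct tlbbatch_model {
	uint64_t pending;   /* union of the cpus masks added so far */
};

static unsigned int ipis_sent; /* counts simulated flush IPIs */

static void send_flush_ipis(uint64_t mask)
{
	for (; mask; mask &= mask - 1) /* one IPI per set bit */
		ipis_sent++;
}

/* Mirrors arch_tlbbatch_add_mm: remember which CPUs need flushing later. */
static void model_tlbbatch_add_mm(struct tlbbatch_model *batch, struct mm_model *mm)
{
	assert(batch && mm);
	batch->pending |= mm->cpus;
	mm->flush_pending = true;
}

/* Mirrors arch_tlbbatch_flush: one flush per CPU in the accumulated mask. */
static void model_tlbbatch_flush(struct tlbbatch_model *batch)
{
	assert(batch);
	send_flush_ipis(batch->pending);
	batch->pending = 0;
	/* The batch records CPUs, not mms, so mm->flush_pending is left set;
	 * a later flush_one_mm may then flush redundantly, which is safe. */
}

/* Assumed semantics for arch_tlbbatch_flush_one_mm: before a caller relies
 * on PTEs that reclaim has already cleared, flush this mm if a batched
 * flush might still be outstanding. */
static void model_tlbbatch_flush_one_mm(struct mm_model *mm)
{
	assert(mm);
	if (!mm->flush_pending)
		return;
	send_flush_ipis(mm->cpus);
	mm->flush_pending = false;
}
```

Keeping the batch as a plain CPU mask makes the add/flush pair cheap, at the cost that the per-mm hook cannot tell whether the batch was already flushed and may repeat work. Whether that trade-off suits an architecture with hardware broadcast invalidation is exactly the kind of question Ben could answer.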
