On Fri, Jul 14, 2017 at 05:00:41PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2017-07-11 at 15:07 -0700, Andy Lutomirski wrote:
> > On Tue, Jul 11, 2017 at 12:18 PM, Mel Gorman <mgorman@xxxxxxx> wrote:
> > 
> > I would change this slightly:
> > 
> > > +void flush_tlb_batched_pending(struct mm_struct *mm)
> > > +{
> > > +	if (mm->tlb_flush_batched) {
> > > +		flush_tlb_mm(mm);
> > 
> > How about making this a new helper, arch_tlbbatch_flush_one_mm(mm)?
> > The idea is that this could be implemented as flush_tlb_mm(mm), but
> > the actual semantics needed are weaker. All that's really needed
> > AFAICS is to make sure that any arch_tlbbatch_add_mm() calls on this
> > mm that have already happened become effective by the time that
> > arch_tlbbatch_flush_one_mm() returns.
> 
> Jumping in ... I just discovered that 'new' batching stuff... is it
> documented anywhere?
> 

This should be a new thread. The original commit log has many of the
details and the comments have others. It's clearer what the boundaries
are and what is needed from an architecture with Andy's work on top,
which right now is easiest to see in tip/x86/mm.

> We already had some form of batching via the mmu_gather, now there's
> a different, somewhat orthogonal one, and it's completely unclear
> what it's about and why we couldn't use what we already had. Also
> what assumptions it makes if I want to port it to my arch....
> 

The batching in this context is more about mm's than individual pages
and was done this way as the number of mm's to track was potentially
unbound. At the time of implementation, tracking individual pages and
the extra bits for mmu_gather was overkill and fairly complex due to
the need to potentially restart when the gather structure filled. It
may also only be a gain on a limited number of architectures, depending
on exactly how an architecture handles flushing.

At the time, batching this for x86 in the worst-case scenario, where
all pages being reclaimed were mapped from multiple threads, knocked
24.4% off elapsed run time and 29% off system CPU, but only on
multi-socket NUMA machines. On UMA it was barely noticeable. For
workloads where only a few pages are mapped, or the mapped pages on
the LRU are relatively sparse, it'll make no difference. The
worst-case situation is extremely IPI intensive on x86, where many
IPIs were being sent for each unmap. It's only worth considering if
you see that the time spent sending IPIs for flushes is a large
portion of reclaim.

-- 
Mel Gorman
SUSE Labs
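
To make the quoted suggestion concrete, below is a minimal sketch of
what such a helper might look like. The name arch_tlbbatch_flush_one_mm()
and its weaker contract come from Andy's mail rather than merged code,
the body is just the conservative whole-mm flush he mentions as a valid
implementation, and the flag-clearing tail is an assumption about the
rest of the flush_tlb_batched_pending() hunk, which is not quoted in
full above. Treat it as illustrative only.

#include <linux/mm_types.h>
#include <linux/compiler.h>
#include <asm/tlbflush.h>

/*
 * Sketch only: ensure that any arch_tlbbatch_add_mm() calls against
 * this mm that have already happened are effective by the time this
 * returns.  A conservative implementation can flush the whole mm; an
 * architecture with finer-grained batching state may be able to do
 * something cheaper.
 */
static inline void arch_tlbbatch_flush_one_mm(struct mm_struct *mm)
{
	flush_tlb_mm(mm);
}

void flush_tlb_batched_pending(struct mm_struct *mm)
{
	if (mm->tlb_flush_batched) {
		arch_tlbbatch_flush_one_mm(mm);

		/*
		 * Do not allow the compiler to reorder the clearing of
		 * tlb_flush_batched before the flush itself has happened.
		 */
		barrier();
		mm->tlb_flush_batched = false;
	}
}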