> On Apr 1, 2021, at 1:38 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
>
> On Wed, Mar 31, 2021 at 09:36:04AM -0700, Nadav Amit wrote:
>>
>>> On Mar 31, 2021, at 6:16 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
>>>
>>> On Wed, Mar 31, 2021 at 07:20:09PM +0800, Huang, Ying wrote:
>>>> Mel Gorman <mgorman@xxxxxxx> writes:
>>>>
>>>>> On Mon, Mar 29, 2021 at 02:26:51PM +0800, Huang Ying wrote:
>>>>>> For NUMA balancing, in hint page fault handler, the faulting page will
>>>>>> be migrated to the accessing node if necessary. During the migration,
>>>>>> TLB will be shot down on all CPUs that the process has run on
>>>>>> recently. Because in the hint page fault handler, the PTE will be
>>>>>> made accessible before the migration is tried. The overhead of TLB
>>>>>> shooting down is high, so it's better to be avoided if possible. In
>>>>>> fact, if we delay mapping the page in PTE until migration, that can be
>>>>>> avoided. This is what this patch doing.
>>>>>>
>>>>>
>>>>> Why would the overhead be high? It was previously inaccessible so it's
>>>>> only parallel accesses making forward progress that trigger the need
>>>>> for a flush.
>>>>
>>>> Sorry, I don't understand this. Although the page is inaccessible, the
>>>> threads may access other pages, so TLB flushing is still necessary.
>>>>
>>>
>>> You assert the overhead of TLB shootdown is high and yes, it can be
>>> very high but you also said "the benchmark score has no visible changes"
>>> indicating the TLB shootdown cost is not a major problem for the workload.
>>> It does not mean we should ignore it though.
>>
>> If you are looking for a benchmark that is negatively affected by NUMA
>> balancing, then IIRC Parsec's dedup is such a workload. [1]
>>
>
> Few questions;
>
> Is Parsec impaired due to NUMA balancing in general or due to TLB
> shootdowns specifically?

TLB shootdowns specifically.

>
> Are you using "gcc-pthreads" for parallelisation and the "native" size
> for Parsec?

Native, as it is the biggest workload, so the effect is most apparent
there. I don’t remember playing with the threading model parameters.

>
> Is there any specific thread count that matters either in
> absolute terms or as a percentage of online CPUs?

IIRC, the impact is greatest when the thread count matches the number
of CPUs (or is perhaps slightly lower).