Re: [RFC][PATCH 14/26] sched, numa: Numa balancer

Don Morris <don.morris@xxxxxx> · Fri, 13 Jul 2012 07:45:18 -0700

On 07/12/2012 03:02 PM, Rik van Riel wrote:
> On 03/16/2012 10:40 AM, Peter Zijlstra wrote:
> 
> At LSF/MM, there was a presentation comparing Peter's
> NUMA code with Andrea's NUMA code. I believe this is
> the main reason why Andrea's code performed better in
> that particular test...
> 
>> +        if (sched_feat(NUMA_BALANCE_FILTER)) {
>> +            /*
>> +             * Avoid moving ne's when we create a larger imbalance
>> +             * on the other end.
>> +             */
>> +            if ((imb->type & NUMA_BALANCE_CPU) &&
>> +                imb->cpu - cpu_moved < ne_cpu / 2)
>> +                goto next;
>> +
>> +            /*
>> +             * Avoid migrating ne's when we know we'll push our
>> +             * node over the memory limit.
>> +             */
>> +            if (max_mem_load &&
>> +                imb->mem_load + mem_moved + ne_mem > max_mem_load)
>> +                goto next;
>> +        }
> 
> IIRC the test consisted of a 16GB NUMA system with two 8GB nodes.
> It was running 3 KVM guests, two guests of 3GB memory each, and
> one guest of 6GB.

How many CPUs per guest (host threads), and how many physical/logical
CPUs per node on the host? Were there any comparisons with a situation
where the memory would fit within the nodes but the scheduling load
would be too high?

Don

> 
> With autonuma, the 6GB guest ended up on one node, and the
> 3GB guests on the other.
> 
> With sched numa, each node had a 3GB guest, and part of the 6GB guest.
> 
> There is a fundamental difference in the balancing between autonuma
> and sched numa.
> 
> In sched numa, a process is moved over to the current node only if
> the current node has space for it.
> 
> Autonuma, on the other hand, operates more of a "hostage exchange"
> policy, where a thread on one node is exchanged with a thread on
> another node, if it looks like that will reduce the overall number
> of cross-node NUMA faults in the system.
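
To make the "hostage exchange" idea concrete, here is a minimal sketch
of the decision it describes: swap two threads only if the combined
number of remote faults goes down. This is not autonuma's actual code.
The struct, the per-node fault counters, and the function names are all
hypothetical, and a two-node system is assumed.

```c
#define NR_NODES 2  /* assumption: two-node system, as in the test described */

/* Hypothetical per-thread record; faults[n] counts NUMA faults that
 * touched memory residing on node n. */
struct thread_stats {
    int node;                        /* node the thread currently runs on */
    unsigned long faults[NR_NODES];
};

/* Faults that would be cross-node if the thread ran on 'node'. */
static unsigned long remote_faults(const struct thread_stats *t, int node)
{
    unsigned long sum = 0;

    for (int n = 0; n < NR_NODES; n++)
        if (n != node)
            sum += t->faults[n];
    return sum;
}

/* Return 1 if swapping the nodes of a and b lowers the combined number
 * of cross-node faults, 0 otherwise. */
static int exchange_helps(const struct thread_stats *a,
                          const struct thread_stats *b)
{
    unsigned long before = remote_faults(a, a->node) +
                           remote_faults(b, b->node);
    unsigned long after  = remote_faults(a, b->node) +
                           remote_faults(b, a->node);

    return after < before;
}
```

For a thread on node 0 with faults {10, 90} and a thread on node 1 with
faults {80, 20}, the swap cuts remote faults from 170 to 30, so
exchange_helps() returns 1. Note that no free capacity on either node
is required for the decision.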
> 
> I am not sure how to do a "hostage exchange" algorithm with
> sched numa, but it would seem like it could be necessary in order
> for some workloads to converge on a sane configuration.
> 
> After all, with only about 2GB free on each node, you will never
> get to move either a 3GB guest, or parts of a 6GB guest...
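
The arithmetic behind that can be sketched as follows. The numbers are
an assumption reconstructed from the description above (8GB per node,
about 6GB used on each: one 3GB guest plus roughly half of the 6GB
guest), and the helper names are illustrative, not taken from the
patch. A one-way move has to fit into the destination's free space,
while an exchange of equal-sized pieces leaves both nodes' usage
unchanged.

```c
#include <stdbool.h>

/* Sizes in GB; illustrative names, not from the patch. */
struct node_mem {
    unsigned int capacity;
    unsigned int used;
};

/* One-way move: the destination must have room for the whole entity. */
static bool can_move(const struct node_mem *dst, unsigned int size)
{
    return dst->used + size <= dst->capacity;
}

/* Exchange: each node loses one entity and gains the other.
 * Callers must ensure leaving_a <= a->used and leaving_b <= b->used. */
static bool can_exchange(const struct node_mem *a, unsigned int leaving_a,
                         const struct node_mem *b, unsigned int leaving_b)
{
    return a->used - leaving_a + leaving_b <= a->capacity &&
           b->used - leaving_b + leaving_a <= b->capacity;
}
```

With both nodes at { .capacity = 8, .used = 6 }, can_move(&node, 3)
is false (6 + 3 > 8), whereas can_exchange(&n0, 3, &n1, 3) is true,
since each node stays at 6GB.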
> 
> Any ideas?
> 
> -- 
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
> .
> 
