On Wed, Dec 01, 2010 at 05:25:18PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-12-01 at 21:42 +0530, Srivatsa Vaddagiri wrote:
>
> > Not if yield() remembers what timeslice was given up and adds that back when
> > thread is finally ready to run. Figure below illustrates this idea:
> >
> >
> >       A0/4    C0/4 D0/4 A0/4   C0/4 D0/4 A0/4   C0/4 D0/4 A0/4
> > p0   |----|-L|----|----|----|L|----|----|----|L|----|----|----|--------------|
> >             \                \                \                       \
> >              B0/2[2]          B0/0[6]          B0/0[10]                B0/14[0]
> >
> >
> > where,
> >     p0      -> physical cpu0
> >     L       -> denotes period of lock contention
> >     A0/4    -> means vcpu A0 (of guest A) ran for 4 ms
> >     B0/2[6] -> means vcpu B0 (of guest B) ran for 2 ms (and has given up
> >                6ms worth of its timeslice so far). In reality, we should
> >                not see too much of "given up" timeslice for a vcpu.
>
> /me fails to parse

Maybe the ASCII art didn't get displayed well by your email reader? Essentially
what I am suggesting above is some extension to yield as below:

yield_task_fair(...)
{

+	ideal_runtime = sched_slice(cfs_rq, curr);
+	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
+	rem_time_slice = ideal_runtime - delta_exec;
+
+	current->donate_time += rem_time_slice > some_threshold ?
+				some_threshold : rem_time_slice;

	...
}


sched_slice(...)
{
	slice = ...

+	slice += current->donate_time;

}

or something close to it. I am a bit reluctant to go that route myself, unless
the fairness issue with plain yield is quite bad.

> > > A plain yield (ignoring no-opiness on Linux) will penalize the
> > > running guest wrt other guests. We need to maintain fairness.

Avi, any idea how much of a penalty we are talking about here in using plain
yield? If that is acceptable in practice, I'd prefer we use the plain yield
rather than add any more sophistication to it ..

> > Agreed on the need to maintain fairness.
>
> Directed yield and fairness don't mix well either. You can end up
> feeding the other tasks more time than you'll ever get back.

Agreed, that's the reason I was suggesting a different form of yield which
addresses the fairness between VCPUs as well.

> > This can happen in normal case when lock-holders are preempted as well. So
> > not a new problem that hard-limits is introducing!
>
> No, but hard limits make it _much_ worse.

Sure ..

> > > With directed yield you can let the lock holder make
> > > some progress at the expense of another vcpu. A regular yield()
> > > will simply stall the waiter.
> >
> > Agreed. Do you see any problems with slightly enhanced version of yield
> > described above (rather than directed yield)? It has some advantage over
> > directed yield in that it preserves not only fairness between VMs but also
> > fairness between VCPUs of a VM. Also it avoids the need for a guessing game
> > mentioned above and bad interactions with hard-limits.
> >
> > CCing other scheduler experts for their opinion of proposed yield() extensions.
>
> sys_yield() usage for anything other but two FIFO threads of the same
> priority goes to /dev/null.
>
> The Xen paravirt spinlock solution is relatively sane, use that.
> Unmodified guests suck anyway, there's really nothing much sane you can
> do there as you don't know who owns what lock.

Hopefully we don't have to deal with unmodified guests for too long. Until
then, plain yield() upon lock contention seems like the best option we have
for such unmodified guests.

- vatsa
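
P.S. In case the figure is still hard to read, here is a minimal userspace
sketch of the same accounting. It is only a model, not kernel code: the names
vcpu_model, model_yield(), model_slice() and BASE_SLICE_MS are made up for
illustration, and the cap argument stands in for "some_threshold". One
assumption differs from the snippet above: the remainder is computed against
the base slice rather than the already-extended slice, so that earlier
donations are not counted twice. That is what the figure's numbers imply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: none of these names exist in the kernel. */
#define BASE_SLICE_MS 4u	/* per-round slice of each vcpu in the figure */

struct vcpu_model {
	uint32_t donated_ms;	/* timeslice given up so far, not yet repaid */
};

/*
 * The vcpu yields after running ran_ms of its base slice.
 * The unused remainder, clamped to cap_ms, is remembered for later.
 * Returns the amount donated by this particular yield.
 */
static uint32_t model_yield(struct vcpu_model *v, uint32_t ran_ms,
			    uint32_t cap_ms)
{
	assert(ran_ms <= BASE_SLICE_MS);

	uint32_t rem = BASE_SLICE_MS - ran_ms;
	uint32_t give = rem > cap_ms ? cap_ms : rem;

	v->donated_ms += give;
	return give;
}

/*
 * Slice for the next uncontended run: the base slice plus everything
 * donated earlier. The donation is consumed, so it is repaid only once.
 */
static uint32_t model_slice(struct vcpu_model *v)
{
	uint32_t slice = BASE_SLICE_MS + v->donated_ms;

	v->donated_ms = 0;
	return slice;
}
```

Replaying the figure for B0 with a large cap: model_yield(&b0, 2, 16) leaves
2 ms donated, two further model_yield(&b0, 0, 16) calls bring that to 6 ms
and then 10 ms, and model_slice(&b0) then hands back 14 ms, i.e. the
B0/2[2], B0/0[6], B0/0[10], B0/14[0] sequence.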