* Peter Zijlstra (a.p.zijlstra@xxxxxxxxx) wrote:
> On Wed, 2010-12-01 at 21:42 +0530, Srivatsa Vaddagiri wrote:
>
> > Not if yield() remembers what timeslice was given up and adds that back when
> > thread is finally ready to run. Figure below illustrates this idea:
> >
> >
> >     A0/4    C0/4 D0/4 A0/4   C0/4 D0/4 A0/4   C0/4 D0/4 A0/4
> > p0 |----|-L|----|----|----|L|----|----|----|L|----|----|----|--------------|
> >           \                \                \                \
> >          B0/2[2]          B0/0[6]          B0/0[10]        B0/14[0]
> >
> >
> > where,
> >     p0      -> physical cpu0
> >     L       -> denotes period of lock contention
> >     A0/4    -> means vcpu A0 (of guest A) ran for 4 ms
> >     B0/2[6] -> means vcpu B0 (of guest B) ran for 2 ms (and has given up
> >                6ms worth of its timeslice so far). In reality, we should
> >                not see too much of "given up" timeslice for a vcpu.
>
> /me fails to parse
>
> > > >Regarding directed yield, do we have any reliable mechanism to find target of
> > > >directed yield in this (unmodified/non-paravirtualized guest) case? IOW how do
> > > >we determine the vcpu thread to which cycles need to be yielded upon contention?
> > >
> > > My idea was to yield to a random starved vcpu of the same guest.
> > > There are several cases to consider:
> > >
> > > - we hit the right vcpu; lock is released, party.
> > > - we hit some vcpu that is doing unrelated work.  yielding thread
> > >   doesn't make progress, but we're not wasting cpu time.
> > > - we hit another waiter for the same lock.  it will also PLE exit
> > >   and trigger a directed yield.  this increases the cost of directed
> > >   yield by a factor of count_of_runnable_but_not_running_vcpus, which
> > >   could be large, but not disastrously so (i.e. don't run a 64-vcpu
> > >   guest on a uniprocessor host with this)
> > >
> > > >> So if you were to test something similar running with a 20% vcpu
> > > >> cap, I'm sure you'd run into similar issues.  It may show with fewer
> > > >> vcpus (I've only tested 64).
> > > >>
> > > >> >Are you assuming the existence of a directed yield and the
> > > >> >specific concern is what happens when a directed yield happens
> > > >> >after a PLE and the target of the yield has been capped?
> > > >>
> > > >> Yes.  My concern is that we will see the same kind of problems
> > > >> directed yield was designed to fix, but without allowing directed
> > > >> yield to fix them.  Directed yield was designed to fix lock holder
> > > >> preemption under contention,
> > >
> > > >For modified guests, something like [2] seems to be the best approach to fix
> > > >lock-holder preemption (LHP) problem, which does not require any sort of
> > > >directed yield support. Essentially upon contention, a vcpu registers its lock
> > > >of interest and goes to sleep (via hypercall) waiting for lock-owner to wake it
> > > >up (again via another hypercall).
> > >
> > > Right.
> >
> > We don't have these hypercalls for KVM atm, which I am working on now.
> >
> > > >For unmodified guests, IMHO a plain yield (or slightly enhanced yield [1])
> > > >should fix the LHP problem.
> > >
> > > A plain yield (ignoring no-opiness on Linux) will penalize the
> > > running guest wrt other guests.  We need to maintain fairness.
> >
> > Agreed on the need to maintain fairness.
>
> Directed yield and fairness don't mix well either. You can end up
> feeding the other tasks more time than you'll ever get back.

If the directed yield is always to another task in your cgroup then
inter-guest scheduling fairness should be maintained.
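Going back to the figure Peter could not parse: the "enhanced yield" Srivatsa is
proposing is essentially bookkeeping. Each time B0 has to give up the rest of its
slice because the lock it wants is held by a preempted vcpu, the scheduler would
remember the unused portion (the number in square brackets) and pay it back the
next time B0 actually gets to run. A minimal sketch of that accounting, in plain C
and with made-up names (struct vcpu, yield_keep_credit(), run_with_credit()) rather
than anything from the real scheduler, reproduces the progression in the figure:

        /* Illustration only: the per-vcpu accounting behind the figure above.
         * Build with: cc -o yield_credit yield_credit.c */
        #include <stdio.h>

        struct vcpu {
                const char *name;
                int credit_ms;  /* timeslice given up so far: the [n] in the figure */
        };

        /* vcpu ran 'ran_ms' of its 'slice_ms' slice and then yielded because the
         * lock holder is preempted: remember the unused part instead of losing it. */
        static void yield_keep_credit(struct vcpu *v, int slice_ms, int ran_ms)
        {
                v->credit_ms += slice_ms - ran_ms;
                printf("%s/%d[%d]\n", v->name, ran_ms, v->credit_ms);
        }

        /* vcpu finally gets to run: it receives its base slice plus the
         * accumulated credit, and the credit is cleared. */
        static void run_with_credit(struct vcpu *v, int slice_ms)
        {
                printf("%s/%d[0]\n", v->name, slice_ms + v->credit_ms);
                v->credit_ms = 0;
        }

        int main(void)
        {
                struct vcpu b0 = { "B0", 0 };

                yield_keep_credit(&b0, 4, 2);   /* ran 2 ms, gave up 2 ms -> B0/2[2]  */
                yield_keep_credit(&b0, 4, 0);   /* lock still held        -> B0/0[6]  */
                yield_keep_credit(&b0, 4, 0);   /* lock still held        -> B0/0[10] */
                run_with_credit(&b0, 4);        /* 4 ms slice + 10 ms credit -> B0/14[0] */
                return 0;
        }

Running it prints B0/2[2], B0/0[6], B0/0[10], B0/14[0], i.e. the timeline above. The
open question in the thread is whether the scheduler can hand B0 that 14 ms back
later without breaking fairness for guests A, C and D, which is where Peter's
fairness objection and the cgroup point above come in.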
> > > >Fyi, Xen folks also seem to be avoiding a directed yield for some of the same
> > > >reasons [3].
> > >
> > > I think that fails for unmodified guests, where you don't know when
> > > the lock is released and so you don't have a wake_up notification.
> > > You lost a large timeslice and you can't gain it back, whereas with
> > > pv the wakeup means you only lose as much time as the lock was held.
> > >
> > > >Given this line of thinking, hard-limiting guests (either in user-space or
> > > >kernel-space, latter being what I prefer) should not have adverse interactions
> > > >with LHP-related solutions.
> > >
> > > If you hard-limit a vcpu that holds a lock, any waiting vcpus are
> > > also halted.
> >
> > This can happen in normal case when lock-holders are preempted as well. So
> > not a new problem that hard-limits is introducing!
>
> No, but hard limits make it _much_ worse.
>
> > > With directed yield you can let the lock holder make
> > > some progress at the expense of another vcpu.  A regular yield()
> > > will simply stall the waiter.
> >
> > Agreed. Do you see any problems with slightly enhanced version of yield
> > described above (rather than directed yield)? It has some advantage over
> > directed yield in that it preserves not only fairness between VMs but also
> > fairness between VCPUs of a VM. Also it avoids the need for a guessing game
> > mentioned above and bad interactions with hard-limits.
> >
> > CCing other scheduler experts for their opinion of proposed yield() extensions.
>
> sys_yield() usage for anything other but two FIFO threads of the same
> priority goes to /dev/null.
>
> The Xen paravirt spinlock solution is relatively sane, use that.
> Unmodified guests suck anyway, there's really nothing much sane you can
> do there as you don't know who owns what lock.
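To make Peter's closing recommendation concrete: the paravirt scheme Srivatsa
describes above (a contending vcpu registers the lock it wants and sleeps via one
hypercall; the lock holder kicks it awake via another on release) has the same
shape as a futex-backed sleeping lock in userspace. The sketch below is only that
analogy, not the Xen implementation and not the KVM hypercall interface being
worked on: the FUTEX_WAIT/FUTEX_WAKE calls stand in for the hypothetical wait and
kick hypercalls, and pv_style_lock()/pv_style_unlock() are made-up names.

        /* Userspace analogy of a pv sleeping lock; build with: cc -O2 -pthread pv_futex.c */
        #include <linux/futex.h>
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        /* Lock word: 0 = free, 1 = held, 2 = held with sleeping waiters. */
        static atomic_int lock_word;
        static long counter;

        static void futex_wait(atomic_int *addr, int expected)
        {
                /* Sleep until kicked, but only if *addr still holds 'expected'
                 * (analogue of the "wait on this lock" hypercall). */
                syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
        }

        static void futex_wake_one(atomic_int *addr)
        {
                /* Wake one sleeping waiter (analogue of the "kick that vcpu" hypercall). */
                syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

        static void pv_style_lock(void)
        {
                int c = 0;

                /* Fast path: uncontended acquire, no sleeping, no "hypercall". */
                if (atomic_compare_exchange_strong(&lock_word, &c, 1))
                        return;

                /* Slow path: mark the lock contended (register interest), then
                 * sleep until the holder releases it and kicks us. */
                if (c != 2)
                        c = atomic_exchange(&lock_word, 2);
                while (c != 0) {
                        futex_wait(&lock_word, 2);
                        c = atomic_exchange(&lock_word, 2);
                }
        }

        static void pv_style_unlock(void)
        {
                /* Release; if someone went to sleep on this lock, kick one waiter
                 * instead of leaving it to burn its timeslice spinning. */
                if (atomic_exchange(&lock_word, 0) == 2)
                        futex_wake_one(&lock_word);
        }

        static void *worker(void *arg)
        {
                (void)arg;
                for (int i = 0; i < 100000; i++) {
                        pv_style_lock();
                        counter++;      /* critical section */
                        pv_style_unlock();
                }
                return NULL;
        }

        int main(void)
        {
                pthread_t t[4];

                for (int i = 0; i < 4; i++)
                        pthread_create(&t[i], NULL, worker, NULL);
                for (int i = 0; i < 4; i++)
                        pthread_join(t[i], NULL);
                printf("counter = %ld (expected 400000)\n", counter);
                return 0;
        }

The property that matters is the one the quoted text keeps coming back to: the
waiter sleeps and is woken exactly when the lock is released, so neither a PLE
heuristic nor a directed-yield guessing game is needed. An unmodified guest has no
equivalent of the wake-up side, which is why Peter says there is not much sane to
be done there.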