On Wed, Dec 01, 2010 at 02:56:44PM +0200, Avi Kivity wrote:
> >> (a directed yield implementation would find that all vcpus are
> >> runnable, yielding optimal results under this test case).
> >
> >I would think a plain yield() (rather than usleep/directed yield) would suffice
> >here (yield would realize that there is nobody else to yield to and continue
> >running the same vcpu thread).
>
> Currently yield() is a no-op on Linux.

Hmm, only when there is a single task in the runqueue; otherwise yield() lets
the remaining threads run before the yielding task gets to run again.

> >As regards to any concern of leaking cpu
> >bandwidth because of a plain yield, I think it can be fixed by a
> >simpler modification to yield that allows a thread to reclaim whatever timeslice
> >it gave up previously [1].
>
> If some other thread used that timeslice, don't we have an
> accounting problem?

Not if yield() remembers how much timeslice was given up and adds it back when
the thread is finally ready to run. The figure below illustrates the idea:

    A0/4    C0/4 D0/4 A0/4   C0/4 D0/4 A0/4   C0/4 D0/4 A0/4
p0 |----|-L|----|----|----|L|----|----|----|L|----|----|----|--------------|
          \                \                \                      \
        B0/2[2]          B0/0[6]          B0/0[10]              B0/14[0]

where,
	p0	-> physical cpu0
	L	-> denotes period of lock contention
	A0/4	-> means vcpu A0 (of guest A) ran for 4 ms
	B0/2[6]	-> means vcpu B0 (of guest B) ran for 2 ms (and has given up
		   6 ms worth of its timeslice so far)

In reality, we should not see much "given up" timeslice for a vcpu.

> >Regarding directed yield, do we have any reliable mechanism to find target of
> >directed yield in this (unmodified/non-paravirtualized guest) case? IOW how do
> >we determine the vcpu thread to which cycles need to be yielded upon contention?
>
> My idea was to yield to a random starved vcpu of the same guest.
> There are several cases to consider:
>
> - we hit the right vcpu; lock is released, party.
> - we hit some vcpu that is doing unrelated work.
>   yielding thread doesn't make progress, but we're not wasting cpu time.
> - we hit another waiter for the same lock.  it will also PLE exit
>   and trigger a directed yield.  this increases the cost of directed
>   yield by a factor of count_of_runnable_but_not_running_vcpus, which
>   could be large, but not disastrously so (i.e. don't run a 64-vcpu
>   guest on a uniprocessor host with this)
>
> >> So if you were to test something similar running with a 20% vcpu
> >> cap, I'm sure you'd run into similar issues.  It may show with fewer
> >> vcpus (I've only tested 64).
> >>
> >> >Are you assuming the existence of a directed yield and the
> >> >specific concern is what happens when a directed yield happens
> >> >after a PLE and the target of the yield has been capped?
> >>
> >> Yes.  My concern is that we will see the same kind of problems
> >> directed yield was designed to fix, but without allowing directed
> >> yield to fix them.  Directed yield was designed to fix lock holder
> >> preemption under contention,
> >
> >For modified guests, something like [2] seems to be the best approach to fix
> >lock-holder preemption (LHP) problem, which does not require any sort of
> >directed yield support. Essentially upon contention, a vcpu registers its lock
> >of interest and goes to sleep (via hypercall) waiting for lock-owner to wake it
> >up (again via another hypercall).
>
> Right.

We don't have these hypercalls for KVM atm; I am working on them now.

> >For unmodified guests, IMHO a plain yield (or slightly enhanced yield [1])
> >should fix the LHP problem.
>
> A plain yield (ignoring no-opiness on Linux) will penalize the
> running guest wrt other guests.  We need to maintain fairness.

Agreed on the need to maintain fairness.

> >Fyi, Xen folks also seem to be avoiding a directed yield for some of the same
> >reasons [3].
>
> I think that fails for unmodified guests, where you don't know when
> the lock is released and so you don't have a wake_up notification.
> You lost a large timeslice and you can't gain it back, whereas with
> pv the wakeup means you only lose as much time as the lock was held.
>
> >Given this line of thinking, hard-limiting guests (either in user-space or
> >kernel-space, latter being what I prefer) should not have adverse interactions
> >with LHP-related solutions.
>
> If you hard-limit a vcpu that holds a lock, any waiting vcpus are
> also halted.

This can also happen in the normal case, when lock holders are preempted. So it
is not a new problem that hard limits introduce!

> With directed yield you can let the lock holder make
> some progress at the expense of another vcpu.  A regular yield()
> will simply stall the waiter.

Agreed. Do you see any problems with the slightly enhanced version of yield()
described above (rather than directed yield)? It has some advantages over
directed yield: it preserves not only fairness between VMs but also fairness
between the VCPUs of a VM. It also avoids the guessing game mentioned above and
the bad interactions with hard limits.

CCing other scheduler experts for their opinion on the proposed yield()
extensions.

- vatsa
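
PS: for anyone who prefers code to ASCII art, here is a rough user-space model
of the bookkeeping behind the enhanced yield() described above. It is only a
sketch and not kernel code: the names Vcpu, run_turn, replay and SLICE_MS are
made up for illustration, the 4 ms slice is taken from the figure, and a real
implementation would have to live in the scheduler's own accounting.

```python
from dataclasses import dataclass

SLICE_MS = 4  # per-turn timeslice, as assumed in the figure (hypothetical value)


@dataclass
class Vcpu:
    name: str
    given_up_ms: int = 0  # timeslice surrendered via yield() and not yet reclaimed


def run_turn(vcpu, ran_before_yield_ms=None):
    """Run one scheduling turn and return how many milliseconds the vcpu ran.

    ran_before_yield_ms is None when the vcpu does not yield in this turn;
    otherwise it is how long the vcpu ran before it hit lock contention
    and called yield().
    """
    if ran_before_yield_ms is not None:
        # Contended: remember the unused remainder of this turn's slice.
        vcpu.given_up_ms += SLICE_MS - ran_before_yield_ms
        return ran_before_yield_ms
    # Not contended: run the normal slice plus everything given up earlier.
    ran = SLICE_MS + vcpu.given_up_ms
    vcpu.given_up_ms = 0
    return ran


def replay(vcpu, turns):
    """Replay a list of turns; return (ran_ms, given_up_ms_so_far) per turn."""
    trace = []
    for ran_before_yield_ms in turns:
        ran = run_turn(vcpu, ran_before_yield_ms)
        trace.append((ran, vcpu.given_up_ms))
    return trace


if __name__ == "__main__":
    # B0 from the figure: runs 2 ms then yields, yields immediately twice,
    # and finally runs uncontended.
    for ran, given_up in replay(Vcpu("B0"), [2, 0, 0, None]):
        print(f"B0/{ran}[{given_up}]")
```

Over the four turns B0 runs 2 + 0 + 0 + 14 = 16 ms, the same 4 x 4 ms that A0
gets, which is the fairness property I am after.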