On Sun, 2012-09-16 at 11:55 +0300, Avi Kivity wrote:
> On 09/14/2012 12:30 AM, Andrew Theurer wrote:
>
> > The concern I have is that even though we have gone through changes to
> > help reduce the candidate vcpus we yield to, we still have a very poor
> > idea of which vcpu really needs to run.  The result is high cpu usage
> > in get_pid_task and still some contention on the double runqueue lock.
> > To make this scalable, we either need to significantly reduce the
> > occurrence of lock-holder preemption, or do a much better job of
> > knowing which vcpu needs to run (and not unnecessarily yielding to
> > vcpus which do not need to run).
> >
> > On reducing the occurrence:  The worst case for lock-holder preemption
> > is having vcpus of the same VM on the same runqueue.  This guarantees
> > the situation of one vcpu running while another [of the same VM] is
> > not.  To prove the point, I ran the same test, but with vcpus
> > restricted to a range of host cpus, such that any single VM's vcpus
> > can never be on the same runqueue.  In this case, all 10 VMs' vcpu-0's
> > are on host cpus 0-4, vcpu-1's are on host cpus 5-9, and so on.  Here
> > is the result:
> >
> >  kvm_cpu_spin, and all
> >  yield_to changes, plus
> >  restricted vcpu placement:  8823 +/- 3.20%     much, much better
> >
> > On picking a better vcpu to yield to:  I really hesitate to rely on a
> > paravirt hint [telling us which vcpu is holding a lock], but I am not
> > sure how else to reduce the candidate vcpus to yield to.  I suspect we
> > are yielding to way more vcpus than are preempted lock-holders, and
> > that IMO is just work accomplishing nothing.  Trying to think of a way
> > to further reduce the candidate vcpus....
>
> I wouldn't say that yielding to the "wrong" vcpu accomplishes nothing.
> That other vcpu gets work done (unless it is in a pause loop itself) and
> the yielding vcpu gets put to sleep for a while, so it doesn't spend
> cycles spinning.  While we haven't fixed the problem, at least the guest
> is accomplishing work, and meanwhile the real lock holder may get
> naturally scheduled and clear the lock.

OK, yes, if the other thread gets useful work done, then it is not
wasteful.  I was thinking of the worst-case scenario, where any other
vcpu would likely spin as well, and the host-side cpu time spent
switching vcpu threads was not all that productive.  Well, I suppose it
does help eliminate potential lock-holding vcpus; it just does not seem
efficient or fast enough.

> The main problem with this theory is that the experiments don't seem to
> bear it out.

Granted, my test case is quite brutal: it is nothing but over-committed
VMs which always have some spin-lock activity.  However, we really
should try to fix this worst-case scenario.
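
(For what it's worth, the "restricted vcpu placement" run quoted above
boils down to a per-vcpu-thread affinity rule roughly like the sketch
below.  This is only an illustration of the placement policy: the 5-cpu
stride is just the geometry from the numbers above, and it is not how
the test itself was driven.)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch only: restrict the calling (vcpu) thread so that vcpu-N of
 * every VM lands in the same disjoint range of host cpus (vcpu-0 ->
 * cpus 0-4, vcpu-1 -> cpus 5-9, ...).
 */
static int pin_vcpu_thread(int vcpu_id, int stride)
{
        cpu_set_t set;
        int cpu, base = vcpu_id * stride;

        CPU_ZERO(&set);
        for (cpu = base; cpu < base + stride; cpu++)
                CPU_SET(cpu, &set);

        return sched_setaffinity(0, sizeof(set), &set); /* 0 == this thread */
}

int main(int argc, char **argv)
{
        int vcpu_id = argc > 1 ? atoi(argv[1]) : 0;

        if (pin_vcpu_thread(vcpu_id, 5)) {
                perror("sched_setaffinity");
                return 1;
        }
        printf("vcpu %d restricted to host cpus %d-%d\n",
               vcpu_id, vcpu_id * 5, vcpu_id * 5 + 4);
        return 0;
}

With that rule, two vcpus of the same VM always sit on disjoint cpu
ranges, so they can never share a runqueue.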

> So maybe one of the assumptions is wrong - the yielding
> vcpu gets scheduled early.  That could be the case if the two vcpus are
> on different runqueues - you could be changing the relative priority of
> vcpus on the target runqueue, but still remain on top yourself.  Is this
> possible with the current code?
>
> Maybe we should prefer vcpus on the same runqueue as yield_to targets,
> and only fall back to remote vcpus when we see it didn't help.
>
> Let's examine a few cases:
>
> 1. spinner on cpu 0, lock holder on cpu 0
>
> win!
>
> 2. spinner on cpu 0, random vcpu(s) (or normal processes) on cpu 0
>
> Spinner gets put to sleep, random vcpus get to work, low lock contention
> (no double_rq_lock), by the time spinner gets scheduled we might have won
>
> 3. spinner on cpu 0, another spinner on cpu 0
>
> Worst case, we'll just spin some more.  Need to detect this case and
> migrate something in.

Well, we can certainly experiment and see what we get.  IMO, the key to
getting this working really well on the large VMs is finding the
lock-holding cpu -quickly-.  What I think is happening is that we go
through a relatively long process to get to that one right vcpu.  I
guess I need to find a faster way to get there (a rough sketch of one
possibility is at the end of this mail).

> 4. spinner on cpu 0, alone
>
> Similar
>
> It seems we need to tie in to the load balancer.
>
> Would changing the priority of the task while it is spinning help the
> load balancer?

Not sure.

-Andrew
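
P.S.  To make the "paravirt hint" / "faster way to get there" idea a
bit more concrete, below is the rough shape of what I have in mind.
This is purely a sketch: none of these structures or calls exist today,
the names are made up, and the guest-supplied value would of course
have to be treated as advisory and range-checked, which is part of why
I hesitate to rely on it.

#include <stdio.h>

/*
 * One entry per vcpu, living in memory shared between guest and host
 * (say, a registered hint page); -1 means "not spinning on anyone".
 */
struct pv_spin_hint {
        int spinning_on;        /* vcpu id of the suspected lock holder */
};

/*
 * Guest side: the contended-lock slow path publishes which vcpu it is
 * waiting on (e.g. taken from the lock's owner field) before it starts
 * spinning, and clears the hint once it acquires the lock.
 */
static void guest_note_spinning(struct pv_spin_hint *self, int holder_vcpu)
{
        self->spinning_on = holder_vcpu;
}

static void guest_clear_spinning(struct pv_spin_hint *self)
{
        self->spinning_on = -1;
}

/*
 * Host side: instead of walking every vcpu of the VM and trying
 * yield_to() on each plausible candidate, the PLE handler could do one
 * lookup and only fall back to the existing scan when the hint is
 * unusable.
 */
static int pick_yield_target(struct pv_spin_hint *hints, int spinner,
                             int nr_vcpus)
{
        int target = hints[spinner].spinning_on;

        if (target >= 0 && target < nr_vcpus && target != spinner)
                return target;  /* directed yield straight to this vcpu */
        return -1;              /* no usable hint: fall back to the scan */
}

int main(void)
{
        struct pv_spin_hint hints[4] = { { -1 }, { -1 }, { -1 }, { -1 } };

        guest_note_spinning(&hints[3], 1);      /* vcpu 3 spins on vcpu 1 */
        printf("yield target for spinning vcpu 3: %d\n",
               pick_yield_target(hints, 3, 4));
        guest_clear_spinning(&hints[3]);
        return 0;
}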