Re: Role of qemu_fair_mutex

On 01/04/2011 05:43 PM, Anthony Liguori wrote:
The fact that the iothread drops the global lock during sleep is a detail that shouldn't affect correctness. The IO thread is absolutely allowed to run for arbitrary periods of time without dropping the qemu mutex.

No, it's not, since it will stop vcpus in their tracks. Whenever we hold qemu_mutex for unbounded time, that's a bug.


I'm not sure that designing the io thread to hold the lock for a "bounded" amount of time is a good design point. What is an accepted amount of time for it to hold the lock?

Ultimately, zero. It's ridiculous to talk about 64-vcpu guests or multiqueue virtio on one hand, and have everything serialize on a global lock on the other hand.

A reasonable amount of time would be (heavyweight_vmexit_time / nr_vcpu), which would ensure that the lock never dominates performance. I don't think it's achievable, though; the time to bounce the lock's cache line probably exceeds this.
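To put a rough number on the (heavyweight_vmexit_time / nr_vcpu) bound, here is a minimal sketch in C. The helper name and the 5000 ns exit cost are assumptions chosen only for illustration; they are not measurements from this thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: the hold-time budget suggested above, in nanoseconds.
 * heavyweight_vmexit_ns is an assumed cost of one heavyweight exit.
 */
static unsigned long vcpu_lock_budget_ns(unsigned long heavyweight_vmexit_ns,
                                         unsigned int nr_vcpu)
{
    assert(nr_vcpu > 0);                    /* avoid dividing by zero */
    return heavyweight_vmexit_ns / nr_vcpu; /* integer division, rounds down */
}
```

With an assumed 5000 ns exit and 64 vcpus, the budget comes out at 78 ns, which is in the range of a contended cache-line transfer and so supports the doubt that the bound is achievable.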

I'd be happy with "a few microseconds" for now.

Instead of the VCPU relying on the IO thread to eventually drop the lock, it seems far superior to have the VCPU thread indicate to the IO thread that it needs the lock.

I don't see why. First, the iothread is not the lock hog, tcg is. Second, you can't usually break out of iothread tasks (unlike tcg).

As of right now, the IO thread can indicate to the VCPU thread that it needs the lock, so having a symmetric interface seems obvious. Of course, you need to pick one to have more priority in case both indicate they need to use the lock at exactly the same time.

io and tcg are not symmetric. If you let io have the higher priority, all io will complete and the iothread will go back to sleep. If you let tcg have the higher priority, the guest will spin.

qemu-kvm works fine without any prioritization, since there are no lock hogs.


I think the only place is live migration and perhaps tcg?

qcow2 and anything else that puts the IO thread to sleep.

... while holding the lock. All those are bugs; we should never ever sleep while holding the lock, since that converts an HPET read from something that is CPU-bound into something that is I/O-bound.
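The rule "never sleep while holding the lock" can be sketched as follows. This is a minimal illustration, not QEMU code: big_lock is a stand-in for the global mutex, and wait_for_io_unlocked is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>

/* Stand-in for the global mutex discussed in this thread (assumption). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper. The caller holds big_lock on entry and holds it
 * again on return, but the lock is released for the whole blocking wait,
 * so other threads never queue behind a sleeping owner.
 */
static int wait_for_io_unlocked(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int ret;

    pthread_mutex_unlock(&big_lock);    /* drop the lock before sleeping */
    ret = poll(fds, nfds, timeout_ms);  /* the only place we may block */
    pthread_mutex_lock(&big_lock);      /* retake it before touching state */
    return ret;
}
```

Any state protected by the lock may have changed by the time the helper returns, so the caller has to revalidate what it read before the wait.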



I think the abstraction we need here is a priority lock, with higher priority given to the iothread. A lock() operation that takes precedence would atomically signal the current owner to drop the lock.
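One possible shape for such a priority lock, as a rough sketch built on POSIX threads. All names here (PrioLock, prio_lock_high, and so on) are hypothetical. The "signal the current owner" step is reduced to a counter the owner can poll; how a running vcpu would actually be interrupted is outside the scope of the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-level priority lock; not an existing QEMU type. */
typedef struct PrioLock {
    pthread_mutex_t guard;      /* protects the fields below */
    pthread_cond_t  changed;    /* signalled whenever the lock is released */
    bool            held;       /* true while some thread owns the lock */
    unsigned int    hi_waiters; /* high-priority threads waiting to acquire */
} PrioLock;

static void prio_lock_init(PrioLock *l)
{
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->changed, NULL);
    l->held = false;
    l->hi_waiters = 0;
}

/* Low-priority acquire: also waits while any high-priority thread is queued. */
static void prio_lock_low(PrioLock *l)
{
    pthread_mutex_lock(&l->guard);
    while (l->held || l->hi_waiters > 0) {
        pthread_cond_wait(&l->changed, &l->guard);
    }
    l->held = true;
    pthread_mutex_unlock(&l->guard);
}

/* High-priority acquire: registers as a waiter in the same critical
 * section in which it starts waiting, so the owner sees the request. */
static void prio_lock_high(PrioLock *l)
{
    pthread_mutex_lock(&l->guard);
    l->hi_waiters++;
    while (l->held) {
        pthread_cond_wait(&l->changed, &l->guard);
    }
    l->hi_waiters--;
    l->held = true;
    pthread_mutex_unlock(&l->guard);
}

/* Polled by the current owner: true means "please drop the lock soon". */
static bool prio_lock_contended(PrioLock *l)
{
    bool wanted;

    pthread_mutex_lock(&l->guard);
    wanted = l->hi_waiters > 0;
    pthread_mutex_unlock(&l->guard);
    return wanted;
}

static void prio_unlock(PrioLock *l)
{
    pthread_mutex_lock(&l->guard);
    assert(l->held);
    l->held = false;
    pthread_cond_broadcast(&l->changed); /* wake both classes of waiters */
    pthread_mutex_unlock(&l->guard);
}
```

In this sketch a low-priority owner would check prio_lock_contended() at its safe points and call prio_unlock() when it returns true; low-priority waiters cannot overtake a queued high-priority thread because they also wait on hi_waiters.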

The I/O thread can reliably acquire the lock whenever it needs to.

If you drop all of the qemu_fair_mutex stuff and leave the qemu_mutex getting dropped around select, TCG will generally work reliably. But this is not race free.

What would be the impact of a race here?

Racy is probably the wrong word. To give a concrete example of why one is better than the other, consider live migration.

It would be reasonable to have a check in live migration to iterate unless there was higher priority work. If a VCPU thread needs to acquire the mutex, that could be considered higher priority work. If you don't have an explicit hand off, it's not possible to implement such logic.
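The check described here could look roughly like the sketch below. It assumes an explicit request counter that vcpu threads bump when they want the mutex; the counter, the function names, and the send_page callback are all hypothetical and only illustrate the hand-off idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical count of vcpu threads currently asking for the mutex. */
static atomic_uint lock_requests;

static void vcpu_request_lock(void) { atomic_fetch_add(&lock_requests, 1); }
static void vcpu_lock_granted(void) { atomic_fetch_sub(&lock_requests, 1); }

/*
 * Hypothetical migration step: send up to nr_dirty pages, but stop as soon
 * as a vcpu has asked for the lock. Returns the number of pages sent so the
 * caller can resume from there on the next iteration.
 */
static size_t migrate_dirty_pages(size_t nr_dirty, void (*send_page)(size_t))
{
    size_t sent = 0;

    while (sent < nr_dirty) {
        if (atomic_load(&lock_requests) > 0) {
            break;              /* higher priority work is waiting */
        }
        if (send_page != NULL) {
            send_page(sent);    /* copy one page to the destination */
        }
        sent++;
    }
    return sent;
}
```

Without some explicit request signal of this kind, the loop has nothing to test, which is the point being made about hand-off.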

Live migration should not hold the global lock while copying memory. Failing that, a priority lock would work (in order of increasing priority: tcg -> live migration -> kvm-vcpu -> iothread), but I don't think it's a good direction to pursue. The Linux mantra is: if you have lock contention, don't improve the lock, improve the locking to remove the contention, until you no longer understand the code. It's a lot harder, but playing with priorities is a dead end IMO.


Just dropping a lock does not result in reliable hand off.

Why do we want a handoff in the first place?

I don't think we do. I think we want the iothread to run in preference to tcg, since tcg is a lock hog under guest control, while the iothread is not a lock hog (excepting live migration).

The io thread is a lock hog practically speaking.

It's not. Give it the highest priority and it will drop the lock and sleep. Give tcg the highest priority and it will hold the lock and spin.

--
error compiling committee.c: too many arguments to function
