On Tuesday 15 July 2008 07:20:26 Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > Rusty Russell wrote:
> > > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> > >> On Monday, 14 July 2008, Hidetoshi Seto wrote:
> > >>> +    /* Wait all others come to life */
> > >>> +    while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > >>> +        if (time_is_before_jiffies(limit))
> > >>> +            goto timeout;
> > >>> +        cpu_relax();
> > >>> +    }
> > >>> +
> > >>
> > >> Hmm. I think this could become interesting on virtual machines. The
> > >> hypervisor might be too busy to schedule a specific cpu at certain load
> > >> scenarios. This would cause a failure even if the cpu is not really
> > >> locked up. We had similar problems with the soft lockup daemon on
> > >> s390.
> > >
> > > 5 seconds is a fairly long time. If all else fails we could have a
> > > config option to simply disable this code.
>
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a
> MAX_RT_PRIO-1 prioritized task for 5 seconds?

Yes. That's exactly what we're trying to detect. Currently the entire
machine will wedge. With this patch we can often limp along.

Hidetoshi's original problem was a client whose machine had one CPU die,
then got wedged as the emergency backup tried to load a module.

Along these lines, I found VMware's relaxed co-scheduling interesting, BTW:
http://communities.vmware.com/docs/DOC-4960

> cpu_relax() translates to a hypervisor yield on s390. Probably makes sense
> if other architectures would do the same.

Yes, I think so too. Actually, doing a random yield-to-other-VCPU on
cpu_relax is arguably the right semantic (in Linux it's used for spinning,
almost exclusively to wait for other cpus).

Cheers,
Rusty.
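
For anyone reading the thread without the kernel tree at hand, the bounded spin-wait in the quoted hunk can be approximated in userspace C. This is only a sketch under stated assumptions, not the patch's code: `wait_for_others`, `relax` and `now_ms` are hypothetical names, an atomic counter stands in for the `prepared_cpus` mask, a monotonic clock stands in for jiffies, and `sched_yield()` stands in for `cpu_relax()` (loosely mirroring the "yield instead of burning cycles" behaviour described for s390).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for cpu_relax(): let the scheduler run another
 * thread rather than spinning through the whole time slice. */
static inline void relax(void)
{
    sched_yield();
}

/* Monotonic time in milliseconds (stand-in for jiffies). */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Spin until *checked_in equals `expected` or `timeout_ms` has elapsed.
 * Returns true if everyone checked in, false on timeout.
 */
static bool wait_for_others(atomic_int *checked_in, int expected,
                            long long timeout_ms)
{
    long long limit;

    assert(expected >= 0);
    limit = now_ms() + timeout_ms;

    while (atomic_load(checked_in) != expected) {
        if (now_ms() > limit)   /* same role as time_is_before_jiffies(limit) */
            return false;
        relax();
    }
    return true;
}
```

A caller would have each worker thread increment the counter when it is ready, and treat a `false` return as the "timeout" path. As in the thread, a waiter that yields can still time out spuriously if the host is too busy to run the other threads within the limit.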
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/virtualization