On Tue, Sep 15, 2015 at 02:05:14PM +0200, Christian Borntraeger wrote:
> Tejun,
>
> commit d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14 (sched, cgroup: replace
> signal_struct->group_rwsem with a global percpu_rwsem) causes some noticeable
> hiccups when starting several kvm guests (which libvirt will move into cgroups
> - each vcpu thread and each i/o thread)
> When you now start lots of guests in parallel on a bigger system (32 CPUs with
> 2-way SMT in my case) the system is so busy that systemd runs into several timeouts
> like "Did not receive a reply. Possible causes include: the remote application did
> not send a reply, the message bus security policy blocked the reply, the reply
> timeout expired, or the network connection was broken."
>
> The problem seems to be that the newly used percpu_rwsem does a
> rcu_synchronize_sched_expedited for all write downs/ups.

Can you try:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2015.09.11ab

That branch includes Oleg's rework of the percpu rwsem, which should
hopefully improve things somewhat.

But yes, pounding a global lock on a big machine will always suck.