RE: [RFC] create a single workqueue for each vm to update vm irq routing table

"Zhanghaoyu (A)" <haoyu.zhang@xxxxxxxxxx> · Sat, 30 Nov 2013 02:46:31 +0000

>On Tue, Nov 26, 2013 at 06:14:27PM +0200, Gleb Natapov wrote:
>> On Tue, Nov 26, 2013 at 06:05:37PM +0200, Michael S. Tsirkin wrote:
>> > On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:
>> > > On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
>> > > > Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto:
>> > > > > When the guest sets irq smp_affinity, a VMEXIT occurs and the
>> > > > > vcpu thread returns from the ioctl to QEMU. The vcpu thread then
>> > > > > asks the hypervisor to update the irq routing table. In
>> > > > > kvm_set_irq_routing, synchronize_rcu is called, so the vcpu
>> > > > > thread is blocked for a long time waiting for the RCU grace
>> > > > > period. During this period the vcpu cannot service the VM, so
>> > > > > interrupts delivered to this vcpu are not handled in time, and
>> > > > > the apps running on this vcpu are not serviced either.
>> > > > > This is unacceptable in some real-time scenarios, e.g. telecom.
>> > > > > 
>> > > > > So I want to create a single workqueue for each VM to perform
>> > > > > the RCU synchronization for the irq routing table
>> > > > > asynchronously, and let the vcpu thread return and VMENTRY to
>> > > > > service the VM immediately, so it no longer has to block
>> > > > > waiting for the RCU grace period.
>> > > > > I have implemented a raw patch and tested it in our telecom
>> > > > > environment; the above problem disappeared.
>> > > > 
>> > > > I don't think a workqueue is even needed.  You just need to use 
>> > > > call_rcu to free "old" after releasing kvm->irq_lock.
>> > > > 
>> > > > What do you think?
>> > > > 
>> > > It should be rate limited somehow. Since it is guest triggerable,
>> > > a guest may cause the host to allocate a lot of memory this way.
>> > 
>> > The checks in __call_rcu() should handle this, I think.  These keep 
>> > a per-CPU counter, which can be adjusted via rcutree.blimit, which 
>> > defaults to taking evasive action if more than 10K callbacks are 
>> > waiting on a given CPU.
>> > 
>> > 
>> Documentation/RCU/checklist.txt has:
>> 
>>         An especially important property of the synchronize_rcu()
>>         primitive is that it automatically self-limits: if grace periods
>>         are delayed for whatever reason, then the synchronize_rcu()
>>         primitive will correspondingly delay updates.  In contrast,
>>         code using call_rcu() should explicitly limit update rate in
>>         cases where grace periods are delayed, as failing to do so can
>>         result in excessive realtime latencies or even OOM conditions.
>
>I just asked Paul what this means.

My understanding is as follows.
The synchronous grace-period API synchronize_rcu() blocks the calling
thread until the grace period ends, so that thread cannot generate a
large number of further RCU updates in the meantime. This is the
"self-limits" property described above in
Documentation/RCU/checklist.txt, and it avoids memory exhaustion. The
asynchronous API call_rcu() returns immediately and cannot limit the
update rate by itself, so the caller needs to rate limit explicitly.
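
To make the difference concrete, here is a small userspace toy model
(not kernel code, and not the patch). All names in it (sim_sync,
sim_async, the tick-based clock, the "limit" parameter) are made up for
illustration. It assumes an old routing table can be freed exactly
"grace" ticks after it was replaced, and that issuing one update costs
one tick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: time advances in ticks, and an old table becomes
 * freeable `grace` ticks after the update that replaced it. */
struct sim_result {
    size_t peak_pending; /* most old tables awaiting free at once */
    size_t ticks;        /* ticks spent issuing all updates */
};

/* synchronize_rcu()-like updater: blocks for a full grace period per
 * update, so at most one old table is ever waiting to be freed. */
struct sim_result sim_sync(size_t updates, size_t grace)
{
    struct sim_result r = { updates ? 1 : 0, updates * grace };
    return r;
}

/* call_rcu()-like updater: queues the free and continues at once.
 * limit == 0 means "no rate limit"; otherwise the updater waits for
 * the oldest queued free whenever `limit` frees are already pending. */
struct sim_result sim_async(size_t updates, size_t grace, size_t limit)
{
    struct sim_result r = { 0, 0 };
    size_t now = 0, issued = 0, freed = 0;
    size_t *expiry;

    assert(grace >= 1);
    if (updates == 0)
        return r;
    expiry = malloc(updates * sizeof(*expiry));
    if (!expiry)
        return r;

    while (issued < updates) {
        /* Retire queued frees whose grace period has elapsed. */
        while (freed < issued && expiry[freed] <= now)
            freed++;

        if (limit != 0 && issued - freed >= limit) {
            /* Explicit rate limit: wait for the oldest pending free. */
            now = expiry[freed];
            continue;
        }

        expiry[issued++] = now + grace;
        if (issued - freed > r.peak_pending)
            r.peak_pending = issued - freed;
        now++; /* issuing an update costs one tick */
    }

    free(expiry);
    r.ticks = now;
    return r;
}
```

In this model, sim_sync(1000, 100) never has more than one old table
pending but takes 100000 ticks, sim_async(1000, 100, 0) finishes in
1000 ticks with up to 100 old tables pending (and the backlog grows
with the grace period), and sim_async(1000, 100, 8) caps the backlog at
8 at the cost of sometimes waiting. That last case is the kind of
explicit rate limit the checklist asks for when call_rcu() is used.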

Thanks,
Zhang Haoyu
>
>> --
>> 			Gleb.