Re: [PATCH v3 10/12] KVM: s390: add and wire function gib_alert_irq_handler()

Pierre Morel <pmorel@xxxxxxxxxxxxx> · Fri, 30 Nov 2018 09:58:05 +0100

On 29/11/2018 14:10, Michael Mueller wrote:


On 29.11.18 11:10, Pierre Morel wrote:
On 28/11/2018 18:51, Michael Mueller wrote:


On 28.11.18 17:24, Pierre Morel wrote:
On 28/11/2018 11:19, Michael Mueller wrote:
The patch implements a handler for GIB alert interruptions
on the host. Its task is to alert storage backed guests that
interrupts are pending for them.

A GIB alert interrupt statistic counter is added as well:

$ cat /proc/interrupts
           CPU0       CPU1
   ...
   GAL:       0          0   [I/O] GIB Alert
   ...

Signed-off-by: Michael Mueller <mimu@xxxxxxxxxxxxx>
---

...

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e00eae7ec0b8..283c51362ca8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3531,6 +3531,10 @@ static int vcpu_post_run(struct kvm_vcpu 
*vcpu, int exit_reason)
      vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;
      atomic_dec(&vcpu->kvm->arch.vcpus_in_sie);
+    if (vcpu->kvm->arch.gib_in_use &&
+        !in_alert_list(vcpu->kvm->arch.gisa) &&
+        !atomic_read(&vcpu->kvm->arch.vcpus_in_sie))
+        vcpu->kvm->arch.gisa->iam = vcpu->kvm->arch.iam;


Here, AFAIU, if the IAM is set, we must take care that not bit from IPM
for which IAM is set is also set otherwise we won't get an 
interruption.

My view is that we have a short time window between SIE exit and
the restoration of the IAM in the GISA, when an interruption might
be indicated by setting the respective ISC bit in the IPM. We will
not get an GAL interruption in the host for that. I think until
here we are on the same page, right?

In that situation the host needs to take action.

And it is taken, see the trace. These interruptions are delivered
outside of SIE because they are observed by:

kvm_s390_deliver_pending_interrupts()
   deliverable_irqs(vcpu)
      IPM != 0
        __deliver_io()

If I don't make mistake, this code is only executed during 
vcpu_pre_run().

What I understand is that the next time the vcpu is scheduled, for 
example due to another interrupt, the IPM interrupts are delivered 
through this channel.

In the case the vcpu does not get scheduled they are not delivered.

Idle vcpus will always get scheduled by the hrtimer function
kvm_s390_idle_wakeup() which calls kvm_s390_vcpu_wakeup() which then
go through kvm_s390_deliver_pending_interrupts() and deliver the
interruption.

That's what we see in the trace below. (Well that's an assumption but I 
know that only 5 interruptions were generated by the device)

The question is if the interruption can't be delivered more early
instead of waiting for one of the ckc_timers to expire.

Yes, what if the timer takes a very long time.
Think of NO_HZ_IDLE configuration






00 01543426770:725504 4 - 01 00000000e280ea16  inject: I/O (AI/gisa) 
isc 3
00 01543426770:727023 4 - 02 00000000676fd4ca 
00[0706c00180000000-00000000008e9428]: deliver: I/O (AI/gisa) isc 6
00 01543426770:727197 4 - 02 00000000676fd4ca 
00[0706c00180000000-00000000008e9428]: deliver: I/O (AI/gisa) isc 6
00 01543426770:727353 4 - 02 00000000676fd4ca 
00[0706c00180000000-00000000008e9428]: deliver: I/O (AI/gisa) isc 6
00 01543426770:727506 4 - 02 00000000676fd4ca 
00[0706c00180000000-00000000008e9428]: deliver: I/O (AI/gisa) isc 6
00 01543426770:731878 4 - 02 00000000676fd4ca 
00[0706c00180000000-00000000008e9428]: deliver: I/O (AI/gisa) isc 6
00 01543426770:731987 4 - 01 00000000e280ea16  inject: I/O (AI/gisa) 
isc 3






AFAIU you do not need to deliver IRQ which IPM is set in GISA during 
vcpu_pre_run(), they will be delivered to the guest on SIE entry.

I know that is the patch that I drew back, but for other reasons.

Yes but this is not what is done here.
Here you deliver the interrupt on the next schedule but you do not 
provoque the schedule, just wait for the next interrupt. If it comes.

contined here under....



I tested this Code example in:
Message-Id: <1541009577-29656-8-git-send-email-pmorel@xxxxxxxxxxxxx>


Back to vcpu_post_run(), I am not sure that this is the better place 
to reset the IAM.

What better place you have in mind?

...continued from above

yes, when the vCPU issues a WAIT (sorry not HALT), in post_run(), 
kvm_handle_sie_intercept()


Shouldn't we set the IAM only if the last active vcpu issues a halt?

Can you be little more specific with "vcpu issues a halt"? My view
is the none of the vcpus of a guest are in SIE context. Currently
I identify this situation with the vcpus_in_sie metric.

in __set_cpu_idle()
seems to me to excatly follow the documentation.

There would be two metrics:
- 1: are vCPU running
- 2: is one vCPU in wait state

You need to setup the IAM only in the case
(NO vCPU running && at least one CPU wait)




Is there other SIE exit reasons to activate the alert list?

I don't think there is. Have you read something?

No, I think there is no other reason to activate the alert list than 
when the condition above becomes true.
In all other SIE exit reason the vCPU will be rescheduled after the 
event has been processed.
STOP is special, but we surely do not want to send an IRQ to a stopped vCPU.



Regards,
Pierre





--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany