Re: [PATCH v2 2/2] s390/kvm: handle diagnose 318 instruction call

David Hildenbrand <david@xxxxxxxxxx> · Wed, 12 Dec 2018 11:49:56 +0100

@Christian see suggestion below

>>
>> So, no field for cpvc? Can you give me a short summary how these values
>> are to be used. E.g. who will use the cpvc? Who can actually read both
>> values. A guest VCPU? Firmware? (well firmware obviously has no access
>> to cpvc)
>>
> 
> Only the CPNC is architected to be stored in the SIE block... CPVC has to be held elsewhere.
> 
> The CPNC can be seen during a ring dump, which is helpful when a problem causes a CPU to
> enter a check-stop state and we can't get a CEC dump.
> 
> We can really only see the CPVC from the host's perspective, perhaps during a guest dump 
> or via VM_EVENT messages in the guest debug logs.

Okay, so I assume we'll e.g. want to include that information in guest
dumps e.g. in QEMU when dumping in ELF format? And/or allow other ways
to access this information.

Thanks for the information!

> 
>>> +	__u8	reserved6b;		/* 0x006b */
>>>  	__u32	todpr;			/* 0x006c */
>>>  #define GISA_FORMAT1 0x00000001
>>>  	__u32	gd;			/* 0x0070 */
>>> @@ -391,6 +392,7 @@ struct kvm_vcpu_stat {
>>>  	u64 diagnose_9c;
>>>  	u64 diagnose_258;
>>>  	u64 diagnose_308;
>>> +	u64 diagnose_318;
>>>  	u64 diagnose_500;
>>>  	u64 diagnose_other;
>>>  };
>>> @@ -804,6 +806,14 @@ struct kvm_s390_vsie {
>>>  	struct page *pages[KVM_MAX_VCPUS];
>>>  };
>>>  
>>> +union kvm_s390_diag318_info {
>>> +	u64 cpc;
>>> +	struct {
>>> +		u64 cpnc : 8;
>>> +		u64 cpvc : 56;
>>> +	};
>>> +};
>>> +
>>>  struct kvm_arch{
>>>  	void *sca;
>>>  	int use_esca;
>>> @@ -838,6 +848,7 @@ struct kvm_arch{
>>>  	/* subset of available cpu features enabled by user space */
>>>  	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
>>>  	struct kvm_s390_gisa *gisa;
>>> +	union kvm_s390_diag318_info diag318_info;
>>>  };
>>>  
>>>  #define KVM_HVA_ERR_BAD		(-1UL)
>>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>>> index 16511d9..6420aad 100644
>>> --- a/arch/s390/include/uapi/asm/kvm.h
>>> +++ b/arch/s390/include/uapi/asm/kvm.h
>>> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>>>  #define KVM_S390_VM_CRYPTO		2
>>>  #define KVM_S390_VM_CPU_MODEL		3
>>>  #define KVM_S390_VM_MIGRATION		4
>>> +#define KVM_S390_VM_MACHINE		5
>>>  
>>>  /* kvm attributes for mem_ctrl */
>>>  #define KVM_S390_VM_MEM_ENABLE_CMMA	0
>>> @@ -130,6 +131,7 @@ struct kvm_s390_vm_cpu_machine {
>>>  #define KVM_S390_VM_CPU_FEAT_PFMFI	11
>>>  #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
>>>  #define KVM_S390_VM_CPU_FEAT_KSS	13
>>> +#define KVM_S390_VM_CPU_FEAT_DIAG318	14
>>>  struct kvm_s390_vm_cpu_feat {
>>>  	__u64 feat[16];
>>>  };
>>> @@ -168,6 +170,9 @@ struct kvm_s390_vm_cpu_subfunc {
>>>  #define KVM_S390_VM_MIGRATION_START	1
>>>  #define KVM_S390_VM_MIGRATION_STATUS	2
>>>  
>>> +/* kvm attributes for KVM_S390_VM_MACHINE */
>>> +#define KVM_S390_VM_MACHINE_CPC	0
>>> +
>>>  /* for KVM_GET_REGS and KVM_SET_REGS */
>>>  struct kvm_regs {
>>>  	/* general purpose regs for s390 */
>>> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
>>> index 45634b3d..b61ffd2 100644
>>> --- a/arch/s390/kvm/diag.c
>>> +++ b/arch/s390/kvm/diag.c
>>> @@ -235,6 +235,16 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
>>>  	return ret < 0 ? ret : 0;
>>>  }
>>>  
>>> +static int __diag_set_control_prog_name(struct kvm_vcpu *vcpu)
>>> +{
>>> +	unsigned int reg = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
>>> +	unsigned long cpc = vcpu->run->s.regs.gprs[reg];
>>
>> 1. This should be a u64.
>>
>> 2. What if a !KVM_S390_VM_CPU_FEAT_DIAG318 ? Shouldn't there be an error
>> reported to the guest? (diag subcode not installed)
>>
> 
> Since we're emulating the diag318 feature entirely for a guest, and we're able
> to fence its availability via CPU models in QEMU (which is not obvious from the
> KVM code, admittedly), we can safely allow this feature if KVM can support it.

Yes, while we in KVM "can" allow it, QEMU can do whatever it wants. e.g.
disable KVM_S390_VM_CPU_FEAT_DIAG318. Which can easily happen (old QEMU
on new HW).

In case QEMU disables KVM_S390_VM_CPU_FEAT_DIAG318, we should keep old
behavior -> Send it to user space and let QEMU handle it! (-EOPNOTSUPP)

A guest will in this case not see sclp.has_diag318, ideally not execute
diag318, but if it is still done, diag318 should also properly fail as
it used to. But see below.

> 
> With both KVM and QEMU support, the guest kernel will be able to pass the 
> if (sclp.has_diag318) check in patch #1 and execute the diag instruction (with 
> KVM handling) safely.
> 
> The whole implementation of these values is a bit tricky, I admit. I find myself
> going back-and-forth to the docs to make sure I have this info correct.
> 
> Please let me know if there's a better way to convey this information to someone
> who might have to look back on this code. Perhaps some documentation on the new 
> VM attribute (suggested by Conny) would suffice.

I guess the comment above
"allow_cpu_feat(KVM_S390_VM_CPU_FEAT_DIAG318);" is sufficient. But here,
we should really respect what QEMU asked for. If qemu asks for
!KVM_S390_VM_CPU_FEAT_DIAG318, diag318 should go to user space just as
it used to before.

> 
>>> +
>>> +	vcpu->stat.diagnose_318++;
>>> +	kvm_s390_set_cpc(vcpu->kvm, cpc);
>>> +	return 0;
>>> +}
>>> +
>>>  int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>>>  {
>>>  	int code = kvm_s390_get_base_disp_rs(vcpu, NULL) & 0xffff;
>>> @@ -254,6 +264,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>>>  		return __diag_page_ref_service(vcpu);
>>>  	case 0x308:
>>>  		return __diag_ipl_functions(vcpu);
>>> +	case 0x318:
>>> +		return __diag_set_control_prog_name(vcpu);
>>>  	case 0x500:
>>>  		return __diag_virtio_hypercall(vcpu);
>>>  	default:
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index fe24150..67aa34a 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -157,6 +157,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>>>  	{ "instruction_diag_9c", VCPU_STAT(diagnose_9c) },
>>>  	{ "instruction_diag_258", VCPU_STAT(diagnose_258) },
>>>  	{ "instruction_diag_308", VCPU_STAT(diagnose_308) },
>>> +	{ "instruction_diag_318", VCPU_STAT(diagnose_318) },
>>>  	{ "instruction_diag_500", VCPU_STAT(diagnose_500) },
>>>  	{ "instruction_diag_other", VCPU_STAT(diagnose_other) },
>>>  	{ NULL }
>>> @@ -371,6 +372,10 @@ static void kvm_s390_cpu_feat_init(void)
>>>  
>>>  	if (MACHINE_HAS_ESOP)
>>>  		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
>>> +
>>> +	/* Enable DIAG318 guest support unconditionally */
>>> +	allow_cpu_feat(KVM_S390_VM_CPU_FEAT_DIAG318);
>>
>> Interesting. Is that the right thing to do?
>>
>> While somebody can always write (__diag_set_control_prog_name emualtion
>> handler), I don't see a read handler.
>>
>> How is that supposed to work e.g. on older hardware?
>>
>> (I was expecting HW to handle reading, that's why we add it to the SCB-
>> Can you elaborate what the value stored in the SCB is actually used for?
>> Is a guest CPU not actually able to read that value somehow?)
>>
> 
> A CPU (or VCPU) should have a copy of the CPNC stored in a local register. 
> I'm sorry I can't be super helpful in my explanation, as I am not 100%
> familiar with the ring dump process, but perhaps the CPU retrieves the
> value from the SCB during a ring dump? I'll have to phone-a-friend on this
> one.

That makes sense. So a guest cannot read the value, however the actual
CPU can read the value (e.g. when in check-stop) via the SCB. So one can
identify the workload triggering the crash (maybe).

So allowing this feature unconditionally will not actually allow HW to
use it, however we could get that information in QEMU, to e.g. include
it in a guest dump.

I am wondering if another way to handle this would maybe be even better.
It would allow also new QEMU on old KVM to get access to these values.
It will simply not be written to HW then (bad luck, there is no HW
support eventually)

It goes like this:

1. don't implement diag318 in the kernel, implement it in user space.
diag318 is already forwarded to QEMU as of now. (I don't consider
diag318 peformance relevant)

2. in user space, we can always support diag318, even without KVM/HW
support. (could glue to machines if really needed)

3. If we get a diag318 and the CPU feature is not enabled, handle it if
diag318 is not available (exception, just as we do now).

3. If we get a diag318 and the CPU feature is enabled, store both values
in QEMU (for migration and e.g. DUMPs) AND ...

4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
values (or even only the CPNC! ) to KVM.

5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
the migrated value to KVM.

KVM code is minimized. KVM_S390_VM_CPU_FEAT_DIAG318 in KVM is not
needed. We might even only need writing of KVM_S390_VM_MACHINE_CPC.

@Christian what do you think?

> 
> Please see my explanation to your questions above and let me know if that
> covers this question as well.
> 
> I think a more informative comment in the code would be helpful here.
> 
>>> +
>>>  	/*
>>>  	 * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
>>>  	 * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
>>> @@ -1173,6 +1178,75 @@ static int kvm_s390_get_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>>>  	return ret;
>>>  }
>>>  

[...]
>>> +++ b/arch/s390/kvm/vsie.c
>>> @@ -397,6 +397,9 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>>  	memcpy(scb_o->gcr, scb_s->gcr, 128);
>>>  	scb_o->pp = scb_s->pp;
>>>  
>>> +	/* machine information */
>>> +	scb_o->cpnc = scb_s->cpnc;
>>
>> Is unshadow needed? I think we have diag318 for that.
>>
> 
> Don't we need to make sure the CPNC in the SCB for a level 3+ guest is set correctly? I don't
> know if we support anything other than "linux on linux on linux" so-to-speak, but if we do
> then we need to make sure that when diag318 is called we correctly convey that value to the
> correct SCB.
> 
> Correct me if I have my understanding of SIE / vSIE handling wrong here.

If HW(SIE) is not able to modify the CPNC, then we don't need an
unshadow path.

Assume our nested guests (g3) wants to set the CPNC.

1. g3 triggers diag318.
2. g3 intercepts into g1, leaving the vSIE.
3. g1 runs g2 to handle the SIE exit.
4. g2 processes the diag318, changing the value in the g3 SCB.
5. g2 wants to continue running g3 by executing the SIE.
6. g1 creates a shadow g3 SCB, using the new value from the g3 SCB.
7. The new CBNC is active when g3 is executed.

So I don't think the unshadow code is necessary, only the shadow code.

-- 

Thanks,

David / dhildenb