On 26.02.20 10:12, Christian Borntraeger wrote:
>
>
> On 26.02.20 09:28, David Hildenbrand wrote:
>> On 26.02.20 09:12, Christian Borntraeger wrote:
>>>
>>>
>>> On 25.02.20 23:37, David Hildenbrand wrote:
>>>>
>>>>> +static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>>>>> +{
>>>>> +        unsigned long base = uv_info.guest_base_stor_len;
>>>>> +        unsigned long virt = uv_info.guest_virt_var_stor_len;
>>>>> +        unsigned long npages = 0, vlen = 0;
>>>>> +        struct kvm_memory_slot *memslot;
>>>>> +
>>>>> +        kvm->arch.pv.stor_var = NULL;
>>>>> +        kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL, get_order(base));
>>>>> +        if (!kvm->arch.pv.stor_base)
>>>>> +                return -ENOMEM;
>>>>> +
>>>>> +        /*
>>>>> +         * Calculate current guest storage for allocation of the
>>>>> +         * variable storage, which is based on the length in MB.
>>>>> +         *
>>>>> +         * Slots are sorted by GFN
>>>>> +         */
>>>>> +        mutex_lock(&kvm->slots_lock);
>>>>> +        memslot = kvm_memslots(kvm)->memslots;
>>>>> +        npages = memslot->base_gfn + memslot->npages;
>>>>> +        mutex_unlock(&kvm->slots_lock);
>>>>
>>>> As discussed, I think we should just use mem_limit and check against
>>>> some hardcoded upper limit. But yeah, we can do that as an addon (in
>>>> which case memory hotplug will require special tweaks to detect this
>>>> from user space ... e.g., a new capability)
>>>>
>>>> [...]
>>>>
>>>>> +int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
>>>>> +{
>>>>> +        struct uv_cb_cgc uvcb = {
>>>>> +                .header.cmd = UVC_CMD_CREATE_SEC_CONF,
>>>>> +                .header.len = sizeof(uvcb)
>>>>> +        };
>>>>> +        int cc, ret;
>>>>> +        u16 dummy;
>>>>> +
>>>>> +        ret = kvm_s390_pv_alloc_vm(kvm);
>>>>> +        if (ret)
>>>>> +                return ret;
>>>>> +
>>>>> +        /* Inputs */
>>>>> +        uvcb.guest_stor_origin = 0; /* MSO is 0 for KVM */
>>>>> +        uvcb.guest_stor_len = kvm->arch.pv.guest_len;
>>>>> +        uvcb.guest_asce = kvm->arch.gmap->asce;
>>>>> +        uvcb.guest_sca = (unsigned long)kvm->arch.sca;
>>>>> +        uvcb.conf_base_stor_origin = (u64)kvm->arch.pv.stor_base;
>>>>> +        uvcb.conf_virt_stor_origin = (u64)kvm->arch.pv.stor_var;
>>>>> +
>>>>> +        cc = uv_call(0, (u64)&uvcb);
>>>>> +        *rc = uvcb.header.rc;
>>>>> +        *rrc = uvcb.header.rrc;
>>>>> +        KVM_UV_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
>>>>> +                     uvcb.guest_handle, uvcb.guest_stor_len, *rc, *rrc);
>>>>> +
>>>>> +        /* Outputs */
>>>>> +        kvm->arch.pv.handle = uvcb.guest_handle;
>>>>> +
>>>>> +        if (cc) {
>>>>> +                if (uvcb.header.rc & UVC_RC_NEED_DESTROY)
>>>>> +                        kvm_s390_pv_deinit_vm(kvm, &dummy, &dummy);
>>>>> +                else
>>>>> +                        kvm_s390_pv_dealloc_vm(kvm);
>>>>> +                return -EIO;
>>>>
>>>> A lot easier to read :)
>>>>
>>>> Fell free add my rb with or without the mem_limit change.
>>>
>>> I think I will keep the memslot logic. For hotplug we actually need
>>> to notify the ultravisor that the guest size has changed as only the
>>> ultravisor will be able to inject an addressing exception.
>>
>> So holes in between memory slots won't be properly considered? What will
>> happen if a guest accesses such memory right now?
>
> QEMU would get a fault (just like when QEMU would read from a non-existing mapping).
> I think this is ok, as for non-secure this would also trigger a crash (in the guest
> though) since we do not provide the proper memory increment handling in QEMU after
> we have dropped the standby memory support.

Yeah, should be okay for all current use cases.

>> I double checked (virt/kvm/kvm_main.c:update_memslots()), and the slots
>> are definitely sorted "based on their GFN". I think, it's lowest GFN
>> first, so the code in here would be wrong with more than one slot.
>>
>> Can you double check, because I might misinterpret the code.
>
> kvm_s390_get_cmma also uses the first memslot to calculate the buffer size.
> I verified that with a hacked QEMU and printk that this is indeed sorted
> started with the last memslot.
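
Just to spell the assumption out (a rough sketch only; the helper name is
made up and it simply mirrors the hunk quoted above): as long as
update_memslots() keeps the slot array sorted with the highest base_gfn
first, looking at the first slot is enough to derive the current guest
storage size, holes included.

/*
 * Hypothetical helper, purely illustrative and not part of the patch
 * (assumes the usual linux/kvm_host.h context).  With the descending
 * sort, memslots[0] is the slot covering the highest guest frame
 * numbers, so its base_gfn + npages is the end of guest storage in
 * pages.  Holes between slots are counted as if they were backed.
 */
static unsigned long kvm_s390_current_npages(struct kvm *kvm)
{
        struct kvm_memory_slot *memslot;
        unsigned long npages;

        mutex_lock(&kvm->slots_lock);
        memslot = kvm_memslots(kvm)->memslots; /* highest base_gfn first */
        npages = memslot->base_gfn + memslot->npages;
        mutex_unlock(&kvm->slots_lock);

        return npages;
}

With an ascending sort the same code would only cover the lowest slot,
which is why I wanted to double check.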

Then I'm really looking forward to the memslot rework currently on the
list, because this is not-so-nice code IMHO.

Thanks for verifying!

-- 
Thanks,

David / dhildenb