On Tue, 17 May 2022 15:39:14 +0200 Janosch Frank <frankja@xxxxxxxxxxxxx> wrote: > On 5/17/22 12:59, Claudio Imbrenda wrote: > > On Mon, 16 May 2022 09:08:13 +0000 > > Janosch Frank <frankja@xxxxxxxxxxxxx> wrote: > > > >> Sometimes dumping inside of a VM fails, is unavailable or doesn't > >> yield the required data. For these occasions we dump the VM from the > >> outside, writing memory and cpu data to a file. > >> > >> Up to now PV guests only supported dumping from the inside of the > >> guest through dumpers like KDUMP. A PV guest can be dumped from the > >> hypervisor but the data will be stale and / or encrypted. > >> > >> To get the actual state of the PV VM we need the help of the > >> Ultravisor who safeguards the VM state. New UV calls have been added > >> to initialize the dump, dump storage state data, dump cpu data and > >> complete the dump process. We expose these calls in this patch via a > >> new UV ioctl command. > >> > >> The sensitive parts of the dump data are encrypted, the dump key is > >> derived from the Customer Communication Key (CCK). This ensures that > >> only the owner of the VM who has the CCK can decrypt the dump data. > >> > >> The memory is dumped / read via a normal export call and a re-import > >> after the dump initialization is not needed (no re-encryption with a > >> dump key). > >> > >> Signed-off-by: Janosch Frank <frankja@xxxxxxxxxxxxx> > > The cut code will be fixed according to your requests. > > [...] > >> +/* /** > >> + * kvm_s390_pv_dump_stor_state > >> + * > >> + * @kvm: pointer to the guest's KVM struct > >> + * @buff_user: Userspace pointer where we will write the results to > >> + * @gaddr: Starting absolute guest address for which the storage state > >> + * is requested. This value will be updated with the last > >> + * address for which data was written when returning to > >> + * userspace. > >> + * @buff_user_len: Length of the buff_user buffer > >> + * @rc: Pointer to where the uvcb return code is stored > >> + * @rrc: Pointer to where the uvcb return reason code is stored > >> + * > > > > please add: > > Context: kvm->lock needs to be held > > Sure > > > > > also explain that part of the user buffer might be written to even in > > case of failure (this also needs to go in the documentation) > > Ok > > > > >> + * Return: > >> + * 0 on success > > > > rc and rrc will also be set in case of success > > But that's different from the return code of this function and would > belong to the function description above, no? yes > > > > >> + * -ENOMEM if allocating the cache fails > >> + * -EINVAL if gaddr is not aligned to 1MB > >> + * -EINVAL if buff_user_len is not aligned to uv_info.conf_dump_storage_state_len > >> + * -EINVAL if the UV call fails, rc and rrc will be set in this case > > > > have you considered a different code for UVC failure? > > so the caller can know that rc and rrc are meaningful > > > > or just explain that rc and rrc will always be set; if the UVC is not > > performed, rc and rrc will be 0 > > If the UVC is not performed the rcs will not be *changed*, so it's > advisable to set them to 0 to recognize a change. ah, right, you jump to out only for errors after the allocation > > > Also: > While I generally like commenting as much as possible, this is starting > to get out of hand, the comment header is now taking up a lot of space. > I'll put the rc/rrc comment into the api documentation and I'm > considering putting the partial write comment into there too. I don't know, the comments are also used to generate documentation automatically, so I think they should contain all the information in one way or the other. I don't see issues with huge comments, if the function is complex and has lots of parameters and possible return values. as much as it bothers you, the whole comment block has 0 lines of actual description of what the function does, so I don't think this is getting out of hand :) > > > > >> + * -EFAULT if copying the result to buff_user failed > >> + */ > >> +int kvm_s390_pv_dump_stor_state(struct kvm *kvm, void __user *buff_user, > >> + u64 *gaddr, u64 buff_user_len, u16 *rc, u16 *rrc) > >> +{ > >> + struct uv_cb_dump_stor_state uvcb = { > >> + .header.cmd = UVC_CMD_DUMP_CONF_STOR_STATE, > >> + .header.len = sizeof(uvcb), > >> + .config_handle = kvm->arch.pv.handle, > >> + .gaddr = *gaddr, > >> + .dump_area_origin = 0, > >> + }; > >> + size_t buff_kvm_size; > >> + size_t size_done = 0; > >> + u8 *buff_kvm = NULL; > >> + int cc, ret; > >> + > >> + ret = -EINVAL; > >> + /* UV call processes 1MB guest storage chunks at a time */ > >> + if (!IS_ALIGNED(*gaddr, HPAGE_SIZE)) > >> + goto out; > >> + > >> + /* > >> + * We provide the storage state for 1MB chunks of guest > >> + * storage. The buffer will need to be aligned to > >> + * conf_dump_storage_state_len so we don't end on a partial > >> + * chunk. > >> + */ > >> + if (!buff_user_len || > >> + !IS_ALIGNED(buff_user_len, uv_info.conf_dump_storage_state_len)) > >> + goto out; > >> + > >> + /* > >> + * Allocate a buffer from which we will later copy to the user > >> + * process. We don't want userspace to dictate our buffer size > >> + * so we limit it to DUMP_BUFF_LEN. > >> + */ > >> + ret = -ENOMEM; > >> + buff_kvm_size = min_t(u64, buff_user_len, DUMP_BUFF_LEN); > >> + buff_kvm = vzalloc(buff_kvm_size); > >> + if (!buff_kvm) > >> + goto out; > >> + > >> + ret = 0; > >> + uvcb.dump_area_origin = (u64)buff_kvm; > >> + /* We will loop until the user buffer is filled or an error occurs */ > >> + do { > >> + /* Get 1MB worth of guest storage state data */ > >> + cc = uv_call_sched(0, (u64)&uvcb); > >> + > >> + /* All or nothing */ > >> + if (cc) { > >> + ret = -EINVAL; > >> + break; > >> + } > >> + > >> + size_done += uv_info.conf_dump_storage_state_len; > > > > maybe save this in a local const variable with a shorter name? would be > > more readable? const u64 dump_len = uv_info.conf_dump_storage_state_len; > > did you see this one ^ ? > >> + uvcb.dump_area_origin += uv_info.conf_dump_storage_state_len; > >> + buff_user_len -= uv_info.conf_dump_storage_state_len; > >> + uvcb.gaddr += HPAGE_SIZE; > >> + > >> + /* KVM Buffer full, time to copy to the process */ > >> + if (!buff_user_len || size_done == DUMP_BUFF_LEN) { > >> + if (copy_to_user(buff_user, buff_kvm, size_done)) { > >> + ret = -EFAULT; > >> + break; > >> + } > >> + > >> + buff_user += size_done; > >> + size_done = 0; > >> + uvcb.dump_area_origin = (u64)buff_kvm; > >> + } > >> + } while (buff_user_len); > >> + > >> + /* Report back where we ended dumping */ > >> + *gaddr = uvcb.gaddr; > >> + > >> + /* Lets only log errors, we don't want to spam */ > >> +out: > >> + if (ret) > >> + KVM_UV_EVENT(kvm, 3, > >> + "PROTVIRT DUMP STORAGE STATE: addr %llx ret %d, uvcb rc %x rrc %x", > >> + uvcb.gaddr, ret, uvcb.header.rc, uvcb.header.rrc); > >> + *rc = uvcb.header.rc; > >> + *rrc = uvcb.header.rrc; > >> + vfree(buff_kvm); > >> + > >> + return ret; > >> +} > >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >> index bb2f91bc2305..1c60c2d314ba 100644 > >> --- a/include/uapi/linux/kvm.h > >> +++ b/include/uapi/linux/kvm.h > >> @@ -1653,6 +1653,20 @@ struct kvm_s390_pv_unp { > >> __u64 tweak; > >> }; > >> > >> +enum pv_cmd_dmp_id { > >> + KVM_PV_DUMP_INIT, > >> + KVM_PV_DUMP_CONFIG_STOR_STATE, > >> + KVM_PV_DUMP_COMPLETE, > >> +}; > >> + > >> +struct kvm_s390_pv_dmp { > >> + __u64 subcmd; > >> + __u64 buff_addr; > >> + __u64 buff_len; > >> + __u64 gaddr; /* For dump storage state */ > >> + __u64 reserved[4]; > >> +}; > >> + > >> enum pv_cmd_info_id { > >> KVM_PV_INFO_VM, > >> KVM_PV_INFO_DUMP, > >> @@ -1696,6 +1710,7 @@ enum pv_cmd_id { > >> KVM_PV_PREP_RESET, > >> KVM_PV_UNSHARE_ALL, > >> KVM_PV_INFO, > >> + KVM_PV_DUMP, > >> }; > >> > >> struct kvm_pv_cmd { > > >