On Fri, Jul 29, 2022 at 07:51:29PM +0000, Sean Christopherson wrote: > On Wed, Jul 06, 2022, Chao Peng wrote: > > @@ -1332,9 +1332,18 @@ yet and must be cleared on entry. > > __u64 userspace_addr; /* start of the userspace allocated memory */ > > }; > > > > + struct kvm_userspace_memory_region_ext { > > + struct kvm_userspace_memory_region region; > > + __u64 private_offset; > > + __u32 private_fd; > > + __u32 pad1; > > + __u64 pad2[14]; > > +}; > > + > > /* for kvm_memory_region::flags */ > > #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) > > #define KVM_MEM_READONLY (1UL << 1) > > + #define KVM_MEM_PRIVATE (1UL << 2) > > Very belatedly following up on prior feedback... > > | I think a flag is still needed, the problem is private_fd can be safely > | accessed only when this flag is set, e.g. without this flag, we can't > | copy_from_user these new fields since they don't exist for previous > | kvm_userspace_memory_region callers. > > I forgot about that aspect of things. We don't technically need a dedicated > PRIVATE flag to handle that, but it does seem to be the least awful soltuion. > We could either add a generic KVM_MEM_EXTENDED_REGION or an entirely new > ioctl(), e.g. KVM_SET_USER_MEMORY_REGION2, but in both approaches there's a decent > chance that we'll end up needed individual "this field is valid" flags anways. > > E.g. if KVM requires pad1 and pad2 to be zero to carve out future extensions, > then we're right back here if some future extension needs to treat '0' as a legal > input. I had such practice (always rejecting none-zero 'pad' value when introducing new user APIs) in other project previously, but I rarely see that in KVM. > > TL;DR: adding KVM_MEM_PRIVATE still seems like the best approach. > > > @@ -4631,14 +4658,35 @@ static long kvm_vm_ioctl(struct file *filp, > > break; > > } > > case KVM_SET_USER_MEMORY_REGION: { > > - struct kvm_userspace_memory_region kvm_userspace_mem; > > + struct kvm_user_mem_region mem; > > + unsigned long size; > > + u32 flags; > > + > > + kvm_sanity_check_user_mem_region_alias(); > > + > > + memset(&mem, 0, sizeof(mem)); > > > > r = -EFAULT; > > - if (copy_from_user(&kvm_userspace_mem, argp, > > - sizeof(kvm_userspace_mem))) > > + > > + if (get_user(flags, > > + (u32 __user *)(argp + offsetof(typeof(mem), flags)))) > > + goto out; > > > Indentation is funky. It's hard to massage this into something short and > readable What about capturing the offset separately? E.g. > > struct kvm_user_mem_region mem; > unsigned int flags_offset = offsetof(typeof(mem), flags)); > unsigned long size; > u32 flags; > > kvm_sanity_check_user_mem_region_alias(); > > memset(&mem, 0, sizeof(mem)); > > r = -EFAULT; > if (get_user(flags, (u32 __user *)(argp + flags_offset))) > goto out; > > But this can actually be punted until KVM_MEM_PRIVATE is fully supported. As of > this patch, KVM doesn't read the extended size, so I believe the diff for this > patch can simply be: Looks good to me, Thanks. Chao > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index da263c370d00..5194beb7b52f 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -4640,6 +4640,10 @@ static long kvm_vm_ioctl(struct file *filp, > sizeof(kvm_userspace_mem))) > goto out; > > + r = -EINVAL; > + if (mem.flags & KVM_MEM_PRIVATE) > + goto out; > + > r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem); > break; > }