On 3/18/25 11:12, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Zhang, Jianxun <jianxun.zhang@xxxxxxxxx>
> Sent: Tuesday, March 18, 2025 10:48 AM
> To: Cavitt, Jonathan <jonathan.cavitt@xxxxxxxxx>; intel-xe@xxxxxxxxxxxxxxxxxxxxx
> Cc: Gupta, saurabhg <saurabhg.gupta@xxxxxxxxx>; Zuo, Alex <alex.zuo@xxxxxxxxx>; joonas.lahtinen@xxxxxxxxxxxxxxx; Brost, Matthew <matthew.brost@xxxxxxxxx>; Lin, Shuicheng <shuicheng.lin@xxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Wajdeczko, Michal <Michal.Wajdeczko@xxxxxxxxx>; michal.mzorek@xxxxxxxxx
> Subject: Re: [PATCH v8 6/6] drm/xe/xe_vm: Implement xe_vm_get_property_ioctl
>
>> On 3/13/25 11:34, Jonathan Cavitt wrote:
>>> Add support for userspace to request a list of observed faults from a
>>> specified VM.
>>>
>>> v2:
>>> - Only allow querying of failed pagefaults (Matt Brost)
>>>
>>> v3:
>>> - Remove unnecessary size parameter from helper function, as it is a
>>>   property of the arguments. (jcavitt)
>>> - Remove unnecessary copy_from_user (Jianxun)
>>> - Set address_precision to 1 (Jianxun)
>>> - Report max size instead of dynamic size for memory allocation
>>>   purposes.  Total memory usage is reported separately.
>>>
>>> v4:
>>> - Return int from xe_vm_get_property_size (Shuicheng)
>>> - Fix memory leak (Shuicheng)
>>> - Remove unnecessary size variable (jcavitt)
>>>
>>> v5:
>>> - Rename ioctl to xe_vm_get_faults_ioctl (jcavitt)
>>> - Update fill_property_pfs to eliminate need for kzalloc (Jianxun)
>>>
>>> v6:
>>> - Repair and move fill_faults break condition (Dan Carpenter)
>>> - Free vm after use (jcavitt)
>>> - Combine assertions (jcavitt)
>>> - Expand size check in xe_vm_get_faults_ioctl (jcavitt)
>>> - Remove return mask from fill_faults, as return is already -EFAULT
>>>   or 0 (jcavitt)
>>>
>>> v7:
>>> - Revert back to using xe_vm_get_property_ioctl
>>> - Apply better copy_to_user logic (jcavitt)
>>>
>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@xxxxxxxxx>
>>> Suggested-by: Matthew Brost <matthew.brost@xxxxxxxxx>
>>> Cc: Jianxun Zhang <jianxun.zhang@xxxxxxxxx>
>>> Cc: Shuicheng Lin <shuicheng.lin@xxxxxxxxx>
>>> ---
>>>  drivers/gpu/drm/xe/xe_device.c |   3 +
>>>  drivers/gpu/drm/xe/xe_vm.c     | 134 +++++++++++++++++++++++++++++++++
>>>  drivers/gpu/drm/xe/xe_vm.h     |   2 +
>>>  3 files changed, 139 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>> index b2f656b2a563..74e510cb0e47 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>> @@ -194,6 +194,9 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
>>>  	DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
>>>  			  DRM_RENDER_ALLOW),
>>>  	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
>>> +	DRM_IOCTL_DEF_DRV(XE_VM_GET_PROPERTY, xe_vm_get_property_ioctl,
>>> +			  DRM_RENDER_ALLOW),
>>> +
>>>  };
>>>
>>>  static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>>> index 067a9cedcad9..521f0032d9e2 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.c
>>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>>> @@ -42,6 +42,14 @@
>>>  #include "xe_wa.h"
>>>  #include "xe_hmm.h"
>>>
>>> +static const u16 xe_to_user_engine_class[] = {
>>> +	[XE_ENGINE_CLASS_RENDER] = DRM_XE_ENGINE_CLASS_RENDER,
>>> +	[XE_ENGINE_CLASS_COPY] = DRM_XE_ENGINE_CLASS_COPY,
>>> +	[XE_ENGINE_CLASS_VIDEO_DECODE] = DRM_XE_ENGINE_CLASS_VIDEO_DECODE,
>>> +	[XE_ENGINE_CLASS_VIDEO_ENHANCE] = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE,
>>> +	[XE_ENGINE_CLASS_COMPUTE] = DRM_XE_ENGINE_CLASS_COMPUTE,
>>> +};
>>> +
>>>  static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
>>>  {
>>>  	return vm->gpuvm.r_obj;
>>> @@ -3551,6 +3559,132 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>>>  	return err;
>>>  }
>>>
>>> +static int xe_vm_get_property_size(struct xe_vm *vm, u32 property)
>>> +{
>>> +	int size = -EINVAL;
>>> +
>>> +	switch (property) {
>>> +	case DRM_XE_VM_GET_PROPERTY_FAULTS:
>>> +		spin_lock(&vm->faults.lock);
>>> +		size = vm->faults.len * sizeof(struct xe_vm_fault);
>>> +		spin_unlock(&vm->faults.lock);
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>> +	return size;
>>> +}
>>> +
>>> +static int xe_vm_get_property_verify_size(struct xe_vm *vm, u32 property,
>>> +					  int expected, int actual)
>>> +{
>>> +	switch (property) {
>>> +	case DRM_XE_VM_GET_PROPERTY_FAULTS:
>>> +		/*
>>> +		 * Number of faults may increase between calls to
>>> +		 * xe_vm_get_property_ioctl, so just report the
>>> +		 * number of faults the user requests if it's less
>>> +		 * than or equal to the number of faults in the VM
>>> +		 * fault array.
>>> +		 */
>>> +		if (actual < expected)
>>> +			return -EINVAL;
>>
>> The application should be allowed to provide more memory than needed.
>> In return, the driver should update args->size with the number of bytes
>> actually written to the memory.
>
> In an earlier discussion, we had decided that we should allow the
> args->size value to be less than the actual size value, since it would
> be impossible to prevent additional faults from filling the fault list
> between calls to the get property ioctl. In such a case, we would only
> return, in total, the faults that could be stored in memory of said
> size. In a sense, we had agreed that args->size would express how many
> faults the user expects to read. So, if args->size is greater than the
> total size of the fault array, the user is requesting more faults than
> are available to be returned. This request cannot be satisfied, so we
> return -EINVAL.
Yes, you are correct. We captured this in item 4) of the "Execution Flow" section in the design doc. There shouldn't be a valid case for the UMD to query more than what the first size query reported, considering that the fault list in the KMD can only grow and the UMD always issues a size query before fetching.
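
To make the agreed flow concrete, the UMD side would look roughly like the sketch below. This is a minimal sketch only: it assumes the uapi pieces added by this series (struct drm_xe_vm_get_property, struct xe_vm_fault, DRM_XE_VM_GET_PROPERTY_FAULTS) plus a DRM_IOCTL_XE_VM_GET_PROPERTY request number in xe_drm.h, which is not shown in this patch:

/*
 * Sketch of the two-call flow: first query the required size, then
 * allocate and fetch. Assumes the uapi additions from this series.
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include "xe_drm.h"

static int query_vm_faults(int fd, uint32_t vm_id)
{
	struct drm_xe_vm_get_property args;
	void *buf;

	memset(&args, 0, sizeof(args));
	args.vm_id = vm_id;
	args.property = DRM_XE_VM_GET_PROPERTY_FAULTS;

	/* Call 1: args.size == 0, so the kernel reports the size in bytes. */
	if (ioctl(fd, DRM_IOCTL_XE_VM_GET_PROPERTY, &args))
		return -1;

	if (!args.size)
		return 0; /* no faults recorded so far */

	buf = malloc(args.size);
	if (!buf)
		return -1;

	/*
	 * Call 2: fetch the fault entries. Asking for more than call 1
	 * reported would be rejected with -EINVAL, since the fault list
	 * can only grow between the two calls; asking for the same
	 * amount (or less) is always satisfiable.
	 */
	args.data = (uintptr_t)buf;
	if (ioctl(fd, DRM_IOCTL_XE_VM_GET_PROPERTY, &args)) {
		free(buf);
		return -1;
	}

	/* args.size / sizeof(struct xe_vm_fault) entries are now in buf. */
	free(buf);
	return 0;
}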
> -Jonathan Cavitt
>
>>> +		break;
>>> +	default:
>>> +		if (actual != expected)
>>> +			return -EINVAL;
>>> +		break;
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +static int fill_faults(struct xe_vm *vm,
>>> +		       struct drm_xe_vm_get_property *args)
>>> +{
>>> +	struct xe_vm_fault __user *usr_ptr = u64_to_user_ptr(args->data);
>>> +	struct xe_vm_fault store = { 0 };
>>> +	struct xe_vm_fault_entry *entry;
>>> +	int ret = 0, i = 0, count;
>>> +
>>> +	count = args->size / sizeof(struct xe_vm_fault);
>>> +
>>> +	spin_lock(&vm->faults.lock);
>>> +	list_for_each_entry(entry, &vm->faults.list, list) {
>>> +		if (i++ == count)
>>> +			break;
>>> +
>>> +		memset(&store, 0, sizeof(struct xe_vm_fault));
>>> +
>>> +		store.address = entry->address;
>>> +		store.address_type = entry->address_type;
>>> +		store.address_precision = entry->address_precision;
>>> +		store.fault_level = entry->fault_level;
>>> +		store.engine_class = xe_to_user_engine_class[entry->engine_class];
>>> +		store.engine_instance = entry->engine_instance;
>>> +
>>> +		ret = copy_to_user(usr_ptr, &store, sizeof(struct xe_vm_fault));
>>> +		if (ret)
>>> +			break;
>>> +
>>> +		usr_ptr++;
>>> +	}
>>> +	spin_unlock(&vm->faults.lock);
>>> +
>>
>> Update args->size with the amount actually written. Refer to my other
>> comment above. Also document this in the comment of the interface.

(A sketch of the change I have in mind is at the bottom of this mail.)

>>> +	return ret;
>>> +}
>>> +
>>> +static int xe_vm_get_property_fill_data(struct xe_vm *vm,
>>> +					struct drm_xe_vm_get_property *args)
>>> +{
>>> +	switch (args->property) {
>>> +	case DRM_XE_VM_GET_PROPERTY_FAULTS:
>>> +		return fill_faults(vm, args);
>>> +	default:
>>> +		break;
>>> +	}
>>> +	return -EINVAL;
>>> +}
>>> +
>>> +int xe_vm_get_property_ioctl(struct drm_device *drm, void *data,
>>> +			     struct drm_file *file)
>>> +{
>>> +	struct xe_device *xe = to_xe_device(drm);
>>> +	struct xe_file *xef = to_xe_file(file);
>>> +	struct drm_xe_vm_get_property *args = data;
>>> +	struct xe_vm *vm;
>>> +	int size, ret = 0;
>>> +
>>> +	if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>>> +		return -EINVAL;
>>> +
>>> +	vm = xe_vm_lookup(xef, args->vm_id);
>>> +	if (XE_IOCTL_DBG(xe, !vm))
>>> +		return -ENOENT;
>>> +
>>> +	size = xe_vm_get_property_size(vm, args->property);
>>> +
>>> +	if (size < 0) {
>>> +		ret = size;
>>> +		goto put_vm;
>>> +	} else if (!args->size) {
>>> +		args->size = size;
>>> +		goto put_vm;
>>> +	}
>>> +
>>> +	ret = xe_vm_get_property_verify_size(vm, args->property,
>>> +					     args->size, size);
>>> +	if (XE_IOCTL_DBG(xe, ret)) {
>>> +		ret = -EINVAL;
>>> +		goto put_vm;
>>> +	}
>>> +
>>> +	ret = xe_vm_get_property_fill_data(vm, args);
>>> +
>>> +put_vm:
>>> +	xe_vm_put(vm);
>>> +	return ret;
>>> +}
>>> +
>>>  /**
>>>   * xe_vm_bind_kernel_bo - bind a kernel BO to a VM
>>>   * @vm: VM to bind the BO to
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
>>> index 9bd7e93824da..63ec22458e04 100644
>>> --- a/drivers/gpu/drm/xe/xe_vm.h
>>> +++ b/drivers/gpu/drm/xe/xe_vm.h
>>> @@ -196,6 +196,8 @@ int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
>>>  			struct drm_file *file);
>>>  int xe_vm_bind_ioctl(struct drm_device *dev, void *data,
>>>  		     struct drm_file *file);
>>> +int xe_vm_get_property_ioctl(struct drm_device *dev, void *data,
>>> +			     struct drm_file *file);
>>>
>>>  void xe_vm_close_and_put(struct xe_vm *vm);
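
Coming back to the args->size writeback requested above: a hypothetical sketch of what fill_faults() could look like, counting the entries actually copied and reporting the byte count back through args->size. This is not part of the posted v8 patch; it also converts the copy_to_user() remainder into -EFAULT, since copy_to_user() returns the number of bytes left uncopied rather than an errno:

/*
 * Hypothetical sketch only, not part of the posted v8 patch: count the
 * entries actually copied out and report the byte count back through
 * args->size.
 */
static int fill_faults(struct xe_vm *vm,
		       struct drm_xe_vm_get_property *args)
{
	struct xe_vm_fault __user *usr_ptr = u64_to_user_ptr(args->data);
	struct xe_vm_fault store = { 0 };
	struct xe_vm_fault_entry *entry;
	int ret = 0, copied = 0, count;

	count = args->size / sizeof(struct xe_vm_fault);

	spin_lock(&vm->faults.lock);
	list_for_each_entry(entry, &vm->faults.list, list) {
		if (copied == count)
			break;

		memset(&store, 0, sizeof(struct xe_vm_fault));

		store.address = entry->address;
		store.address_type = entry->address_type;
		store.address_precision = entry->address_precision;
		store.fault_level = entry->fault_level;
		store.engine_class = xe_to_user_engine_class[entry->engine_class];
		store.engine_instance = entry->engine_instance;

		/* copy_to_user() returns the number of bytes not copied. */
		if (copy_to_user(usr_ptr, &store, sizeof(struct xe_vm_fault))) {
			ret = -EFAULT;
			break;
		}

		copied++;
		usr_ptr++;
	}
	spin_unlock(&vm->faults.lock);

	/* Report back how many bytes were actually written. */
	if (!ret)
		args->size = copied * sizeof(struct xe_vm_fault);

	return ret;
}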