Add additional information to the xe_vm so it can report the last 50
relevant exec queues that have been banned on it, as well as the
associated pagefault address and address type that caused the ban when
applicable.

Since we cannot reasonably associate a pagefault with a specific exec
queue, whenever a CAT error causes an exec queue to become banned, we
blame the last seen pagefault on said exec queue. The last pagefault
seen per exec queue is saved to the xe_vm, and the pagefault is updated
when a new pagefault is reported or when the last pagefault has been
associated with an exec queue, whichever happens first. All new
pagefault reports come from xe_gt_pagefault.

Also add a tracker that counts the number of times the VM has
experienced an engine reset.

Finally, add a new ioctl, xe_vm_get_property_ioctl, that allows the
user to query this additional information.

Signed-off-by: Jonathan Cavitt <joanthan.cavitt@xxxxxxxxx>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Suggested-by: Matthew Brost <matthew.brost@xxxxxxxxx>
CC: Zhang Jianxun <jianxun.zhang@xxxxxxxxx>

Jonathan Cavitt (6):
  drm/xe/xe_gt_pagefault: Migrate lookup_vma to xe_vm.h
  drm/xe/xe_exec_queue: Add ID param to exec queue struct
  drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
  drm/xe/xe_vm: Add per VM pagefault info
  drm/xe/xe_vm: Add per VM reset stats
  drm/xe/xe_vm: Implement xe_vm_get_property_ioctl

 drivers/gpu/drm/xe/xe_device.c           |   2 +
 drivers/gpu/drm/xe/xe_exec_queue.c       |   7 +
 drivers/gpu/drm/xe/xe_exec_queue_types.h |   2 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c     |  82 ++++-------
 drivers/gpu/drm/xe/xe_gt_pagefault.h     |  28 ++++
 drivers/gpu/drm/xe/xe_guc_submit.c       |   4 +
 drivers/gpu/drm/xe/xe_vm.c               | 175 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h               |  32 +++++
 drivers/gpu/drm/xe/xe_vm_types.h         |  34 +++++
 include/uapi/drm/xe_drm.h                |  73 ++++++++++
 10 files changed, 381 insertions(+), 58 deletions(-)

-- 
2.43.0
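
For reviewers who want a mental model of the bookkeeping described in
the cover letter, here is a rough userspace sketch in C. It is not the
code in this series: every struct, field, and function name below is
hypothetical, locking is omitted, and it assumes that associating the
last seen pagefault with a banned exec queue simply clears the saved
fault. It only illustrates three ideas: one "last seen pagefault" slot,
a ring of the 50 most recent bans, and a reset counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BANS 50 /* the cover letter keeps the last 50 banned queues */

/* Hypothetical fault record; field names are illustrative only. */
struct fault_info {
	uint64_t address;
	uint32_t address_type;
};

/* One banned exec queue plus the fault blamed on it, if any. */
struct ban_entry {
	uint32_t exec_queue_id;
	bool has_fault;
	struct fault_info fault;
};

/* Hypothetical per-VM state; zero-initialize before use. */
struct vm_fault_state {
	struct ban_entry bans[MAX_BANS];
	size_t head;             /* next ring slot to write */
	size_t count;            /* valid entries, at most MAX_BANS */
	bool pending_valid;      /* is there an unassociated fault? */
	struct fault_info pending;
	uint64_t reset_count;    /* engine resets seen by this VM */
};

/* A new pagefault overwrites whatever fault was saved before. */
static void vm_report_pagefault(struct vm_fault_state *vm,
				uint64_t address, uint32_t address_type)
{
	vm->pending.address = address;
	vm->pending.address_type = address_type;
	vm->pending_valid = true;
}

/* On a ban, blame the saved fault (if any) and then consume it. */
static void vm_record_ban(struct vm_fault_state *vm, uint32_t exec_queue_id)
{
	struct ban_entry *e = &vm->bans[vm->head];

	e->exec_queue_id = exec_queue_id;
	e->has_fault = vm->pending_valid;
	if (vm->pending_valid)
		e->fault = vm->pending;
	else
		memset(&e->fault, 0, sizeof(e->fault));
	vm->pending_valid = false;

	/* Advance the ring; the oldest entry is overwritten once full. */
	vm->head = (vm->head + 1) % MAX_BANS;
	if (vm->count < MAX_BANS)
		vm->count++;
}

static void vm_record_reset(struct vm_fault_state *vm)
{
	vm->reset_count++;
}

/* Return the i-th retained ban, 0 being the oldest, or NULL. */
static const struct ban_entry *vm_get_ban(const struct vm_fault_state *vm,
					  size_t i)
{
	size_t start;

	if (i >= vm->count)
		return NULL;
	start = (vm->head + MAX_BANS - vm->count) % MAX_BANS;
	return &vm->bans[(start + i) % MAX_BANS];
}
```

In this model, two faults followed by one ban leave the ban entry
holding the second fault, and a further ban with no intervening fault
is recorded without a fault address. A query path such as the proposed
ioctl would then walk the ring with something like vm_get_ban() and
copy the entries out to the caller.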