On 01/03/2025 02:53, Raag Jadav wrote:
On Fri, Feb 28, 2025 at 06:54:12PM -0300, André Almeida wrote:
Hi Raag,
On 2/28/25 11:20, Raag Jadav wrote:
Cc: Lucas
On Fri, Feb 28, 2025 at 09:13:52AM -0300, André Almeida wrote:
When a device gets wedged, it might be caused by a guilty application.
For userspace, knowing which app was the cause can be useful in some
situations, such as implementing a policy, logging, or giving the
compositor a chance to let the user know which app caused the problem.
This is an optional argument; when `PID=-1`, there is no information about
which app caused the problem, or whether any app was involved in the hang.
Sometimes just the PID isn't enough, given that the app might already be
dead by the time userspace tries to look up this PID's name, so to make
life easier, also report the app's name in the user event.
Signed-off-by: André Almeida <andrealmeid@xxxxxxxxxx>
[...]
len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
@@ -562,6 +564,14 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
"but recovered through reset" : "needs recovery");
+ if (info) {
+ snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
+ snprintf(comm_string, sizeof(comm_string), "APP=%s", info->comm);
+ } else {
+ snprintf(pid_string, sizeof(pid_string), "%s", "PID=-1");
+ snprintf(comm_string, sizeof(comm_string), "%s", "APP=none");
+ }
This is not much use for wedge cases that need recovery, since at that point
userspace will need to clean house anyway.
Which leaves us with only 'none' case and perhaps the need for standardization
of "optional telemetry collection".
Thoughts?
I had the feeling that 'none' was already meant to be used for that. Do you
think we should move to another naming? Given that we didn't reach the merge
window yet we could potentially change that name without much damage.
No, I meant thoughts on possible telemetry data that the drivers might
think is useful for userspace (along with PID) and can be presented in
a vendor agnostic manner (just like wedged event).
I'm not sure I agree that this will only be used for telemetry and for the
`none` use case. As stated by Xaver, there's a use case for knowing which app
caused the device to get wedged (like switching to software rendering)
and to display something for the user after the recovery is done (e.g.
"The game <app name> stopped working and Plasma has reset").
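
To make the compositor use case concrete, here is a rough userspace sketch of how the proposed fields could be consumed. It assumes the uevent properties arrive as "KEY=value" strings with the "PID=" and "APP=" keys from the patch above; struct wedge_report, parse_wedge_props() and format_wedge_notice() are hypothetical names made up for illustration, not part of any existing API, and the message text is only an example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields discussed in this thread. */
struct wedge_report {
	long pid;	/* -1 when no guilty task was reported */
	char app[64];	/* "none" when no task name was reported */
};

/*
 * Parse uevent properties of the form KEY=value.
 * Assumption: the event carries "PID=" and "APP=" as proposed in the
 * patch. Missing keys fall back to the "no information" defaults and
 * unknown keys (such as WEDGED=) are ignored here.
 */
static void parse_wedge_props(const char *const *props, size_t n,
			      struct wedge_report *out)
{
	out->pid = -1;
	snprintf(out->app, sizeof(out->app), "%s", "none");

	for (size_t i = 0; i < n; i++) {
		if (strncmp(props[i], "PID=", 4) == 0)
			out->pid = strtol(props[i] + 4, NULL, 10);
		else if (strncmp(props[i], "APP=", 4) == 0)
			snprintf(out->app, sizeof(out->app), "%s",
				 props[i] + 4);
	}
}

/*
 * Build a user-facing notice. Returns 1 if a message was written to buf,
 * or 0 if the event did not identify an application (PID=-1).
 */
static int format_wedge_notice(const struct wedge_report *r,
			       char *buf, size_t len)
{
	if (r->pid < 0)
		return 0;

	snprintf(buf, len,
		 "The application %s (PID %ld) stopped working and the GPU was reset",
		 r->app, r->pid);
	return 1;
}
```

A compositor would call parse_wedge_props() on the properties of the received event and only show a notification when format_wedge_notice() returns 1. Since the name comes from the event itself, this still works if the process has already exited by the time the event is handled.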