On Thu, Oct 10, 2024 at 08:02:10AM -0500, Lucas De Marchi wrote:
> On Tue, Oct 08, 2024 at 06:02:43PM +0300, Raag Jadav wrote:
> > On Thu, Oct 03, 2024 at 03:23:22PM +0300, Raag Jadav wrote:
> > > On Tue, Oct 01, 2024 at 02:20:29PM +0200, Michal Wajdeczko wrote:
> > > > On 30.09.2024 09:38, Raag Jadav wrote:
> > > > >
> > > > > +/**
> > > > > + * enum drm_wedge_recovery - Recovery method for wedged device in order of
> > > > > + * severity. To be set as bit fields in drm_device.wedge_recovery variable.
> > > > > + * Drivers can choose to support any one or multiple of them depending on
> > > > > + * their needs.
> > > > > + */
> > > > > +enum drm_wedge_recovery {
> > > > > +	/** @DRM_WEDGE_RECOVERY_REBIND: unbind + rebind driver */
> > > > > +	DRM_WEDGE_RECOVERY_REBIND,
> > > > > +
> > > > > +	/** @DRM_WEDGE_RECOVERY_BUS_RESET: unbind + reset bus device + rebind */
> > > > > +	DRM_WEDGE_RECOVERY_BUS_RESET,
> > > > > +
> > > > > +	/** @DRM_WEDGE_RECOVERY_REBOOT: reboot system */
> > > > > +	DRM_WEDGE_RECOVERY_REBOOT,
> > > > > +
> > > > > +	/** @DRM_WEDGE_RECOVERY_MAX: for bounds checking, do not use */
> > > > > +	DRM_WEDGE_RECOVERY_MAX
> > > > > +};
> > > > > +
> > > > >  /**
> > > > >   * struct drm_device - DRM device structure
> > > > >   *
> > > > > @@ -317,6 +337,9 @@ struct drm_device {
> > > > >  	 * Root directory for debugfs files.
> > > > >  	 */
> > > > >  	struct dentry *debugfs_root;
> > > > > +
> > > > > +	/** @wedge_recovery: Supported recovery methods for wedged device */
> > > > > +	unsigned long wedge_recovery;
> > > >
> > > > hmm, so before the driver can ask for a reboot as a recovery method from
> > > > wedge it has to somehow add 'reboot' as available method? why is that?
> > >
> > > It's for consumers to use as fallbacks in case the preferred recovery method
> > > (sent along with uevent) doesn't work out. (patch 2/5)
> >
> > On second thought...
> >
> > Lucas, do we have a convincing enough usecase for fallback recovery?
> > If <method> were to fail, I would expect there to be even bigger problems
> > like kernel crash or unrecoverable hardware failure.
> >
> > At that point is it worth retrying?
>
> when we were talking about this, I brought it up about allowing the
> driver to inform what was the supported wedge recovery mechanisms
> when the notification is sent. Not to be intended as fallback mechanism.
>
> So if the driver sends a notification with:
>
> 	DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET | DRM_WEDGE_RECOVERY_REBOOT
>
> it means any of these would be suitable, with the first being the option
> with less side-effect. I don't think we are advising userspace to use
> fallback, just informing what the driver/device supports. Depending on
> the error, the driver may leave only
>
> 	DRM_WEDGE_RECOVERY_REBOOT
>
> That name could actually be DRM_WEDGE_RECOVERY_NONE. Because at that
> state the driver doesn't really know what can be done to recover.
> With that we can drop _MAX and use _NONE for bounding check. I think
> we can also omit it in the notification as it's clear:
>
> 	WEDGED
> 	DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET
>
> 	This means the driver can use any of these options to recover
>
> 	WEDGED
> 	DRM_WEDGE_RECOVERY_BUS_RESET
>
> 	only bus reset would fix it
>
> 	WEDGED
>
> 	driver doesn't know anything that could fix it. It may be a soft-reboot,
> 	hard-reboot, firmware flashing etc... We just don't know.

With this I think we can drop sysfs.
(Already too many ABIs to deal with)

Raag
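
For illustration only, here is a rough sketch of how the scheme discussed above
could look if the methods were plain bit flags with NONE as the empty mask.
None of this is from the patch: the WEDGE_RECOVERY_* names, both helpers and
the "rebind,bus-reset" text form are hypothetical, and it is written as
ordinary userspace C so it can be compiled and poked at outside the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flags: one bit per method, least disruptive in the lowest bit. */
#define WEDGE_RECOVERY_NONE      0UL        /* driver knows of no recovery */
#define WEDGE_RECOVERY_REBIND    (1UL << 0) /* unbind + rebind driver */
#define WEDGE_RECOVERY_BUS_RESET (1UL << 1) /* unbind + reset bus device + rebind */

/*
 * Return the least disruptive method advertised in @methods, i.e. the
 * lowest set bit, or WEDGE_RECOVERY_NONE if the mask is empty.
 */
static unsigned long wedge_recovery_pick(unsigned long methods)
{
	return methods & (~methods + 1UL);
}

/*
 * Write a comma-separated list of the advertised methods into @buf.
 * An empty mask produces an empty string, matching the bare "WEDGED" case.
 * Returns the number of characters written, or -1 if @buf is too small.
 */
static int wedge_recovery_format(unsigned long methods, char *buf, size_t len)
{
	static const struct {
		unsigned long bit;
		const char *name;
	} names[] = {
		{ WEDGE_RECOVERY_REBIND,    "rebind" },
		{ WEDGE_RECOVERY_BUS_RESET, "bus-reset" },
	};
	size_t used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int n;

		if (!(methods & names[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1; /* output would have been truncated */
		used += (size_t)n;
	}

	return (int)used;
}
```

With the empty mask meaning "no known recovery", there is no need for a
separate _MAX value: any bits outside the defined flags can simply be rejected.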