On 22/11/24 21:32, Raag Jadav wrote:
> On Fri, Nov 22, 2024 at 11:09:32AM +0100, Christian König wrote:
>> On 22.11.24 at 08:07, Raag Jadav wrote:
>>> On Mon, Nov 18, 2024 at 08:26:37PM +0530, Aravind Iddamsetty wrote:
>>>> On 15/11/24 10:37, Raag Jadav wrote:
>>>>> Introduce device wedged event, which notifies userspace of 'wedged'
>>>>> (hung/unusable) state of the DRM device through a uevent. This is
>>>>> useful especially in cases where the device is no longer operating as
>>>>> expected and has become unrecoverable from driver context. The purpose
>>>>> of this implementation is to provide drivers a generic way to recover
>>>>> with the help of userspace intervention without taking any drastic
>>>>> measures in the driver.
>>>>>
>>>>> A 'wedged' device is basically a dead device that needs attention. The
>>>>> uevent is the notification that is sent to userspace along with a hint
>>>>> about what could possibly be attempted to recover the device and bring
>>>>> it back to a usable state. Different drivers may have different ideas of
>>>>> a 'wedged' device depending on their hardware implementation, hence
>>>>> the vendor-agnostic nature of the event. It is up to the drivers to
>>>>> decide when they see the need for recovery and how they want to recover
>>>>> from the available methods.
>>>>>
>>>>> Prerequisites
>>>>> -------------
>>>>>
>>>>> The driver, before opting for recovery, needs to make sure that the
>>>>> 'wedged' device doesn't harm the system as a whole by taking care of the
>>>>> prerequisites. Necessary actions must include disabling DMA to system
>>>>> memory as well as any communication channels with other devices. Further,
>>>>> the driver must ensure that all dma_fences are signalled and any device
>>>>> state that the core kernel might depend on is cleaned up. Once the event
>>>>> is sent, the device must be kept in the 'wedged' state until the recovery
>>>>> is performed.
>>>>> New accesses to the device (IOCTLs) should be blocked,
>>>>> preferably with an error code that resembles the type of failure the
>>>>> device has encountered. This will signify the reason for wedging, which
>>>>> can be reported to the application if needed.
>>>> should we even drop the mmaps we created?
>>> Whatever is required for a clean recovery, yes.
>>>
>>> Although how would this play out? Do we risk losing display?
>>> Or any other possible side-effects?
>> Before sending a wedge event, all DMA transfers of the device have to be
>> blocked.
>>
>> So yes, all display, mmap() and file descriptor connections you had with the
>> device would need to be re-created.
> Does it mean we'd have to rely on userspace to unmap()?

I'm not sure about display, but at least all user mappings can be destroyed
using drm_vma_node_unmap.

Thanks,
Aravind.
>
> Raag
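For illustration only, a minimal userspace sketch of how a recovery daemon might pull the recovery hint out of such a uevent. It assumes the standard kobject uevent wire format (an "action@devpath" header followed by NUL-separated KEY=value strings, as received on a NETLINK_KOBJECT_UEVENT socket). The helper uevent_get() is hypothetical, and the WEDGED key name and "rebind" value used with it are assumptions, not something defined in this thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up "KEY=value" in a kobject uevent payload.
 *
 * The payload is a run of NUL-terminated strings: an "action@devpath"
 * header followed by KEY=value environment entries. Returns a pointer to
 * the value inside buf, or NULL if the key is absent. The key name the
 * caller passes (e.g. "WEDGED") is an assumption about the event format.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t keylen = strlen(key);
	size_t off = 0;

	while (off < len) {
		const char *entry = buf + off;
		/* Find the end of this entry without running past the buffer. */
		const char *nul = memchr(entry, '\0', len - off);
		size_t n = nul ? (size_t)(nul - entry) : len - off;

		/* Match "KEY=" exactly, so "WEDGE" does not match "WEDGED=". */
		if (n > keylen && entry[keylen] == '=' &&
		    strncmp(entry, key, keylen) == 0)
			return entry + keylen + 1;

		off += n + 1;	/* skip past the terminating NUL */
	}
	return NULL;
}
```

A daemon would call this on each message it receives from the socket and, if a hint is present, attempt the suggested method (for example an unbind/bind of the driver) only after the driver has completed the prerequisites described above.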