https://bugzilla.kernel.org/show_bug.cgi?id=198669

--- Comment #12 from roger@xxxxxxxxxxxxxxxxxxxxx (roger@xxxxxxxxxxxxxxxxxxxxx) ---
On 7 February 2018 08:23:06 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=198669
>
> --- Comment #10 from Christian König (christian.koenig@xxxxxxx) ---
> (In reply to roger@xxxxxxxxxxxxxxxxxxxxx from comment #9)
>> The most likely cause of this kind of mechanical issue is the signal path
>> between the video interface hardware and the outside world, either a dry
>> joint or a mechanical fault in the cable or cable connectors.
>
> That is what I absolutely agree about.
>
>> The driver has sufficient
>> information to determine that a hard failure has occurred, and that failure
>> is probably not in the gpu itself. I would like to see the driver doing a
>> hard reset of the card with rigorous error checking. If it cannot reset the
>> GPU in graphical mode it should try to set the display hardware into a basic
>> console mode.
>
> And that is the part you don't seem to understand. The driver is trying
> exactly what you are describing.
>
> We detect a problem because of a timeout, e.g. the hardware doesn't respond
> in a given time frame on commands we send to it.
>
> What we do then is to query the hardware how far we proceeded in the
> execution and the hardware answered with a nonsense value. In other words
> bits are set in the response which should never be set.
>
> This is a clear indicator that the PCIe transaction for the register read
> aborted because the device doesn't respond any more.
>
> The most likely cause of that is that the bus interface in the ASIC locked up
> because of an electrical problem (I think the ESD protection kicked in) and
> the only way to get out of that is a hard reset of the system.
>
> What we can try to do is trying to prevent further failures like the crash
> you described by checking the values read from the hardware. This way you can
> at least access the box over the network or blindly shut it down with
> keyboard short cuts.

Yes, I take your point. I was speculating on insufficient information. My
apologies. The solution you propose sounds great. Thank you for your patience.

--
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
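
For illustration only: the check described in comment #10 (rejecting a register value that has bits set which should never be set) might look roughly like the C sketch below. The function name, the mask value and the all-ones sentinel are assumptions made for this example. This is not the actual driver code, and the real register layout is not given in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mask of the bits a valid response may have set.
 * The real layout depends on the register; this value is an
 * assumption for the example only.
 */
#define EXAMPLE_VALID_MASK 0x0000FFFFu

/*
 * An aborted PCIe read commonly completes with every bit set.
 * Treating all-ones as "device not responding" is an assumption
 * that holds on typical platforms, not a guarantee.
 */
#define EXAMPLE_READ_ABORTED 0xFFFFFFFFu

/* Return true only if the value read back looks plausible. */
static bool example_reg_value_is_sane(uint32_t value)
{
    /* All-ones: the read most likely never reached the device. */
    if (value == EXAMPLE_READ_ABORTED)
        return false;

    /* Bits outside the valid mask should never be set. */
    if (value & ~EXAMPLE_VALID_MASK)
        return false;

    return true;
}
```

A caller would treat a failed check as "device no longer responding" and stop using the value, rather than interpreting it as a position in the command stream.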