[Bug 198669] Driver crash at radeon_ring_backup+0xd3/0x140 [radeon]

https://bugzilla.kernel.org/show_bug.cgi?id=198669

--- Comment #12 from roger@xxxxxxxxxxxxxxxxxxxxx (roger@xxxxxxxxxxxxxxxxxxxxx) ---
On 7 February 2018 08:23:06 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=198669
>
> --- Comment #10 from Christian König (christian.koenig@xxxxxxx) ---
> (In reply to roger@xxxxxxxxxxxxxxxxxxxxx from comment #9)
>> The most likely cause of this kind of mechanical issue is the signal path
>> between the video interface hardware and the outside world, either a dry
>> joint or a mechanical fault in the cable or cable connectors.
>
> I absolutely agree with that.
>
>> The driver has sufficient
>> information to determine that a hard failure has occurred, and that failure
>> is probably not in the GPU itself. I would like to see the driver doing a
>> hard reset of the card with rigorous error checking. If it cannot reset the
>> GPU in graphical mode it should try to set the display hardware into a basic
>> console mode.
>
> And that is the part you don't seem to understand. The driver is trying
> exactly what you are describing.
>
> We detect a problem because of a timeout, i.e. the hardware doesn't respond
> within a given time frame to commands we send to it.
>
> What we do then is query the hardware for how far it proceeded in the
> execution, and the hardware answered with a nonsense value. In other words,
> bits are set in the response which should never be set.
>
> This is a clear indicator that the PCIe transaction for the register read
> aborted because the device doesn't respond any more.
>
> The most likely cause of that is that the bus interface in the ASIC locked up
> because of an electrical problem (I think the ESD protection kicked in), and
> the only way to get out of that is a hard reset of the system.
>
> What we can try to do is prevent further failures, like the crash you
> described, by checking the values read from the hardware. This way you can at
> least access the box over the network or blindly shut it down with keyboard
> shortcuts.


Yes, I take your point. I was speculating on insufficient information. My 
apologies. The solution you propose sounds great.
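
For illustration, the kind of check described above (validating a value read
from the hardware before using it) might look roughly like the sketch below.
This is not the radeon driver's code: struct ring_info and all function names
are hypothetical, the ring size is assumed to be a power of two, and the
all-ones test relies on the common behaviour that a register read from a PCIe
device that no longer responds completes as 0xFFFFFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring description; not the actual radeon structures. */
struct ring_info {
    uint32_t size_dwords; /* ring size in 32-bit words, assumed power of two */
};

/* An aborted PCIe register read commonly shows up as all bits set. */
bool reg_read_looks_aborted(uint32_t value)
{
    return value == UINT32_MAX;
}

/* True if the read pointer can safely be used as an index into the ring. */
bool ring_rptr_is_sane(const struct ring_info *ring, uint32_t rptr)
{
    if (reg_read_looks_aborted(rptr))
        return false;
    /* Bits above the ring size should never be set in a valid pointer. */
    return rptr < ring->size_dwords;
}

/*
 * Number of dwords pending between rptr and wptr, or 0 when either value
 * cannot be trusted, so the caller skips the backup instead of walking
 * memory with a nonsense index.
 */
uint32_t ring_backup_count(const struct ring_info *ring,
                           uint32_t rptr, uint32_t wptr)
{
    if (!ring_rptr_is_sane(ring, rptr) || wptr >= ring->size_dwords)
        return 0;
    return (wptr - rptr) & (ring->size_dwords - 1);
}
```

With a check like this, a dead bus yields "nothing to back up" rather than an
out-of-range access, which is what would keep the rest of the system usable.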

Thank you for your patience.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



