On Tue, Dec 18, 2012 at 5:36 AM, Christian König <deathsimple@xxxxxxxxxxx> wrote: > On 17.12.2012 22:31, Paul Bolle wrote: >> >> On an (outdated) laptop the radeon driver (almost always) prints, during >> the first resume of each session: >> [drm] crtc 1 is connected to a TV >> >> This message is a bit puzzling as, as far as I know, no TV has ever >> been connected to this laptop. Anyhow, before v3.5, if that happened the >> radeon driver then printed an error during all following resumes: >> [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35! >> >> (-35 is -EDEADLK.) But the resume would succeed and the driver seemed to >> run without too much trouble. From v3.5 onwards things changed. If the >> (puzzling) message about crtc 1 was printed on first resume the laptop >> would simply hang on second resume. Only a manual power off would then >> be possible. In that case nothing of interest would be found in the >> (truncated) logs. And, most annoyingly, the hang would never happen if >> the laptop was booted with, say, "console=ttyS0,115200n8" added to the >> kernel command line. >> >> I bisected the hang to commit 6c6f478370eccfbfafbdc6fc55c0def03e58f124 >> ("drm/radeon: rework recursive gpu reset handling"), which was added in >> the v3.5 release cycle. After discovering that and poking at the driver >> it turned out that this hang is triggered by radeon_cs_handle_lockup() >> returning -EAGAIN after successfully resetting the gpu. Simply returning >> 0 makes the hang disappear (and makes the drm error reappear). >> >> Nothing in the code or the commit explanation clarifies why -EAGAIN >> should be returned on successful gpu reset. So I suggest >> radeon_cs_handle_lockup() simply returns what radeon_gpu_reset() >> returns, eg 0 (on success) or a negative error code (on failure). >> >> Signed-off-by: Paul Bolle <pebolle@xxxxxxxxxx> >> --- >> 0) This exact patch is untested (but I run something comparable). >> >> 1) Sent as an RFC because I do not understand why this laptop (almost >> always) prints the "crtc 1" message on first resume. Note that another >> workaround for this hang is simply booting with "radeon.tv=0". > > Alex should probably take a look into this, since he probably is the one > with the deepest knowledge of the display engine. My best guess is that it > is just some error while probing for an attached TV and actually isn't so > bad after all. TV detection is always a bit flakey. I suspect some register or gpio related to TV out doesn't get restored properly on resume which results in a false positive. What asic is this? > > >> 2) Also sent as an RFC because I have no idea whatsoever why returning >> -EAGAIN will hang the machine. I guess it's returned to userland by >> radeon_cs_ioctl(). What code uses that ioctl? And what does that code do >> on -EAGAIN that hangs this laptop? > > > EAGAIN just tells userspace to reissue the requested system call. When a > system call is interrupted (either by a signal or in our case a GPU lockup) > it aborts and EGAIN is returned to userspace, telling userspace that it > should try again. So by just returning 0 userspace things that our system > call was executed and doesn't try it again. > > So you just prevented the normal reissuing of the system call and so also > prevented whatever this command submission should be doing in the first > place. > > >> 3) A third reason to send this as an RFC is that I also have no idea why >> this hang doesn't happen when booting with "console=ttyS0,115200n8" or >> even "console=tty0"! But I guess I'm now allowed to call this hang a >> Heisenbug. > > "Heisenbug" ? LOL, I need to remember that. > > But anyway it is not so unusual seeing a bug like this, cause it is possible > (but highly unlikely) that actually trying to print an error message can > cause a lockup. > > You should definitely try Alex latest drm-fixes-3.8 branch > (git://people.freedesktop.org/~agd5f/linux) since the possibility is quite > high that we already have fixed that bug. If that doesn't helps then please > open a bug report and leave me a note so that I can investigate further. I don't have a 3.8 fixes branch up just yet, but will soon. Alex > > Christian. > > > >> >> drivers/gpu/drm/radeon/radeon_cs.c | 5 +---- >> 1 files changed, 1 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c >> b/drivers/gpu/drm/radeon/radeon_cs.c >> index 41672cc..a302c00 100644 >> --- a/drivers/gpu/drm/radeon/radeon_cs.c >> +++ b/drivers/gpu/drm/radeon/radeon_cs.c >> @@ -486,11 +486,8 @@ out: >> static int radeon_cs_handle_lockup(struct radeon_device *rdev, int r) >> { >> - if (r == -EDEADLK) { >> + if (r == -EDEADLK) >> r = radeon_gpu_reset(rdev); >> - if (!r) >> - r = -EAGAIN; >> - } >> return r; >> } >> > > > _______________________________________________ > dri-devel mailing list > dri-devel@xxxxxxxxxxxxxxxxxxxxx > http://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel