Re: Debugging Thinkpad T430s occasional suspend failure.

Hugh Dickins <hughd@xxxxxxxxxx> · Sat, 16 Feb 2013 16:01:19 -0800 (PST)

On Sat, 16 Feb 2013, Linus Torvalds wrote:
> On Sat, Feb 16, 2013 at 1:45 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> >
> > I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
> > yesterday, to save an oopsing core kernel ip there, instead of hashed
> > pm trace info (it makes sense in this case to invert your sequence,
> > putting the high order into years and the low order into minutes).
> 
> That sounds like a good idea in general. The PM_TRACE() thing was done
> to figure out things that locked up the PCI bus etc, but encoding the
> oopses during suspend sounds like a really good idea too.

Yes, it can and should be generalized, but needs more than I gave it.

> 
> Is your patch clean enough to just be made part of the standard
> PM_TRACE infrastructure, or was it something really hacky and one-off?

Not really hacky, but three reasons why it cannot be standard without
more work (I am supposing that it should not be tied to suspend/resume,
but could be used for any oops which gets hidden by graphic screen,
and also fails to reach /var/log/messages).

1. Because I usually have a version of KDB patched into my kernels
("forked" many years ago, never time to integrate with that subset
which eventually went into 2.6.??), it was easiest to implement as
a pair of KDB commands (needing keyboard input: I already knew I
could "reboot" from KDB in this state).  (Sidenote: using KDB can
prevent getting a trace into /var/log/messages: so I had tried
without it, but still failed to get one.)

2. I don't use initrd, have very little in modules, and a pared down
kernel: so for me it was not a serious restriction to limit to core
kernel addresses, which easily fitted within the limit; but we ought
to put thought into handling module addresses too.

3. Needs care on the interface: a debug config option presumably,
but perhaps also needs to tie in with panic_on_oops or something -
I don't like my date getting messed up by surprise, and there's no
point in messing it up unless there's reason to believe the machine
will very quickly be rebooted.  Since I had to open the lid to resume
to hit the oops, in this case we could rely on me quickly rebooting.

But I've appended what I had below, so it's on the record, and can
be taken further if (likely) someone gets there sooner than I do.

> 
> > Rewarded last night by reboot to Feb 21 14:45:53 2006.  Which is
> > ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6
> >
> > /home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
> >          * enable dithering as needed, but that costs bandwidth.  So choose
> >          * the minimum value that expresses the full color range of the fb but
> >          * also stays within the max display bpc discovered above.
> >          */
> >
> >         switch (fb->depth) {
> > ffffffff812d60e9:       48 8b 55 c0             mov    -0x40(%rbp),%rdx
> > ffffffff812d60ed:       8b 02                   mov    (%rdx),%eax
> >
> > (gcc chose to pass a pointer to fb->depth down to the function,
> > instead of fb itself, since that is the only use of it there.)
> >
> > I expect that fb is NULL; but with an average of one failure to resume
> > per day, and ~26 bits of info per crash, this is not a fast procedure!
> >
> > I notice that intel_pipe_set_base() allows for NULL fb,
> > so I'm currently running with the oops-in-rtc hackery, plus
> > -       switch (fb->depth) {
> > +       if (WARN_ON(!fb))
> > +               bpc = 8;
> > +       else switch (fb->depth) {
> >
> > There's been a fair bit of change to intel_display.c since 3.7 (if
> > my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
> > ironlake_, but I've not yet spotted anything obvious; nor yet looked
> > to see where fb would originate from anyway.
> >
> > Once I've got just a little more info out of it, I'll start another
> > thread addressed principally to the drm/gpu/i915 guys.
> 
> I think it's worth it to give them a heads-up already. So I've cc'd
> the main suspects here..

Okay, thanks.

> 
> Daniel, Dave - any comments about a NULL fb in
> intel_choose_pipe_bpp_dither() during either suspend or resume? Some
> googling shows this:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=895123

Great, yes, I'm sure that's the same (though it says "suspend"
and I say "resume").

> 
> which sounds remarkably similar, and is also during a suspend attempt
> (but apparently Satish got a full oops out).. Some timing race with a
> worker entry?
> 
>                         Linus

#include <linux/rtc.h>
#include <asm/rtc.h>
/*
 * HughD adapted the following from drivers/base/power/trace.c:
 *
 * Copyright (C) 2006 Linus Torvalds
 *
 * Trace facility for suspend/resume problems, when none of the
 * devices may be working.
 *
 * Horrid, horrid, horrid.
 *
 * It turns out that the _only_ piece of hardware that actually
 * keeps its value across a hard boot (and, more importantly, the
 * POST init sequence) is literally the realtime clock.
 *
 * Never mind that an RTC chip has 114 bytes (and often a whole
 * other bank of an additional 128 bytes) of nice SRAM that is
 * _designed_ to keep data - the POST will clear it. So we literally
 * can just use the few bytes of actual time data, which means that
 * we're really limited.
 *
 * It means, for example, that we can't use the seconds at all
 * (since the time between the hang and the boot might be more
 * than a minute).
 *
 * There are the wday fields etc, but I wouldn't guarantee those
 * are dependable either. And if the date isn't valid, either the
 * hw or POST will do strange things.
 *
 * So we're left with:
 *  - year: 0-99
 *  - month: 0-11
 *  - day-of-month: 1-28
 *  - hour: 0-23
 *  - min: 0-59
 *
 * Giving us a total range of 0-48384000 (0x2e24800), ie less
 * than 26 bits of actual data we can save across reboots:
 * but that should be enough to save one core kernel address.
 */

#define LIMITING_VALUE (100*12*28*24*60)
static unsigned int saved_value;

static void set_magic_time(unsigned int n)
{
	// June 7th, 2006
	static struct rtc_time time = {
		.tm_sec = 0,
		.tm_min = 0,
		.tm_hour = 0,
		.tm_mday = 7,
		.tm_mon = 5,	// June - counting from zero
		.tm_year = 106,
		.tm_wday = 3,
		.tm_yday = 160,
		.tm_isdst = 1
	};

	time.tm_min = (n % 60);
	n /= 60;
	time.tm_hour = (n % 24);
	n /= 24;
	time.tm_mday = (n % 28) + 1;
	n /= 28;
	time.tm_mon = (n % 12);
	n /= 12;
	time.tm_year = (n % 100);
	set_rtc_time(&time);
}

static unsigned int __init read_magic_time(void)
{
	struct rtc_time time;
	unsigned int n;

	get_rtc_time(&time);
	n = time.tm_year;
	if (n >= 100)
		n -= 100;
	n *= 12;
	n += time.tm_mon;
	n *= 28;
	n += time.tm_mday - 1;
	n *= 24;
	n += time.tm_hour;
	n *= 60;
	n += time.tm_min;
	return n;
}

static int kdba_rtcw(int argc, const char **argv)
{
	unsigned long addr;

	if (!kdb_current_regs)
		return KDB_BADADDR;
	addr = kdb_current_regs->ip;
	if (addr < (unsigned long)_stext ||
	    addr > (unsigned long)_etext)
		return KDB_BADADDR;
	if (addr - (unsigned long)_stext >= LIMITING_VALUE)
		return KDB_BADADDR;
	saved_value = addr - (unsigned long)_stext;
	set_magic_time(saved_value);
	return 0;
}

static int kdba_rtcr(int argc, const char **argv)
{
	int diag = KDB_BADADDR;
	unsigned long addr;

	addr = saved_value + (unsigned long)_stext;
	if (addr < (unsigned long)_stext ||
	    addr > (unsigned long)_etext)
		return diag;
	kdb_printf("%*p %pB\n",
			2 * (int)sizeof(long), (void *)addr, (void *)addr);
	return 0;
}

void __init kdba_rtcinit(void)
{
	saved_value = read_magic_time();
	kdb_register("rtcw", kdba_rtcw, "", "Write ip to rtc", 0);
	kdb_register("rtcr", kdba_rtcr, "", "Read ip from rtc", 0);
}
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel