Re: saa7134: race between device initialization and first interrupt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2009-01-12 at 00:54 +0100, hermann pitton wrote:
> Hi!
> 
> Am Samstag, den 10.01.2009, 22:05 -0500 schrieb Andy Walls:
> > On Sun, 2009-01-11 at 03:32 +0100, hermann pitton wrote:
> > > Hi,
> > > 
> > > Am Samstag, den 10.01.2009, 07:37 -0500 schrieb Andy Walls:
> > > > On Sat, 2009-01-10 at 13:02 +0100, Marcin Slusarz wrote:
> > > > > On Fri, Jan 09, 2009 at 09:48:07PM -0500, Andy Walls wrote:
> > > > > > On Thu, 2009-01-08 at 00:50 -0200, Mauro Carvalho Chehab wrote:
> > > > > > > On Sun, 4 Jan 2009 22:57:42 +0100
> > > > > > > Marcin Slusarz <marcin.slusarz@xxxxxxxxx> wrote:
> > > > > > > 
> > > > > > > > Hi
> > > > > > > > There's a race between saa7134 device initialization and first interrupt
> > > > > > > > handler, which manifests as an oops [1].
> > > > > > > > 
> > > > > > > > saa7134_initdev -> request_irq -> saa7134_irq ->
> > > > > > > > saa7134_irq_video_signalchange -> saa7134_tvaudio_setmute -> mute_input_7133
> > > > > > > > 
> > > > > > > > In mute_input_7133 dev->input == NULL and accessing dev->input->amux will oops,
> > > > > > > > because dev->input would be initialized later:
> > > > > > > > 
> > > > > > > > saa7134_initdev -> saa7134_hwinit2 -> saa7134_video_init2 -> video_mux ->
> > > > > > > > saa7134_tvaudio_setinput
> > > > > > > > 
> > > > > > > > I'm not sure how it should be fixed correctly, but one of attached patches
> > > > > > > > should fix the symptom.
> > > > > > > > 
> > > > > > > > Marcin
> > > >  
> > > > > > Marcin,
> > > > > > 
> > > > > > What devices share the interrupt with the SAA7133?
> > > > > > 
> > > > > > $ cat /proc/interrupts
> > > > > > 
> > > > > > And could on of those those devices possibly generate an interrupt while
> > > > > > the saa7134 driver is being modporbe'd?
> > > > > 
> > > > > I don't have this hardware so I can't tell. I've picked random oops from
> > > > > kerneloops.org and tried to fix it.
> > > > 
> > > > Ah.  Good man.
> > > > 
> > > > 
> > > > > > 
> > > > > > Mauro & Marcin,
> > > > > > 
> > > > > > This looks like to me what is going on:
> > > > > > 
> > > > > > saa7134_hwinit1() does properly turn off IRQs for which it wants
> > > > > > reports:
> > > > > > 
> > > > > > 	saa_writel(SAA7134_IRQ1, 0);
> > > > > >         saa_writel(SAA7134_IRQ2, 0);
> > > > > > 
> > > > > > but not clearing the state of SAA7134_IRQ_REPORT, maybe something like
> > > > > > this (I don't have a SAA7134 datasheet):
> > > > > > 
> > > > > > 	saa_writel(SAA7134_IRQ_REPORT, 0xffffffff);
> > > > > > 
> > > > > > So when saa7134_initdev() calls request_irq(..., saa7134_irq,
> > > > > > IRQF_SHARED | IRQF_DISABLED, ...), it gets an IRQ that is shared with
> > > > > > another device.
> > > > > > 
> > > > > > Before saa7134_hwinit2() is called by saa7134_initdev() to set
> > > > > > "dev->input", some other device fires an interrupt and saa7134_irq() is
> > > > > > called that then operates on the unknown state of the SAA7134_IRQ_REPORT
> > > > > > register that was never cleared.
> > > > > 
> > > > > Sounds good. But I think this register should be set to 0, because in 
> > > > > saa7134_irq, we do:
> > > > > 
> > > > > report = saa_readl(SAA7134_IRQ_REPORT);
> > > > > (...)
> > > > > if ((report & SAA7134_IRQ_REPORT_RDCAP) || (report & SAA7134_IRQ_REPORT_INTL))
> > > > > 	saa7134_irq_video_signalchange(dev);
> > > > > 
> > > > > But I'm not v4l expert, so...
> > > > 
> > > > With most chip interrupt status registers, you write the flags you want
> > > > to clear.
> > > > 
> > > > That way you can always do something like:
> > > > 
> > > > 	a = read(ISR_REG);
> > > > 	write(ISR_REG,a);
> > > > 
> > > > to clear the Interrupt status register.
> > > > 
> > > > Note what saa7134_irq() does:
> > > > 
> > > >                 report = saa_readl(SAA7134_IRQ_REPORT);
> > > > 		[...]
> > > >                 saa_writel(SAA7134_IRQ_REPORT,report);
> > > > 		[...]
> > > >                 if ((report & SAA7134_IRQ_REPORT_RDCAP) ||
> > > >                         (report & SAA7134_IRQ_REPORT_INTL))
> > > >                                 saa7134_irq_video_signalchange(dev);
> > > > 
> > > > 
> > > > So I'm thinking writing 0 to the register won't have the desired effect.
> > > > 
> > > > Again, thanks for taking the initiative.
> > > > 
> > > > Regards,
> > > > Andy
> > > > 
> > > > > > Marcin has mapped out the oops from there.
> > > > > > 
> > > > > > So the solution, I'm guessing, is likely to clear the SAA7134_IRQ_REPORT
> > > > > > register in saa7134_hwinit1().
> > > > > > 
> > > > > > If only I had a datasheet, hardware, spare time... ;)
> > > > > > 
> > > 
> > > Hartmut has the register programming instructions under NDA and some
> > > others obviously too.
> > > 
> > > Must have been very lucky all that time ...
> > 
> > If it makes the scenario seem more plausible, Marcin pointed out a
> > website where this Oops has been reported 22 times:
> > 
> > http://kerneloops.org/guilty.php?guilty=mute_input_7133&version=2.6.27-release&start=1802240&end=1835007&class=oops
> > 
> > 
> > My suggested fix is this:
> > 
> > http://linuxtv.org/hg/~awalls/cx88/rev/a28c39659c25
> > 
> > which should do no harm, even if it doesn't fix the Oops.
> > 
> 
> Andy, I allowed shared interrupts on several machines with multiple
> saa713x cards and current v4l-dvb, but I'm still not able to reproduce
> the oops. So can't try the fix. Who has the oops and can try?

I don't know.  Marcin simply pointed it out on the list.  He doesn't
have the hardware.

> Hmm, the reports all come from Fedora 9 or 10 so far and they seem to
> have PCI down ports from 2.6.28 on those kernels.
> 
> My cards also all come up with the tuner as the default input and should
> not even check for a signal on other inputs during loading or even see a
> change there, IIRC.
> 
> Disconnecting the video signal on other inputs triggers the mute on
> saa7134 chips, but does nothing on saa7133/35/31e and that is the last
> status I remember. (oh my ;)

The question is: what is the state of the SAA7134_IRQ_REPORT register
when the saa7134 module is being (re-)loaded?.

What will the saa7134_irq() handler do after saa7134_initdev() calls
request_irq(), if the saa7134 device shares an interrupt with a high
interrupt frequency device (ethernet, disk, or graphics controller)?


> Seems testing alone doesn't help on a first try and I'm not that great
> in oops reading ...

All the oops reported at that site that I have read fail in
saa7134-tvaudio.c:mute_input_7133().  Here's the code where the oops
happens:

static int mute_input_7133(struct saa7134_dev *dev)
{
        u32 reg = 0;
        u32 xbarin, xbarout;  
        int mask;
        struct saa7134_input *in;

        xbarin = 0x03;
        switch (dev->input->amux) {
        case TV:
                reg = 0x02;
                xbarin = 0;
                break;
        case LINE1:
                reg = 0x00;
                break; 
        case LINE2:
        case LINE2_LEFT:
                reg = 0x09;
                break;
        }
	[...]

Here's the backtrace:

EIP is at mute_input_7133+0xe/0xf0 [saa7134]
          ^^^^^^^^^^^^^^^           ^^^^^^^

EAX: 00000000 EBX: f39cc000 ECX: 000000a0 EDX: f39cc000
     ^^^^^^^^
	NULL

ESI: f39cc000 EDI: 00000040 EBP: f39a3dd0 ESP: f39a3dc4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process modprobe (pid: 1321, ti=f39a3000 task=f39619a0 task.ti=f39a3000)
Stack: 000000ff f39cc000 00000040 f39a3dd8 f894884b f39a3df0 f894bfef f39a3e18 
      f39cc000 ffffffff ffffffff f39a3e10 f89468ab f39cc0f4 00000000 00000000 
      00000202 f3870960 f89467d9 f39a3e30 c04638bc 000000a0 00000012 01000020 
Call Trace:
[<f894884b>] ? saa7134_tvaudio_setmute+0x3d/0x3f [saa7134]
[<f894bfef>] ? saa7134_irq_video_signalchange+0xb9/0x11b [saa7134]  <=== irq handler decided to take action
[<f89468ab>] ? saa7134_irq+0xd2/0x25f [saa7134]
[<f89467d9>] ? saa7134_irq+0x0/0x25f [saa7134]         <==== irq handler was called
[<c04638bc>] ? request_irq+0xbb/0x10c                  <==== request_irq() when device is being initialized
[<f894d89e>] ? saa7134_initdev+0x543/0x905 [saa7134]
[...]

(note addresses are offset: mute_input_7133 starts at 0x1d) 
  1d:	55                   	push   %ebp
  1e:	89 e5                	mov    %esp,%ebp   
  20:	57                   	push   %edi  
  21:	56                   	push   %esi
  22:	53                   	push   %ebx
  23:	89 c3                	mov    %eax,%ebx
  25:	8b 80 cc 06 00 00    	mov    0x6cc(%eax),%eax  %eax holds "dev", move "dev->input" into %eax
  2b:*	8b 40 08             	mov    0x8(%eax),%eax    %eax holds "dev->input", move "dev->input->amux" into %eax
							 ^^^^ Ooops happens here because EAX is 00000000
  2e:	83 f8 01             	cmp    $0x1,%eax	"case TV:"
  31:	74 0c                	je     0x3f		jump to "TV" case code, if equal
  33:	72 13                	jb     0x48		break out of switch if less than "TV"
  35:	83 e8 03             	sub    $0x3,%eax
  38:	83 f8 01             	cmp    $0x1,%eax
  3b:	77 0b                	ja     0x48		break out of switch if greater than "LINE2_LEFT"
  3d:	eb 0d                	jmp    0x4c
  3f:	be                   	.byte 0xbe


So "dev->input" is NULL when saa7134-tvaudio.c:mute_input_7133() is
called and that is what causes the Oops.  It was called by the
saa7134_irq() handler trying to take action, shortly after request_irq()
was called.  Can you think of why this would happen?


I don't know.  If no one here can test it and confirm a fix, maybe we
just forget about it until someone with hardware complains?


Regards,
Andy

> Thanks,
> Hermann


--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]
  Powered by Linux