Re: CX18 Oops

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sunday 17 August 2008 04:13:24 Andy Walls wrote:
> On Mon, 2008-08-11 at 17:33 -0400, Brandon Jenkins wrote:
> > On Fri, Aug 8, 2008 at 10:18 AM, Andy Walls <awalls@xxxxxxxxx> 
wrote:
> > > Brandon,
> > >
> > > I have checked in a fix to defend against the Ooops we both
> > > encountered. The fix will also generate a WARN dump and some
> > > queue stats when it runs across the cause, but will otherwise try
> > > to clean up as best it can to allow further operation.
> > >
> > > The band-aid fix is the latest change at
> > >
> > > http://linuxtv.org/hg/~awalls/v4l-dvb
> > >
> > > Please provide the extra debug that happens if you encounter the
> > > warning in your logs.  I have only encountered the problem twice
> > > over a several month period, so its hard to get insight into the
> > > root cause buffer accounting error at that rate.
> >
> > Andy,
> >
> > I had an oops today, first one in a few days
> >
> > Brandon
>
> Brandon & Jeff,
>
> I have updated my repo at
>
> http://linuxtv.org/hg/~awalls/v4l-dvb
>
> with 3 changes:
>
> 1. Back out the original band aid fix
> 2. Simplify the queue flush routines (you will not see that oops
> again)

Nice! Cleans it up considerably.

> 3. Fix the interrupt handler to obtain a queue lock (prevents 
> queue corruption)

No, that is not the bug. I'm pretty sure that the real bug is that the 
old cx18_queue_move() function didn't use a spin_lock_irqsave(). I 
think that it was possible for an interrupt to arrive when the CPU was 
in the middle of a cx18_queue_move(). A spinlock in an interrupt 
handler is usually bogus (and that includes the one in the ivtv irq 
handler, I've just realized).

What I am uneasy about, though, is why an interrupt could arrive while 
in the cx18_queue_move() function. In principle this function should 
only be called when the capture has stopped. I think it might be a good 
idea to debug this: is it possible for interrupts to arrive after the 
capture was stopped? Or is it possible for cx18_queue_move() to be 
called when a capture is still in progress?

I do think it is a good idea to rename cx18_queue_find_buf to 
cx18_queue_get_buf_irq to denote that it 1) not just finds a buffer but 
also removes it from the queue, and 2) it can only be called safely 
from interrupt context.

> >From most of the output you provided, it was pretty obvious that
> > q_full
>
> was always claiming to have more buffers that it actually did.  I
> hypothesized this could come about at the end of a capture when the
> encoder hadn't really stopped transferring buffers yet (after we told
> it to stop) and then we try to clear q_full while the interrupt
> handler is still trying to add buffers.  This could happen because
> the interrupt handler never (ever) properly obtained a lock for
> manipulating the queues.  This could have been causing the queue
> corruption.
>
> Please test.  I need feedback that I haven't introduced a deadlock.
>
> It also appears that the last change requiring the interrupt handler
> to obtain a lock, completely mitigates me having to use the "-cache
> 8192" option to mplayer for digital captures, and greatly reduces the
> amount of cache I need to have mplayer use for analog captures.

I suspect that it is the change before that one: adding a spinlock to 
cx18_queue_move(). The spinlock in the interrupt handler doesn't do 
anything. It would only be useful if you could have two independent 
interrupt handlers that both needed access to that resource. But that 
is not the case here.

>
>
> Hans,
> (or anyone else with expertise in using spinlocks withing an
> interrupt handler),
>
> Could you please provide comments on if I'm doing something wrong
> with the way I obtain the spinlock in the interrupt handler?

See above :-)

>
> http://linuxtv.org/hg/~awalls/v4l-dvb/rev/f3ada35200c0
>
> >From reading Bovet and Cesati's _Understanding_the_Linux_Kernel_ and
>
> Corbet, Rubini, and Kroah-Hartman's _Linux_Device_Drivers_ I think
> I've got it right.
>
> When the stream queues (q_full, q_io, and q_free) are accessed from
> the system call exception handler, I need to do a spin_lock_irqsave()
> to disable local CPU interrupts and protect access to the queues by
> kernel control paths on other CPU's.  When they stream queues are
> accessed by the interrupt handler on any CPU, the interrupt handler
> is serialized with respect to itself and need not disable any
> interrupts and simply obtain the lock via spin_lock() to protect
> against access from system call exceptions.

System call exceptions? Not sure what you mean. AFAIK the interrupt 
handler doesn't have to protect against anything.

Regards,

	Hans

>
>
> Regards,
> Andy
>
> > [35446.681402] WARNING: at /root/hdpvr/v4l/cx18-queue.c:204
> > cx18_queue_move+0x207/0x220 [cx18]()
> > [35446.681402] Modules linked in: nls_cp437 cifs binfmt_misc
> > iptable_filter ip_tables x_tables xfs loop mxl5005s s5h1409
> > tuner_simple tuner_types ipv6 cs5345 tuner cx18 dvb_core
> > compat_ioctl32 usbhid videodev v4l1_compat hid i2c_algo_bit cx2341x
> > v4l2_common tveeprom psmouse i2c_core button ext3 jbd mbcache
> > sd_mod ahci iTCO_wdt libata r8169 scsi_mod dock ehci_hcd uhci_hcd
> > usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1
> > raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod
> > thermal processor fan fuse [last unloaded: hdpvr]
> > [35446.681402] Pid: 14030, comm: java Not tainted
> > 2.6.26-server-sagetv #1 [35446.681402]
> > [35446.681402] Call Trace:
> > [35446.681402]  [<ffffffff80238c24>] warn_on_slowpath+0x64/0xa0
> > [35446.681402]  [<ffffffffa01efd47>] :cx18:cx18_vapi+0xa7/0x110
> > [35446.681402]  [<ffffffff8044901f>] _spin_lock_irqsave+0x1f/0x50
> > [35446.681402]  [<ffffffff804493f2>]
> > _spin_unlock_irqrestore+0x12/0x40 [35446.681402] 
> > [<ffffffff8044901f>] _spin_lock_irqsave+0x1f/0x50 [35446.681402] 
> > [<ffffffffa01eb617>] :cx18:cx18_queue_move+0x207/0x220
> > [35446.681402]  [<ffffffffa01eca58>]
> > :cx18:cx18_release_stream+0x78/0xc0 [35446.681402] 
> > [<ffffffffa01ecef6>] :cx18:cx18_v4l2_close+0xb6/0x150
> > [35446.681402]  [<ffffffff802a88d1>] __fput+0xb1/0x1d0
> > [35446.681402]  [<ffffffff802a5404>] filp_close+0x54/0x90
> > [35446.681402]  [<ffffffff802a6c1f>] sys_close+0x9f/0x110
> > [35446.681402]  [<ffffffff80226c02>] sysenter_do_call+0x1b/0x66
> > [35446.681402]
> > [35446.681402] ---[ end trace 5fa5ceee62929416 ]---
> > [35446.681402] cx18-2: queue_move: driver bug! errant steal attempt
> > for to/from_free queue move, dumping queue stats
> > [35446.681402] cx18-2: queue_move: thought bytes_available = 196608
> > with needed = 196608 and initial destination size = 3866624
> > [35446.681402] cx18-2: queue_log: stream 'encoder MPEG'  buffers =
> > 63 buf_size = 65536  buffers_stolen = 0
> > [35446.681402] cx18-2: queue_log: &q_free = ffff81021d7901f8 
> > &q_full = ffff81021d790218  &q_io = ffff81021d790238
> > [35446.681402] cx18-2: queue_log: q = ffff81021d790218   buffers =
> > 1 length = 65536  bytesused = 43008
> > [35446.681402] cx18-2: queue_log: stream 'encoder MPEG'  buffers =
> > 63 buf_size = 65536  buffers_stolen = 0
> > [35446.681402] cx18-2: queue_log: &q_free = ffff81021d7901f8 
> > &q_full = ffff81021d790218  &q_io = ffff81021d790238
> > [35446.681402] cx18-2: queue_log: q = ffff81021d7901f8   buffers =
> > 62 length = 3997696  bytesused = 0
> > [35446.681402] cx18-2: queue_log: i = 0  buf->id = 39 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 1  buf->id = 40 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 2  buf->id = 41 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 3  buf->id = 42 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 4  buf->id = 43 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 5  buf->id = 44 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 6  buf->id = 45 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 7  buf->id = 46 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 8  buf->id = 47 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 9  buf->id = 48 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 10  buf->id = 49 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 11  buf->id = 50 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 12  buf->id = 51 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 13  buf->id = 52 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 14  buf->id = 53 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 15  buf->id = 54 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 16  buf->id = 55 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 17  buf->id = 56 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 18  buf->id = 57 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 19  buf->id = 58 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 20  buf->id = 59 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 21  buf->id = 60 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 22  buf->id = 61 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 23  buf->id = 62 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 24  buf->id = 0 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 25  buf->id = 1 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 26  buf->id = 2 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 27  buf->id = 3 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 28  buf->id = 4 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 29  buf->id = 5 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 30  buf->id = 6 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 31  buf->id = 7 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 32  buf->id = 8 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 33  buf->id = 9 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 34  buf->id = 10 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 35  buf->id = 11 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 36  buf->id = 12 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 37  buf->id = 13 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 38  buf->id = 14 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 39  buf->id = 15 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 40  buf->id = 16 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 41  buf->id = 17 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 42  buf->id = 18 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 43  buf->id = 19 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 44  buf->id = 20 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 45  buf->id = 21 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 46  buf->id = 22 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 47  buf->id = 23 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 48  buf->id = 24 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 49  buf->id = 25 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 50  buf->id = 26 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 51  buf->id = 27 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 52  buf->id = 28 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 53  buf->id = 29 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 54  buf->id = 30 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 55  buf->id = 32 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 56  buf->id = 33 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 57  buf->id = 34 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 58  buf->id = 35 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 59  buf->id = 36 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 60  buf->id = 37 
> > buf->bytesused = 0  buf->readpos = 0
> > [35446.681402] cx18-2: queue_log: i = 61  buf->id = 38 
> > buf->bytesused = 0  buf->readpos = 0
> >
> > --
> > video4linux-list mailing list
> > Unsubscribe
> > mailto:video4linux-list-request@xxxxxxxxxx?subject=unsubscribe
> > https://www.redhat.com/mailman/listinfo/video4linux-list



_______________________________________________
linux-dvb mailing list
linux-dvb@xxxxxxxxxxx
http://www.linuxtv.org/cgi-bin/mailman/listinfo/linux-dvb

[Index of Archives]     [Linux Media]     [Video 4 Linux]     [Asterisk]     [Samba]     [Xorg]     [Xfree86]     [Linux USB]

  Powered by Linux