Re: CX18 Oops

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2008-08-11 at 17:33 -0400, Brandon Jenkins wrote:
> On Fri, Aug 8, 2008 at 10:18 AM, Andy Walls <awalls@xxxxxxxxx> wrote:
> > Brandon,
> >
> > I have checked in a fix to defend against the Ooops we both encountered.
> > The fix will also generate a WARN dump and some queue stats when it runs
> > across the cause, but will otherwise try to clean up as best it can to
> > allow further operation.
> >
> > The band-aid fix is the latest change at
> >
> > http://linuxtv.org/hg/~awalls/v4l-dvb
> >
> > Please provide the extra debug that happens if you encounter the warning
> > in your logs.  I have only encountered the problem twice over a several
> > month period, so its hard to get insight into the root cause buffer
> > accounting error at that rate.
> 
> Andy,
> 
> I had an oops today, first one in a few days
> 
> Brandon

Brandon & Jeff,

I have updated my repo at 

http://linuxtv.org/hg/~awalls/v4l-dvb

with 3 changes:

1. Back out the original band aid fix
2. Simplify the queue flush routines (you will not see that oops again)
3. Fix the interrupt handler to obtain a queue lock (prevents queue
corruption)

>From most of the output you provided, it was pretty obvious that q_full
was always claiming to have more buffers that it actually did.  I
hypothesized this could come about at the end of a capture when the
encoder hadn't really stopped transferring buffers yet (after we told it
to stop) and then we try to clear q_full while the interrupt handler is
still trying to add buffers.  This could happen because the interrupt
handler never (ever) properly obtained a lock for manipulating the
queues.  This could have been causing the queue corruption.

Please test.  I need feedback that I haven't introduced a deadlock.

It also appears that the last change requiring the interrupt handler to
obtain a lock, completely mitigates me having to use the "-cache 8192"
option to mplayer for digital captures, and greatly reduces the amount
of cache I need to have mplayer use for analog captures. 


Hans,
(or anyone else with expertise in using spinlocks withing an interrupt
handler),

Could you please provide comments on if I'm doing something wrong with
the way I obtain the spinlock in the interrupt handler?

http://linuxtv.org/hg/~awalls/v4l-dvb/rev/f3ada35200c0

>From reading Bovet and Cesati's _Understanding_the_Linux_Kernel_ and
Corbet, Rubini, and Kroah-Hartman's _Linux_Device_Drivers_ I think I've
got it right.

When the stream queues (q_full, q_io, and q_free) are accessed from the
system call exception handler, I need to do a spin_lock_irqsave() to
disable local CPU interrupts and protect access to the queues by kernel
control paths on other CPU's.  When they stream queues are accessed by
the interrupt handler on any CPU, the interrupt handler is serialized
with respect to itself and need not disable any interrupts and simply
obtain the lock via spin_lock() to protect against access from system
call exceptions.


Regards,
Andy


> [35446.681402] WARNING: at /root/hdpvr/v4l/cx18-queue.c:204
> cx18_queue_move+0x207/0x220 [cx18]()
> [35446.681402] Modules linked in: nls_cp437 cifs binfmt_misc
> iptable_filter ip_tables x_tables xfs loop mxl5005s s5h1409
> tuner_simple tuner_types ipv6 cs5345 tuner cx18 dvb_core
> compat_ioctl32 usbhid videodev v4l1_compat hid i2c_algo_bit cx2341x
> v4l2_common tveeprom psmouse i2c_core button ext3 jbd mbcache sd_mod
> ahci iTCO_wdt libata r8169 scsi_mod dock ehci_hcd uhci_hcd usbcore
> raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0
> multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal
> processor fan fuse [last unloaded: hdpvr]
> [35446.681402] Pid: 14030, comm: java Not tainted 2.6.26-server-sagetv #1
> [35446.681402]
> [35446.681402] Call Trace:
> [35446.681402]  [<ffffffff80238c24>] warn_on_slowpath+0x64/0xa0
> [35446.681402]  [<ffffffffa01efd47>] :cx18:cx18_vapi+0xa7/0x110
> [35446.681402]  [<ffffffff8044901f>] _spin_lock_irqsave+0x1f/0x50
> [35446.681402]  [<ffffffff804493f2>] _spin_unlock_irqrestore+0x12/0x40
> [35446.681402]  [<ffffffff8044901f>] _spin_lock_irqsave+0x1f/0x50
> [35446.681402]  [<ffffffffa01eb617>] :cx18:cx18_queue_move+0x207/0x220
> [35446.681402]  [<ffffffffa01eca58>] :cx18:cx18_release_stream+0x78/0xc0
> [35446.681402]  [<ffffffffa01ecef6>] :cx18:cx18_v4l2_close+0xb6/0x150
> [35446.681402]  [<ffffffff802a88d1>] __fput+0xb1/0x1d0
> [35446.681402]  [<ffffffff802a5404>] filp_close+0x54/0x90
> [35446.681402]  [<ffffffff802a6c1f>] sys_close+0x9f/0x110
> [35446.681402]  [<ffffffff80226c02>] sysenter_do_call+0x1b/0x66
> [35446.681402]
> [35446.681402] ---[ end trace 5fa5ceee62929416 ]---
> [35446.681402] cx18-2: queue_move: driver bug! errant steal attempt
> for to/from_free queue move, dumping queue stats
> [35446.681402] cx18-2: queue_move: thought bytes_available = 196608
> with needed = 196608 and initial destination size = 3866624
> [35446.681402] cx18-2: queue_log: stream 'encoder MPEG'  buffers = 63
> buf_size = 65536  buffers_stolen = 0
> [35446.681402] cx18-2: queue_log: &q_free = ffff81021d7901f8  &q_full
> = ffff81021d790218  &q_io = ffff81021d790238
> [35446.681402] cx18-2: queue_log: q = ffff81021d790218   buffers = 1
> length = 65536  bytesused = 43008
> [35446.681402] cx18-2: queue_log: stream 'encoder MPEG'  buffers = 63
> buf_size = 65536  buffers_stolen = 0
> [35446.681402] cx18-2: queue_log: &q_free = ffff81021d7901f8  &q_full
> = ffff81021d790218  &q_io = ffff81021d790238
> [35446.681402] cx18-2: queue_log: q = ffff81021d7901f8   buffers = 62
> length = 3997696  bytesused = 0
> [35446.681402] cx18-2: queue_log: i = 0  buf->id = 39  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 1  buf->id = 40  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 2  buf->id = 41  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 3  buf->id = 42  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 4  buf->id = 43  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 5  buf->id = 44  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 6  buf->id = 45  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 7  buf->id = 46  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 8  buf->id = 47  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 9  buf->id = 48  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 10  buf->id = 49  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 11  buf->id = 50  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 12  buf->id = 51  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 13  buf->id = 52  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 14  buf->id = 53  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 15  buf->id = 54  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 16  buf->id = 55  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 17  buf->id = 56  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 18  buf->id = 57  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 19  buf->id = 58  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 20  buf->id = 59  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 21  buf->id = 60  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 22  buf->id = 61  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 23  buf->id = 62  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 24  buf->id = 0  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 25  buf->id = 1  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 26  buf->id = 2  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 27  buf->id = 3  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 28  buf->id = 4  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 29  buf->id = 5  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 30  buf->id = 6  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 31  buf->id = 7  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 32  buf->id = 8  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 33  buf->id = 9  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 34  buf->id = 10  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 35  buf->id = 11  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 36  buf->id = 12  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 37  buf->id = 13  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 38  buf->id = 14  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 39  buf->id = 15  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 40  buf->id = 16  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 41  buf->id = 17  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 42  buf->id = 18  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 43  buf->id = 19  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 44  buf->id = 20  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 45  buf->id = 21  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 46  buf->id = 22  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 47  buf->id = 23  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 48  buf->id = 24  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 49  buf->id = 25  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 50  buf->id = 26  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 51  buf->id = 27  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 52  buf->id = 28  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 53  buf->id = 29  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 54  buf->id = 30  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 55  buf->id = 32  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 56  buf->id = 33  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 57  buf->id = 34  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 58  buf->id = 35  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 59  buf->id = 36  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 60  buf->id = 37  buf->bytesused
> = 0  buf->readpos = 0
> [35446.681402] cx18-2: queue_log: i = 61  buf->id = 38  buf->bytesused
> = 0  buf->readpos = 0
> 
> --
> video4linux-list mailing list
> Unsubscribe mailto:video4linux-list-request@xxxxxxxxxx?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/video4linux-list
> 


_______________________________________________
linux-dvb mailing list
linux-dvb@xxxxxxxxxxx
http://www.linuxtv.org/cgi-bin/mailman/listinfo/linux-dvb

[Index of Archives]     [Linux Media]     [Video 4 Linux]     [Asterisk]     [Samba]     [Xorg]     [Xfree86]     [Linux USB]

  Powered by Linux