Re: Bug in V4L2 driver by AMD?

"Peter Teoh" <htmldeveloper@xxxxxxxxx> · Mon, 19 May 2008 06:08:31 +0800

You are right: since v4l_qbfr() is holding io->lock, nothing inside that
critical section should try to take the same lock again, or it will
deadlock. Unless the lock is a different variable.... but from the code
above, both functions reach it through dp (dp->fp.io->lock), so it should
be the same variable.
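A minimal userspace sketch of the usual way out, assuming the driver can be restructured: split v4l_capt_unstall() into a helper that expects the caller to already hold the lock, plus a wrapper that takes the lock itself. A pthread mutex stands in for spin_lock_irqsave(&io->lock, flags); struct io_queue, struct vid_device, capt_unstall_locked(), capt_unstall(), qbfr() and the held/resumed fields are hypothetical simplifications, not the lxv4l2 code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for the driver's io_queue and VidDevice. */
struct io_queue {
    pthread_mutex_t lock;  /* stands in for io->lock */
    int held;              /* set while lock is held, checked by the helper */
    unsigned int sequence;
};

struct vid_device {
    struct io_queue *io;
    int capt_stalled;
    int resumed;           /* counts calls that would reach lx_capt_resume() */
};

int io_queue_init(struct io_queue *io)
{
    io->held = 0;
    io->sequence = 0;
    return pthread_mutex_init(&io->lock, NULL);
}

/* Caller must already hold io->lock; this helper never takes it. */
static int capt_unstall_locked(struct vid_device *dp)
{
    assert(dp->io->held);
    if (dp->capt_stalled) {
        dp->capt_stalled = 0;
        dp->resumed++;     /* stands in for lx_capt_resume(dp, io) */
    }
    return 0;
}

/* Entry point for callers that do NOT hold the lock. */
int capt_unstall(struct vid_device *dp)
{
    int ret, err = pthread_mutex_lock(&dp->io->lock);
    if (err)
        return -err;
    dp->io->held = 1;
    ret = capt_unstall_locked(dp);
    dp->io->held = 0;
    pthread_mutex_unlock(&dp->io->lock);
    return ret;
}

/* Queue a buffer; the lock is taken exactly once on this path. */
int qbfr(struct vid_device *dp, int capt)
{
    int ret = 0, err = pthread_mutex_lock(&dp->io->lock);
    if (err)
        return -err;
    dp->io->held = 1;
    dp->io->sequence++;
    if (capt && dp->capt_stalled)
        ret = capt_unstall_locked(dp);  /* capt_unstall() here would relock */
    dp->io->held = 0;
    pthread_mutex_unlock(&dp->io->lock);
    return ret;
}
```

A `_locked` suffix (or a leading `__`) is a common kernel naming convention for "caller holds the lock" helpers. With that split, v4l_qbfr() would call the locked variant, and any path that does not already hold io->lock would call the wrapper.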

But I was wondering how preempt_schedule_irq() is able to detect the deadlock at all...

Call Trace:
[<c034462d>] preempt_schedule_irq+0x3b/0x53
[<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
[<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
[<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]

The reason I ask is this: the same lock can legitimately be contended
by two different parts of the kernel at the same time; one path gets
the lock and the other waits until it is released. But if the lock is
already held on this CPU, a second attempt to take it from the same
CPU will deadlock when IRQs are disabled. And execution should stay on
the same CPU, right? (since no rescheduling is done via schedule() or
the like) Correct? That is our case, so how can the kernel detect this,
as the debug output shows? Please share....
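A note on where the detection seems to happen: the oops is raised in rt_spin_lock_slowlock() (the EIP line), not in preempt_schedule_irq(), which is likely just a stale return address left on the stack. This is an -rt kernel (2.6.22.9-aldebaran-rt), where, as far as I know, spinlock_t is backed by an rt_mutex that records its owning task. Reading the Code: bytes around the faulting `<0f> 0b` (ud2, the x86 encoding of BUG()), the code loads a field at offset 0x10 of the lock (presumably the owner), masks off the low two flag bits, compares it with current and traps if they are equal; eax and edx both hold ceb92bf0, which is the task= value printed for naoqi. So the lock apparently noticed that its own owner was trying to take it again. The sketch below models that owner check in userspace with pthreads; struct owner_lock and its functions are hypothetical, and returning -EDEADLK stands in for the kernel's BUG().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock that remembers its owner, like an rt_mutex does. */
struct owner_lock {
    pthread_mutex_t m;     /* the lock itself */
    pthread_mutex_t meta;  /* protects owner/owned */
    pthread_t owner;       /* valid only while owned != 0 */
    int owned;
};

int owner_lock_init(struct owner_lock *l)
{
    int err = pthread_mutex_init(&l->m, NULL);
    if (err)
        return err;
    err = pthread_mutex_init(&l->meta, NULL);
    if (err) {
        pthread_mutex_destroy(&l->m);
        return err;
    }
    l->owned = 0;
    return 0;
}

/* Returns -EDEADLK if the caller already owns the lock (the -rt kernel BUG()s instead). */
int owner_lock_acquire(struct owner_lock *l)
{
    int self_deadlock;

    pthread_mutex_lock(&l->meta);
    self_deadlock = l->owned && pthread_equal(l->owner, pthread_self());
    pthread_mutex_unlock(&l->meta);
    if (self_deadlock)
        return -EDEADLK;            /* blocking on m now would never return */

    pthread_mutex_lock(&l->m);      /* held by someone else: just wait */
    pthread_mutex_lock(&l->meta);
    l->owner = pthread_self();
    l->owned = 1;
    pthread_mutex_unlock(&l->meta);
    return 0;
}

void owner_lock_release(struct owner_lock *l)
{
    pthread_mutex_lock(&l->meta);
    assert(l->owned && pthread_equal(l->owner, pthread_self()));
    l->owned = 0;
    pthread_mutex_unlock(&l->meta);
    pthread_mutex_unlock(&l->m);
}
```

A plain (non-rt) spinlock has no owner field, so the same bug there would most likely just hang with IRQs off rather than print anything, unless CONFIG_DEBUG_SPINLOCK is enabled, which, as far as I remember, performs a similar owner == current "recursion" check.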

On Sun, May 18, 2008 at 11:29 PM, Oliver Urbann <ciVic@xxxxxx> wrote:
> Hi,
>
> I am using an AMD Geode with a camera (with a V4L driver). In some cases the
> kernel prints the following when I read from /dev/video0:
>
> ------------[ cut here ]------------
> kernel BUG at kernel/rtmutex.c:682!
> invalid opcode: 0000 [#1]
> PREEMPT
> Modules linked in: lxv4l2 i2c_serial zd1211rw
> CPU:    0
> EIP:    0060:[<c0344c23>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.22.9-aldebaran-rt #1)
> EIP is at rt_spin_lock_slowlock+0x5e/0x176
> eax: ceb92bf0   ebx: 00000282   ecx: c13d8414   edx: ceb92bf0
> esi: c13d8404   edi: c13d8404   ebp: ce06001c   esp: ca467e80
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068  preempt:00000002
> Process naoqi (pid: 2992, ti=ca466000 task=ceb92bf0 task.ti=ca466000)
> Stack: 00000086 307f4268 00000026 00000046 c0403998 ceb92bf0 307f4268 00000026
>      ffffffff ca466000 ceb92bf0 ca467ec0 c034462d 00000000 d03d547c ce06001c
>      c13d8400 c13d8404 ce06001c d08d9f39 d0342000 d03d547c 086261bc d0342000
> Call Trace:
> [<c034462d>] preempt_schedule_irq+0x3b/0x53
> [<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
> [<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
> [<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]
> [<d08dc7bd>] vid_read+0x130/0x13c [lxv4l2]
> [<d08dc68d>] vid_read+0x0/0x13c [lxv4l2]
> [<c01588f2>] vfs_read+0x88/0x110
> [<c0158c33>] sys_read+0x41/0x67
> [<c0103d1a>] syscall_call+0x7/0xb
> =======================
> Code: f0 e8 8a 08 df ff 85 c0 74 11 53 9d 89 e0 25 00 e0 ff ff ff 48 14 e9 14 01 00 00 8b 15 00 f0 3f c0 8b 46 10 83 e0 fc 39 d0 75 04 <0f> 0b eb fe b8 04 00 00 00 87 02 89 44 24 04 89 e5 81 e5 00 e0
> EIP: [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176 SS:ESP 0068:ca467e80
> note: naoqi[2992] exited with preempt_count 1
> BUG: scheduling while atomic: naoqi/0x00000002/2992, CPU#0
> [<c0343ec0>] __schedule+0x85/0x383
> [<c012ef0b>] hrtimer_interrupt+0x16e/0x196
> [<c034442a>] schedule+0xe5/0xfb
> [<c0344f37>] rt_mutex_slowlock+0x193/0x237
> [<c0344bc2>] rt_mutex_lock+0x2b/0x2e
> [<c01342dd>] futex_wake+0x22/0xc2
> [<c0130000>] timekeeping_resume+0x30/0xc2
> [<c01343fe>] do_futex+0x81/0x9d3
> [<c02563ab>] serial8250_console_putchar+0x33/0x76
> [<c0251c9e>] uart_console_write+0x29/0x33
> [<c0256378>] serial8250_console_putchar+0x0/0x76
> [<c011da98>] release_console_sem+0x180/0x1b0
> [<c0134e35>] sys_futex+0xe5/0xf7
> [<c010609a>] do_IRQ+0x67/0x7d
> [<c011b96e>] mm_release+0x81/0x87
> [<c011ea46>] exit_mm+0x12/0xd3
> [<c011fec1>] do_exit+0x1ba/0x722
> [<c01050a5>] die+0x1d3/0x1db
> [<c010542a>] do_invalid_op+0x0/0x8a
> [<c01054ab>] do_invalid_op+0x81/0x8a
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c0108ee7>] pit_next_event+0x2f/0x34
> [<c0131f96>] clockevents_program_event+0xb5/0xbc
> [<c0132cf6>] tick_program_event+0x2a/0x49
> [<c0119251>] update_curr+0x235/0x256
> [<c0345612>] error_code+0x6a/0x70
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c034462d>] preempt_schedule_irq+0x3b/0x53
> [<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
> [<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
> [<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]
> [<d08dc7bd>] vid_read+0x130/0x13c [lxv4l2]
> [<d08dc68d>] vid_read+0x0/0x13c [lxv4l2]
> [<c01588f2>] vfs_read+0x88/0x110
> [<c0158c33>] sys_read+0x41/0x67
> [<c0103d1a>] syscall_call+0x7/0xb
> =======================
> BUG: scheduling while atomic: naoqi/0x00000002/2992, CPU#0
> [<c0343ec0>] __schedule+0x85/0x383
> [<c034442a>] schedule+0xe5/0xfb
> [<c0344f37>] rt_mutex_slowlock+0x193/0x237
> [<c0344bc2>] rt_mutex_lock+0x2b/0x2e
> [<c011ea58>] exit_mm+0x24/0xd3
> [<c011fec1>] do_exit+0x1ba/0x722
> [<c01050a5>] die+0x1d3/0x1db
> [<c010542a>] do_invalid_op+0x0/0x8a
> [<c01054ab>] do_invalid_op+0x81/0x8a
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c0108ee7>] pit_next_event+0x2f/0x34
> [<c0131f96>] clockevents_program_event+0xb5/0xbc
> [<c0132cf6>] tick_program_event+0x2a/0x49
> [<c0119251>] update_curr+0x235/0x256
> [<c0345612>] error_code+0x6a/0x70
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c034462d>] preempt_schedule_irq+0x3b/0x53
> [<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
> [<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
> [<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]
> [<d08dc7bd>] vid_read+0x130/0x13c [lxv4l2]
> [<d08dc68d>] vid_read+0x0/0x13c [lxv4l2]
> [<c01588f2>] vfs_read+0x88/0x110
> [<c0158c33>] sys_read+0x41/0x67
> [<c0103d1a>] syscall_call+0x7/0xb
> =======================
>
>
> On the call stack I found the function v4l_capt_unstall() just before
> preempt_schedule_irq(), so I took a look at AMD's lxv4l2 driver. I
> found the following function, which calls v4l_capt_unstall(), in v4l.c:
>
> int v4l_qbfr(VidDevice *dp,struct list_head *lp,io_buf *bp,int capt)
> {
>  unsigned long flags;
>  FilePriv *fp = &dp->fp;
>  io_queue *io = fp->io;
>  spin_lock_irqsave(&io->lock, flags);
>  list_move_tail(&bp->bfrq,lp);
>  bp->sequence = io->sequence++;
>  bp->flags &= ~V4L2_BUF_FLAG_DONE;
>  bp->flags |= V4L2_BUF_FLAG_QUEUED;
>  if( capt != 0 && dp->capt_stalled != 0 )
>     v4l_capt_unstall(dp);
>  spin_unlock_irqrestore(&io->lock, flags);
> }
>
>
> And here is v4l_capt_unstall():
>
> int v4l_capt_unstall(VidDevice *dp)
> {
>  unsigned long flags;
>  int ret = 0;
>  FilePriv *fp = &dp->fp;
>  io_queue *io = fp->io;
>  spin_lock_irqsave(&io->lock, flags);
>  if( dp->capt_state == capt_state_run ) {
>     if( io != NULL ) {
>        if( dp->capt_stalled != 0 )
>           DMSG(4,"capt resumed\n");
>        ret = lx_capt_resume(dp,io);
>     }
>  }
>  spin_unlock_irqrestore(&io->lock, flags);
>  return ret;
> }
>
> I was wondering about the spin lock. The function v4l_qbfr() already
> locks io->lock, calls v4l_capt_unstall(), and there it tries to lock
> io->lock again. I think this should end in a deadlock?
> If it does not end in a deadlock, I think the crash is caused by the unlock
> in v4l_capt_unstall(), because the unlock calls schedule() while there is
> still a lock held (the lock from v4l_qbfr()).
>
> Can someone explain to me what happened?
>
> Oliver Urbann
>
>
> --
> To unsubscribe from this list: send an email with
> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
> Please read the FAQ at http://kernelnewbies.org/FAQ
>
>

-- 
Regards,
Peter Teoh