You are right: since v4l_qbfr() is holding io->lock, nothing inside that
critical section should take the same lock again, or it will deadlock. That
would not apply if the lock were a different variable, but from the code
below both functions reach it through the same dp, so it should be the same
lock.

But I was wondering how the kernel CAN DETECT the deadlock here, with
preempt_schedule_irq() at the top of the trace:

Call Trace:
[<c034462d>] preempt_schedule_irq+0x3b/0x53
[<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
[<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
[<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]

The reason I ask is that it is perfectly legitimate for the same lock to be
contended from two different parts of the kernel at the same time:
essentially one takes the lock and continues execution, and the other waits.
But if the lock is already held by the same CPU, an attempt to take it again
on that CPU will deadlock if IRQs are disabled. And execution should be on
the same CPU, right (since no rescheduling is done via schedule() or the
like)? In our case it is, so how can the kernel detect this, as the debug
output shows it does?

Please share.

On Sun, May 18, 2008 at 11:29 PM, Oliver Urbann <ciVic@xxxxxx> wrote:
> Hi,
>
> I am using an AMD Geode with a cam (with v4l driver). In some cases the
> kernel shows the following when I am reading from /dev/video0:
>
> ------------[ cut here ]------------
> kernel BUG at kernel/rtmutex.c:682!
> invalid opcode: 0000 [#1]
> PREEMPT
> Modules linked in: lxv4l2 i2c_serial zd1211rw
> CPU: 0
> EIP: 0060:[<c0344c23>] Not tainted VLI
> EFLAGS: 00010046 (2.6.22.9-aldebaran-rt #1)
> EIP is at rt_spin_lock_slowlock+0x5e/0x176
> eax: ceb92bf0 ebx: 00000282 ecx: c13d8414 edx: ceb92bf0
> esi: c13d8404 edi: c13d8404 ebp: ce06001c esp: ca467e80
> ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068 preempt:00000002
> Process naoqi (pid: 2992, ti=ca466000 task=ceb92bf0 task.ti=ca466000)
> Stack: 00000086 307f4268 00000026 00000046 c0403998 ceb92bf0 307f4268 00000026
> ffffffff ca466000 ceb92bf0 ca467ec0 c034462d 00000000 d03d547c ce06001c
> c13d8400 c13d8404 ce06001c d08d9f39 d0342000 d03d547c 086261bc d0342000
> Call Trace:
> [<c034462d>] preempt_schedule_irq+0x3b/0x53
> [<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
> [<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
> [<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]
> [<d08dc7bd>] vid_read+0x130/0x13c [lxv4l2]
> [<d08dc68d>] vid_read+0x0/0x13c [lxv4l2]
> [<c01588f2>] vfs_read+0x88/0x110
> [<c0158c33>] sys_read+0x41/0x67
> [<c0103d1a>] syscall_call+0x7/0xb
> =======================
> Code: f0 e8 8a 08 df ff 85 c0 74 11 53 9d 89 e0 25 00 e0 ff ff ff 48 14 e9
> 14 01 00 00 8b 15 00 f0 3f c0 8b 46 10 83 e0 fc 39 d0 75 04 <0f> 0b eb fe b8
> 04 00 00 00 87 02 89 44 24 04 89 e5 81 e5 00 e0
> EIP: [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176 SS:ESP 0068:ca467e80
> note: naoqi[2992] exited with preempt_count 1
> BUG: scheduling while atomic: naoqi/0x00000002/2992, CPU#0
> [<c0343ec0>] __schedule+0x85/0x383
> [<c012ef0b>] hrtimer_interrupt+0x16e/0x196
> [<c034442a>] schedule+0xe5/0xfb
> [<c0344f37>] rt_mutex_slowlock+0x193/0x237
> [<c0344bc2>] rt_mutex_lock+0x2b/0x2e
> [<c01342dd>] futex_wake+0x22/0xc2
> [<c0130000>] timekeeping_resume+0x30/0xc2
> [<c01343fe>] do_futex+0x81/0x9d3
> [<c02563ab>] serial8250_console_putchar+0x33/0x76
> [<c0251c9e>] uart_console_write+0x29/0x33
> [<c0256378>] serial8250_console_putchar+0x0/0x76
> [<c011da98>] release_console_sem+0x180/0x1b0
> [<c0134e35>] sys_futex+0xe5/0xf7
> [<c010609a>] do_IRQ+0x67/0x7d
> [<c011b96e>] mm_release+0x81/0x87
> [<c011ea46>] exit_mm+0x12/0xd3
> [<c011fec1>] do_exit+0x1ba/0x722
> [<c01050a5>] die+0x1d3/0x1db
> [<c010542a>] do_invalid_op+0x0/0x8a
> [<c01054ab>] do_invalid_op+0x81/0x8a
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c0108ee7>] pit_next_event+0x2f/0x34
> [<c0131f96>] clockevents_program_event+0xb5/0xbc
> [<c0132cf6>] tick_program_event+0x2a/0x49
> [<c0119251>] update_curr+0x235/0x256
> [<c0345612>] error_code+0x6a/0x70
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c034462d>] preempt_schedule_irq+0x3b/0x53
> [<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
> [<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
> [<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]
> [<d08dc7bd>] vid_read+0x130/0x13c [lxv4l2]
> [<d08dc68d>] vid_read+0x0/0x13c [lxv4l2]
> [<c01588f2>] vfs_read+0x88/0x110
> [<c0158c33>] sys_read+0x41/0x67
> [<c0103d1a>] syscall_call+0x7/0xb
> =======================
> BUG: scheduling while atomic: naoqi/0x00000002/2992, CPU#0
> [<c0343ec0>] __schedule+0x85/0x383
> [<c034442a>] schedule+0xe5/0xfb
> [<c0344f37>] rt_mutex_slowlock+0x193/0x237
> [<c0344bc2>] rt_mutex_lock+0x2b/0x2e
> [<c011ea58>] exit_mm+0x24/0xd3
> [<c011fec1>] do_exit+0x1ba/0x722
> [<c01050a5>] die+0x1d3/0x1db
> [<c010542a>] do_invalid_op+0x0/0x8a
> [<c01054ab>] do_invalid_op+0x81/0x8a
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c0108ee7>] pit_next_event+0x2f/0x34
> [<c0131f96>] clockevents_program_event+0xb5/0xbc
> [<c0132cf6>] tick_program_event+0x2a/0x49
> [<c0119251>] update_curr+0x235/0x256
> [<c0345612>] error_code+0x6a/0x70
> [<c0344c23>] rt_spin_lock_slowlock+0x5e/0x176
> [<c034462d>] preempt_schedule_irq+0x3b/0x53
> [<d08d9f39>] v4l_capt_unstall+0x15/0x81 [lxv4l2]
> [<d08da00b>] v4l_qbfr+0x66/0x75 [lxv4l2]
> [<d08daad7>] v4l_stream_copy+0x279/0x2b4 [lxv4l2]
> [<d08dc7bd>] vid_read+0x130/0x13c [lxv4l2]
> [<d08dc68d>] vid_read+0x0/0x13c [lxv4l2]
> [<c01588f2>] vfs_read+0x88/0x110
> [<c0158c33>] sys_read+0x41/0x67
> [<c0103d1a>] syscall_call+0x7/0xb
> =======================
>
>
> On the call stack I found the function v4l_capt_unstall() just before
> preempt_schedule_irq, so I took a look into the lxv4l2 driver by AMD. I
> found the following function, which calls v4l_capt_unstall(), in v4l.c:
>
> int v4l_qbfr(VidDevice *dp,struct list_head *lp,io_buf *bp,int capt)
> {
>    unsigned long flags;
>    FilePriv *fp = &dp->fp;
>    io_queue *io = fp->io;
>    spin_lock_irqsave(&io->lock, flags);
>    list_move_tail(&bp->bfrq,lp);
>    bp->sequence = io->sequence++;
>    bp->flags &= ~V4L2_BUF_FLAG_DONE;
>    bp->flags |= V4L2_BUF_FLAG_QUEUED;
>    if( capt != 0 && dp->capt_stalled != 0 )
>       v4l_capt_unstall(dp);
>    spin_unlock_irqrestore(&io->lock, flags);
> }
>
>
> And here is v4l_capt_unstall():
>
> int v4l_capt_unstall(VidDevice *dp)
> {
>    unsigned long flags;
>    int ret = 0;
>    FilePriv *fp = &dp->fp;
>    io_queue *io = fp->io;
>    spin_lock_irqsave(&io->lock, flags);
>    if( dp->capt_state == capt_state_run ) {
>       if( io != NULL ) {
>          if( dp->capt_stalled != 0 )
>             DMSG(4,"capt resumed\n");
>          ret = lx_capt_resume(dp,io);
>       }
>    }
>    spin_unlock_irqrestore(&io->lock, flags);
>    return ret;
> }
>
> I was wondering about the spin lock. The function v4l_qbfr() already
> locks io->lock, calls v4l_capt_unstall(), and there it tries to lock
> io->lock again. I think this should end in a deadlock?
> If it does not end in a deadlock, I think the crash is caused by the unlock
> in v4l_capt_unstall(), because the unlock calls schedule while there is
> still a lock held (the lock from v4l_qbfr()).
>
> Can someone explain to me what happened?
>
> Oliver Urbann
>
>
> --
> To unsubscribe from this list: send an email with
> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
> Please read the FAQ at http://kernelnewbies.org/FAQ
>
>

--
Regards,
Peter Teoh
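
In case it helps, here is a user-space sketch of the double-locking pattern discussed in this thread. It is not the driver code: struct io_queue is a hypothetical stand-in, and capt_unstall(), capt_unstall_locked(), qbfr_buggy() and qbfr_fixed() are made-up names that only mirror the shape of v4l_qbfr() and v4l_capt_unstall(). It uses a POSIX error-checking mutex, which records its owner and can therefore report a relock by the same thread as EDEADLK instead of hanging. The trace goes through rt_spin_lock_slowlock, so on this -rt kernel io->lock appears to be backed by an rt_mutex, which also tracks its owner. My assumption, not confirmed anywhere in this thread, is that the BUG at kernel/rtmutex.c:682 is that kind of owner check. The second half shows one common way out: a variant of the inner function that expects the caller to hold the lock already.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the driver's io_queue; only the lock matters. */
struct io_queue {
    pthread_mutex_t lock;   /* owner-tracking lock (error-checking mutex) */
    int resumed;            /* counts simulated "capture resumed" events */
};

void io_queue_init(struct io_queue *io)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* ERRORCHECK makes a relock by the owning thread fail with EDEADLK. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&io->lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    (void)rc;
    io->resumed = 0;
}

/* Same shape as v4l_capt_unstall(): takes the lock itself. */
int capt_unstall(struct io_queue *io)
{
    int rc = pthread_mutex_lock(&io->lock);
    if (rc != 0)
        return rc;          /* EDEADLK if the caller already owns the lock */
    io->resumed++;
    pthread_mutex_unlock(&io->lock);
    return 0;
}

/* Variant that assumes the caller already holds io->lock. */
int capt_unstall_locked(struct io_queue *io)
{
    io->resumed++;
    return 0;
}

/* Same shape as v4l_qbfr(): calls the locking variant with the lock held. */
int qbfr_buggy(struct io_queue *io)
{
    int rc;

    pthread_mutex_lock(&io->lock);
    rc = capt_unstall(io);          /* second lock attempt, same thread */
    pthread_mutex_unlock(&io->lock);
    return rc;
}

/* Possible fix: call the variant that does not take the lock again. */
int qbfr_fixed(struct io_queue *io)
{
    int rc;

    pthread_mutex_lock(&io->lock);
    rc = capt_unstall_locked(io);
    pthread_mutex_unlock(&io->lock);
    return rc;
}
```

After io_queue_init(), qbfr_buggy() should return EDEADLK and leave the counter untouched, while qbfr_fixed() returns 0 and increments it. With a plain (non-checking) lock the buggy path would simply hang, which is what you would expect from an ordinary spinlock with IRQs disabled.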