Thank you, Alan, for your response. We will try reproducing this issue; it is a
very rare case and we have seen just 2 such occurrences so far.

> On Thu, 26 Jan 2012, Mansoor, Illyas wrote:
>
> > > > I have a question related to the suspend flow and completion signals:
> > > > Is it possible that a completion signal could be missed in the S3 flow
> > > > after processes are frozen?
> > >
> > > What do you mean by "completion signal"?
> >
> > In our case we have an interrupt handler that signals completion of the
> > interrupt using complete() for a waiting thread.
>
> It sounds like you're saying you have an interrupt handler that calls
> complete(), and the target thread is stuck in wait_for_completion().

Yes, that is correct. The thread that is waiting for completion is the suspend
thread, which is about to enter suspend; hence the CPU is not yet frozen and
can still receive the completion.

> > Our interrupt is configured with IRQ_NO_SUSPEND so we expect
> > interrupts during suspend also.
>
> IRQ_NO_SUSPEND no longer exists. Regardless, you can't receive interrupts
> while the system is in suspend, because by definition the CPU isn't running
> at that time.

IRQF_NO_SUSPEND is the exact flag that is used, and the wait_for_completion()
happens in the suspend thread:
dpm_suspend_start -> ... -> pci_set_power_state -> wait_for_completion()

> > So here is what I'm seeing in the panic logs:
> >
> > <4>[ 7960.661939] Call Trace:
> > <4>[ 7960.661953] [<c12384a5>] ? sub_preempt_count+0x85/0xc0
> > <4>[ 7960.661965] [<c12735b5>] refrigerator+0xa5/0x160
> > <4>[ 7960.661977] [<c125f64d>] get_signal_to_deliver+0x9ad/0xdc0
> > <4>[ 7960.661991] [<c120257b>] do_signal+0x6b/0xa20
> > <4>[ 7960.662003] [<c186ffbd>] ? schedule_hrtimeout_range+0x1cd/0x220
> > <4>[ 7960.662018] [<c122d4a4>] ? pmu_sc_irq+0x364/0x3d0  <== this is where the interrupt signals completion using complete()
>
> That is irrelevant. The '?' means this is merely a random value on the
> stack, not the return address of a function call.

Okay, I see. 0xc122d4a4 is the exact instruction in pmu_sc_irq where it calls
complete(). So does this mean that at least this complete() happened at some
time during the panic?

> > <4>[ 7960.662030] [<c18715f3>] ? _raw_spin_unlock_irqrestore+0x23/0x50
> > <4>[ 7960.662043] [<c12384a5>] ? sub_preempt_count+0x85/0xc0
> > <4>[ 7960.662056] [<c134c162>] ? sys_epoll_wait+0x72/0x300
> > <4>[ 7960.662069] [<c1242c40>] ? default_wake_function+0x0/0x20
> > <4>[ 7960.662082] [<c1202f85>] do_notify_resume+0x55/0x90
> > <4>[ 7960.662094] [<c1871e99>] work_notifysig+0x9/0x1
>
> This stack trace appears to show that some thread attempted to send a signal
> and got frozen while waiting.

pmu_sc_irq is the interrupt handler that calls complete(), and it happens to
be listed in each of the threads' call stacks; there are at least 8 such
occurrences.

> > <6>[ 7960.380768] suspend  R running  0  19  2 0x00000000
> > <4>[ 7960.380780] f78e3c94 00000000 c1204760 f78e3c8c c12047eb f78e3c94 00000000 f78e3c04
> > <4>[ 7960.380800] 00000004 00000000 00030002 c1852942 00000001 00000001 00000282 f78e3c5c
> > <4>[ 7960.380820] f78e3c30 c12384a5 00000282 f78e3c3c c18715f3 f7868000 f78e3c84 c186eebc
> > <4>[ 7960.380840] Call Trace:
> > <4>[ 7960.380850] [<c1204760>] ? do_invalid_op+0x0/0xb0
> > <4>[ 7960.380862] [<c12047eb>] ? do_invalid_op+0x8b/0xb0
> > <4>[ 7960.380875] [<c1852942>] ? pmu_pci_set_power_state+0x322/0x6e0  <== here is where wait_for_completion_timeout calls BUG() after the timeout
> > <4>[ 7960.380888] [<c12384a5>] ? sub_preempt_count+0x85/0xc0
> > <4>[ 7960.380901] [<c18715f3>] ? _raw_spin_unlock_irqrestore+0x23/0x50
> > <4>[ 7960.380913] [<c186eebc>] ? schedule_timeout+0x1dc/0x430
> > <4>[ 7960.380926] [<c1204760>] ? do_invalid_op+0x0/0xb0
> > <4>[ 7960.380937] [<c14b916c>] ? trace_hardirqs_off_thunk+0xc/0x10
> > <4>[ 7960.380950] [<c187244b>] ? error_code+0x6b/0x70
> > <4>[ 7960.380961] [<c186dd96>] ? wait_for_common+0x96/0x120
> > <4>[ 7960.380973] [<c1204760>] ? do_invalid_op+0x0/0xb0
> > <4>[ 7960.380985] [<c1852942>] ? pmu_pci_set_power_state+0x322/0x6e0
> > <4>[ 7960.380998] [<c14b502a>] ? put_dec+0x2a/0xa0
> > <4>[ 7960.381011] [<c14b502a>] ? put_dec+0x2a/0xa0
> > <4>[ 7960.381025] [<c12384a5>] ? sub_preempt_count+0x85/0xc0
> > <4>[ 7960.381038] [<c14cee6e>] ? pci_platform_power_transition+0x3e/0xa0
> > <4>[ 7960.381051] [<c18715f3>] ? _raw_spin_unlock_irqrestore+0x23/0x50
> > <4>[ 7960.381064] [<c14cf5df>] ? pci_set_power_state+0x3f/0x2c0
> > <4>[ 7960.381077] [<c14ced7c>] ? pci_update_current_state+0x3c/0x50
> > <4>[ 7960.381090] [<c14d180e>] ? pci_pm_runtime_resume+0x5e/0xa0
> > <4>[ 7960.381102] [<c12384a5>] ? sub_preempt_count+0x85/0xc0
> > <4>[ 7960.381114] [<c14d17b0>] ? pci_pm_runtime_resume+0x0/0xa0
> > <4>[ 7960.381126] [<c1554d7b>] ? rpm_callback+0x3b/0x70
> > <4>[ 7960.381137] [<c155577c>] ? rpm_resume+0x37c/0x5c0
> > <4>[ 7960.381150] [<c124923b>] ? release_console_sem+0x37b/0x3c0
> > <4>[ 7960.381164] [<c1238593>] ? add_preempt_count+0xb3/0xf0
> > <4>[ 7960.381176] [<c1556609>] ? __pm_runtime_resume+0x49/0xc0
> > <4>[ 7960.381189] [<c14d1b71>] ? pci_pm_prepare+0x21/0x60
> > <4>[ 7960.381200] [<c1553947>] ? dpm_suspend_start+0x137/0x7d0
> > <4>[ 7960.381213] [<c12384a5>] ? sub_preempt_count+0x85/0xc0
> > <4>[ 7960.381225] [<c18715f3>] ? _raw_spin_unlock_irqrestore+0x23/0x50
> > <4>[ 7960.381237] [<c126f548>] ? up+0x28/0x40
> > <4>[ 7960.381250] [<c1288f13>] ? suspend_devices_and_enter+0x73/0x1d0
> > <4>[ 7960.381262] [<c1289196>] ? enter_state+0x126/0x1e0
> > <4>[ 7960.381273] [<c1289277>] ? pm_suspend+0x27/0x70
> > <4>[ 7960.381285] [<c128acba>] ? suspend+0x8a/0x160
> > <4>[ 7960.381296] [<c186e445>] ? schedule+0x545/0x9e0
> > <4>[ 7960.381310] [<c12384a5>] ? sub_preempt_count+0x85/0xc0
> > <4>[ 7960.381322] [<c1265833>] ? worker_thread+0x123/0x2c0
> > <4>[ 7960.381333] [<c186e445>] ? schedule+0x545/0x9e0
> > <4>[ 7960.381346] [<c128ac30>] ? suspend+0x0/0x160
> > <4>[ 7960.381357] [<c12690d0>] ? autoremove_wake_function+0x0/0x50
> > <4>[ 7960.381369] [<c1265710>] ? worker_thread+0x0/0x2c0
> > <4>[ 7960.381381] [<c1268c34>] ? kthread+0x74/0x80
> > <4>[ 7960.381393] [<c1268bc0>] ? kthread+0x0/0x80
> > <4>[ 7960.381405] [<c120357a>] ? kernel_thread_helper+0x6/0x1
>
> This is very difficult to understand. You should add some printk statements
> to your code, so that you will know what is going on.
>
> > So my question: is it possible that the complete() called in interrupt
> > context can be missed during S3?
>
> I don't know what you mean by "missed". The complete() call will work, but
> the target thread might not return from wait_for_completion() until after
> the system returns to S0.

Since we know that the interrupt handler has called complete(), and we have
timed out of wait_for_completion_timeout() in the suspend flow before suspend,
I assumed the completion signal was missed because of the ongoing S3. Perhaps
the interrupt came in just as wait_for_completion_timeout() timed out, and so
we hit BUG()?

-Illyas
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/linux-pm