I hit a new backtrace today, hopefully it adds something. # cat /proc/19659/stack [<ffffffff815304d1>] iscsit_stop_session+0x1b1/0x1c0 [<ffffffff81521c62>] iscsi_check_for_session_reinstatement+0x1e2/0x270 [<ffffffff81524660>] iscsi_target_check_for_existing_instances+0x30/0x40 [<ffffffff815247a8>] iscsi_target_do_login+0x138/0x630 [<ffffffff815259be>] iscsi_target_start_negotiation+0x4e/0xa0 [<ffffffff8152355e>] __iscsi_target_login_thread+0x83e/0xf20 [<ffffffff81523c64>] iscsi_target_login_thread+0x24/0x30 [<ffffffff810a3059>] kthread+0xd9/0xf0 [<ffffffff817732d5>] ret_from_fork+0x25/0x30 [<ffffffffffffffff>] 0xffffffffffffffff # cat /proc/21342/stack [<ffffffffa0292b10>] __ib_drain_sq+0x190/0x1c0 [ib_core] [<ffffffffa0292b65>] ib_drain_sq+0x25/0x30 [ib_core] [<ffffffffa0292d72>] ib_drain_qp+0x12/0x30 [ib_core] [<ffffffffa062c5ff>] isert_wait_conn+0x5f/0x2d0 [ib_isert] [<ffffffff815309b7>] iscsit_close_connection+0x157/0x860 [<ffffffff8151f10b>] iscsit_take_action_for_connection_exit+0x7b/0xf0 [<ffffffff81530265>] iscsi_target_rx_thread+0x95/0xa0 [<ffffffff810a3059>] kthread+0xd9/0xf0 [<ffffffff817732d5>] ret_from_fork+0x25/0x30 [<ffffffffffffffff>] 0xffffffffffffffff # ps aux | grep iscsi | grep D root 19659 0.0 0.0 0 0 ? D 16:12 0:00 [iscsi_np] root 21342 0.0 0.0 0 0 ? D 16:29 0:00 [iscsi_trx] ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Dec 15, 2016 at 1:38 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: > Nicholas, > > I've found that the kernels I used were not able to be inspected using > crash and I could not build the debug info for them. So I built a new > 4.9 kernel and verified that I could inspect the crash. It is located > at [1]. 
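The manual `ps | grep D` plus `cat /proc/<pid>/stack` checks above can be wrapped in a small helper. This is a hypothetical sketch (the `filter_dstate` name is mine, not from this thread); reading `/proc/<pid>/stack` requires root:

```shell
#!/bin/sh
# List tasks stuck in uninterruptible sleep (state D) whose name matches
# a pattern, so their kernel stacks can be dumped from /proc/<pid>/stack.

filter_dstate() {
    # stdin: `ps -eo pid=,stat=,comm=` output; $1: comm pattern
    awk -v pat="$1" '$2 ~ /^D/ && $3 ~ pat { print $1 }'
}

# Demo against a captured ps snippet like the one in this message:
printf '%s\n' \
    '19659 D iscsi_np' \
    '21342 D iscsi_trx' \
    '  123 S bash' | filter_dstate iscsi

# On a live target (as root):
#   ps -eo pid=,stat=,comm= | filter_dstate iscsi | \
#       while read pid; do echo "== $pid =="; cat "/proc/$pid/stack"; done
```

The demo prints the two hung PIDs (19659 and 21342) and skips the task in state S.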
> > [1] http://mirrors.betterservers.com/trace/crash2.tar.xz > ---------------- > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > > > On Mon, Dec 12, 2016 at 4:57 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: >> Nicholas, >> >> After lots of setbacks and having to give up trying to get kernel >> dumps on our "production" systems, I've been able to work out the >> issues we had with kdump and replicate the issue on my dev boxes. I >> have dumps from 4.4.30 and 4.9-rc8 (makedumpfile would not dump, so it >> is a straight copy of /proc/vmcore from the crash kernel). In each >> crash directory, I put a details.txt file that has the process IDs >> that were having problems and a brief description of the set-up at the >> time. This was mostly replicated by starting fio and pulling the >> Infiniband cable until fio gave up. This hardware also has Mellanox >> ConnectX4-LX cards and I also replicated the issue over RoCE using 4.9 >> since it has the drivers in-box. Please let me know if you need more >> info, I can test much faster now. The cores/kernels/modules are >> located at [1]. >> >> [1] http://mirrors.betterservers.com/trace/crash.tar.xz >> >> Thanks, >> Robert >> ---------------- >> Robert LeBlanc >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >> >> >> On Fri, Nov 4, 2016 at 3:57 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: >>> We hit this yesterday, this time it was on the tx thread (the other >>> ones before seem to be on the rx thread). We weren't able to get a >>> kernel dump on this. We'll try to get one next time. >>> >>> # ps axuw | grep "D.*iscs[i]" >>> root 12383 0.0 0.0 0 0 ? D Nov03 0:04 [iscsi_np] >>> root 23016 0.0 0.0 0 0 ? D Nov03 0:00 [iscsi_ttx] >>> root 23018 0.0 0.0 0 0 ?
D Nov03 0:00 [iscsi_ttx] >>> # cat /proc/12383/stack >>> [<ffffffff814f24af>] iscsit_stop_session+0x19f/0x1d0 >>> [<ffffffff814e3c66>] iscsi_check_for_session_reinstatement+0x1e6/0x270 >>> [<ffffffff814e6620>] iscsi_target_check_for_existing_instances+0x30/0x40 >>> [<ffffffff814e6770>] iscsi_target_do_login+0x140/0x640 >>> [<ffffffff814e7b0c>] iscsi_target_start_negotiation+0x1c/0xb0 >>> [<ffffffff814e585b>] iscsi_target_login_thread+0xa9b/0xfc0 >>> [<ffffffff8109d7c8>] kthread+0xd8/0xf0 >>> [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70 >>> [<ffffffffffffffff>] 0xffffffffffffffff >>> # cat /proc/23016/stack >>> [<ffffffff814ce0d9>] target_wait_for_sess_cmds+0x49/0x1a0 >>> [<ffffffffa058b92b>] isert_wait_conn+0x1ab/0x2f0 [ib_isert] >>> [<ffffffff814f2642>] iscsit_close_connection+0x162/0x870 >>> [<ffffffff814e110f>] iscsit_take_action_for_connection_exit+0x7f/0x100 >>> [<ffffffff814f122a>] iscsi_target_tx_thread+0x1aa/0x1d0 >>> [<ffffffff8109d7c8>] kthread+0xd8/0xf0 >>> [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70 >>> [<ffffffffffffffff>] 0xffffffffffffffff >>> # cat /proc/23018/stack >>> [<ffffffff814ce0d9>] target_wait_for_sess_cmds+0x49/0x1a0 >>> [<ffffffffa058b92b>] isert_wait_conn+0x1ab/0x2f0 [ib_isert] >>> [<ffffffff814f2642>] iscsit_close_connection+0x162/0x870 >>> [<ffffffff814e110f>] iscsit_take_action_for_connection_exit+0x7f/0x100 >>> [<ffffffff814f122a>] iscsi_target_tx_thread+0x1aa/0x1d0 >>> [<ffffffff8109d7c8>] kthread+0xd8/0xf0 >>> [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70 >>> [<ffffffffffffffff>] 0xffffffffffffffff >>> >>> From dmesg: >>> [ 394.476332] INFO: rcu_sched self-detected stall on CPU >>> [ 394.476334] 20-...: (23976 ticks this GP) >>> idle=edd/140000000000001/0 softirq=292/292 fqs=18788 >>> [ 394.476336] (t=24003 jiffies g=3146 c=3145 q=0) >>> [ 394.476337] Task dump for CPU 20: >>> [ 394.476338] kworker/u68:2 R running task 0 12906 2 0x00000008 >>> [ 394.476345] Workqueue: isert_comp_wq isert_cq_work [ib_isert] >>> [ 394.476346] 
ffff883f2fe38000 00000000f805705e ffff883f7fd03da8 >>> ffffffff810ac8ff >>> [ 394.476347] 0000000000000014 ffffffff81adb680 ffff883f7fd03dc0 >>> ffffffff810af239 >>> [ 394.476348] 0000000000000015 ffff883f7fd03df0 ffffffff810e1cd0 >>> ffff883f7fd17b80 >>> [ 394.476348] Call Trace: >>> [ 394.476354] <IRQ> [<ffffffff810ac8ff>] sched_show_task+0xaf/0x110 >>> [ 394.476355] [<ffffffff810af239>] dump_cpu_task+0x39/0x40 >>> [ 394.476357] [<ffffffff810e1cd0>] rcu_dump_cpu_stacks+0x80/0xb0 >>> [ 394.476359] [<ffffffff810e6100>] rcu_check_callbacks+0x540/0x820 >>> [ 394.476360] [<ffffffff810afe11>] ? account_system_time+0x81/0x110 >>> [ 394.476363] [<ffffffff810faa60>] ? tick_sched_do_timer+0x50/0x50 >>> [ 394.476364] [<ffffffff810eb599>] update_process_times+0x39/0x60 >>> [ 394.476365] [<ffffffff810fa815>] tick_sched_handle.isra.17+0x25/0x60 >>> [ 394.476366] [<ffffffff810faa9d>] tick_sched_timer+0x3d/0x70 >>> [ 394.476368] [<ffffffff810ec182>] __hrtimer_run_queues+0x102/0x290 >>> [ 394.476369] [<ffffffff810ec668>] hrtimer_interrupt+0xa8/0x1a0 >>> [ 394.476372] [<ffffffff81052c65>] local_apic_timer_interrupt+0x35/0x60 >>> [ 394.476374] [<ffffffff8172423d>] smp_apic_timer_interrupt+0x3d/0x50 >>> [ 394.476376] [<ffffffff817224f7>] apic_timer_interrupt+0x87/0x90 >>> [ 394.476379] <EOI> [<ffffffff810d71be>] ? 
console_unlock+0x41e/0x4e0 >>> [  394.476380]  [<ffffffff810d757c>] vprintk_emit+0x2fc/0x500 >>> [  394.476382]  [<ffffffff810d78ff>] vprintk_default+0x1f/0x30 >>> [  394.476384]  [<ffffffff81174dde>] printk+0x5d/0x74 >>> [  394.476388]  [<ffffffff814bce21>] transport_lookup_cmd_lun+0x1d1/0x200 >>> [  394.476390]  [<ffffffff814ee8c0>] iscsit_setup_scsi_cmd+0x230/0x540 >>> [  394.476392]  [<ffffffffa058dbf3>] isert_rx_do_work+0x3f3/0x7f0 [ib_isert] >>> [  394.476394]  [<ffffffffa058e174>] isert_cq_work+0x184/0x770 [ib_isert] >>> [  394.476396]  [<ffffffff8109740f>] process_one_work+0x14f/0x400 >>> [  394.476397]  [<ffffffff81097c84>] worker_thread+0x114/0x470 >>> [  394.476398]  [<ffffffff8171d32a>] ? __schedule+0x34a/0x7f0 >>> [  394.476399]  [<ffffffff81097b70>] ? rescuer_thread+0x310/0x310 >>> [  394.476400]  [<ffffffff8109d7c8>] kthread+0xd8/0xf0 >>> [  394.476402]  [<ffffffff8109d6f0>] ? kthread_park+0x60/0x60 >>> [  394.476403]  [<ffffffff81721a8f>] ret_from_fork+0x3f/0x70 >>> [  394.476404]  [<ffffffff8109d6f0>] ? kthread_park+0x60/0x60 >>> [  405.716632] Unexpected ret: -104 send data 360 >>> [  405.721711] tx_data returned -32, expecting 360. >>> ---------------- >>> Robert LeBlanc >>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>> >>> >>> On Mon, Oct 31, 2016 at 10:34 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote: >>>> Nicholas, >>>> >>>> Thanks for following up on this. We have been chasing other bugs in >>>> our provisioning, which has reduced our load on the boxes. We are >>>> hoping to get that all straightened out this week and do some more >>>> testing. So far we have not had any iSCSI in D state since the patch, >>>> but we haven't been able to test it well either. We will keep you >>>> updated. >>>> >>>> Thank you, >>>> Robert LeBlanc >>>> ---------------- >>>> Robert LeBlanc >>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 >>>> >>>> >>>> On Sat, Oct 29, 2016 at 4:29 PM, Nicholas A.
Bellinger >>>> <nab@xxxxxxxxxxxxxxx> wrote: >>>>> Hi Robert, >>>>> >>>>> On Wed, 2016-10-19 at 10:41 -0600, Robert LeBlanc wrote: >>>>>> Nicholas, >>>>>> >>>>>> I didn't have high hopes for the patch because we were not seeing >>>>>> TMR_ABORT_TASK (or 'abort') in dmesg or /var/log/messages, but it >>>>>> seemed to help regardless. Our clients finally OOMed from the hung >>>>>> sessions, so we are having to reboot them and we will do some more >>>>>> testing. We haven't put the updated kernel on our clients yet. Our >>>>>> clients have iSCSI root disks so I'm not sure if we can get a vmcore >>>>>> on those, but we will do what we can to get you a vmcore from the >>>>>> target if it happens again. >>>>>> >>>>> >>>>> Just checking in to see if you've observed further issues with >>>>> iser-target ports, and/or able to generate a crashdump with v4.4.y..? >>>>> >>>>>> As far as our configuration: It is a SuperMicro box with 6 SAMSUNG >>>>>> MZ7LM3T8HCJM-00005 SSDs. Two are for root and four are in mdadm >>>>>> RAID-10 for exporting via iSCSI/iSER. We have ZFS on top of the >>>>>> RAID-10 for checksum and snapshots only and we export ZVols to the >>>>>> clients (one or more per VM on the client). We do not persist the >>>>>> export info (targetcli saveconfig), but regenerate it from scripts. >>>>>> The client receives two or more of these exports and puts them in a >>>>>> RAID-1 device. The exports are served by iSER on one port and also by >>>>>> normal iSCSI on a different port for compatibility, but not normally >>>>>> used. If you need more info about the config, please let me know. It >>>>>> was kind of a vague request so I'm not sure what exactly is important >>>>>> to you. >>>>> >>>>> Thanks for the extra details of your hardware + user-space >>>>> configuration.
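The setup described in that quoted reply (ZVol-backed block exports regenerated by script rather than persisted via saveconfig, with iSER on one portal and plain iSCSI on another) could be regenerated with targetcli along these lines. This is only a sketch; the pool/ZVol name, IQN, and IP addresses are placeholders, not values from this thread:

```shell
# Hypothetical regeneration script for one export; all names and
# addresses below are made up for illustration.
targetcli /backstores/block create name=vm1 dev=/dev/zvol/tank/vm1
targetcli /iscsi create iqn.2016-12.com.example:vm1
targetcli /iscsi/iqn.2016-12.com.example:vm1/tpg1/luns create /backstores/block/vm1
# iSER portal on one interface...
targetcli /iscsi/iqn.2016-12.com.example:vm1/tpg1/portals create 10.0.0.1 3260
targetcli /iscsi/iqn.2016-12.com.example:vm1/tpg1/portals/10.0.0.1:3260 enable_iser true
# ...and a normal iSCSI portal on a different interface for compatibility
targetcli /iscsi/iqn.2016-12.com.example:vm1/tpg1/portals create 10.0.1.1 3260
```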
>>>>> >>>>>> Thanks for helping us with this, >>>>>> Robert LeBlanc >>>>>> >>>>>> When we have problems, we usually see this in the logs: >>>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login timeout on >>>>>> Network Portal 0.0.0.0:3260 >>>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: Unexpected ret: -104 send data 48 >>>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: tx_data returned -32, expecting 48. >>>>>> Oct 17 08:57:50 prv-0-12-sanstack kernel: iSCSI Login negotiation failed. >>>>>> >>>>>> I found some backtraces in the logs, not sure if this is helpful, this >>>>>> is before your patch (your patch booted at Oct 18 10:36:59): >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: INFO: rcu_sched >>>>>> self-detected stall on CPU >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #0115-...: (41725 ticks this >>>>>> GP) idle=b59/140000000000001/0 softirq=535/535 fqs=30992 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: #011 (t=42006 jiffies g=1550 >>>>>> c=1549 q=0) >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Task dump for CPU 5: >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: kworker/u68:2 R running >>>>>> task 0 17967 2 0x00000008 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Workqueue: isert_comp_wq >>>>>> isert_cq_work [ib_isert] >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: ffff883f4c0dca80 >>>>>> 00000000af8ca7a4 ffff883f7fb43da8 ffffffff810ac83f >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000005 >>>>>> ffffffff81adb680 ffff883f7fb43dc0 ffffffff810af179 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: 0000000000000006 >>>>>> ffff883f7fb43df0 ffffffff810e1c10 ffff883f7fb57b80 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: Call Trace: >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <IRQ> [<ffffffff810ac83f>] >>>>>> sched_show_task+0xaf/0x110 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810af179>] >>>>>> dump_cpu_task+0x39/0x40 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e1c10>] >>>>>> 
rcu_dump_cpu_stacks+0x80/0xb0 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810e6040>] >>>>>> rcu_check_callbacks+0x540/0x820 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810afd51>] ? >>>>>> account_system_time+0x81/0x110 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9a0>] ? >>>>>> tick_sched_do_timer+0x50/0x50 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810eb4d9>] >>>>>> update_process_times+0x39/0x60 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa755>] >>>>>> tick_sched_handle.isra.17+0x25/0x60 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810fa9dd>] >>>>>> tick_sched_timer+0x3d/0x70 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec0c2>] >>>>>> __hrtimer_run_queues+0x102/0x290 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810ec5a8>] >>>>>> hrtimer_interrupt+0xa8/0x1a0 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81052c65>] >>>>>> local_apic_timer_interrupt+0x35/0x60 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8172343d>] >>>>>> smp_apic_timer_interrupt+0x3d/0x50 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff817216f7>] >>>>>> apic_timer_interrupt+0x87/0x90 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: <EOI> [<ffffffff810d70fe>] >>>>>> ? 
console_unlock+0x41e/0x4e0 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d74bc>] >>>>>> vprintk_emit+0x2fc/0x500 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff810d783f>] >>>>>> vprintk_default+0x1f/0x30 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81174c2a>] printk+0x5d/0x74 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814bc351>] >>>>>> transport_lookup_cmd_lun+0x1d1/0x200 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff814edcf0>] >>>>>> iscsit_setup_scsi_cmd+0x230/0x540 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0890bf3>] >>>>>> isert_rx_do_work+0x3f3/0x7f0 [ib_isert] >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffffa0891174>] >>>>>> isert_cq_work+0x184/0x770 [ib_isert] >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109734f>] >>>>>> process_one_work+0x14f/0x400 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097bc4>] >>>>>> worker_thread+0x114/0x470 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8171c55a>] ? >>>>>> __schedule+0x34a/0x7f0 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81097ab0>] ? >>>>>> rescuer_thread+0x310/0x310 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d708>] kthread+0xd8/0xf0 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ? >>>>>> kthread_park+0x60/0x60 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff81720c8f>] >>>>>> ret_from_fork+0x3f/0x70 >>>>>> Oct 17 15:43:12 prv-0-12-sanstack kernel: [<ffffffff8109d630>] ? 
>>>>>> kthread_park+0x60/0x60 >>>>>> >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: INFO: rcu_sched >>>>>> self-detected stall on CPU >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #01128-...: (5999 ticks this >>>>>> GP) idle=2f9/140000000000001/0 softirq=457/457 fqs=4830 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=3546 >>>>>> c=3545 q=0) >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Task dump for CPU 28: >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: iscsi_np R running >>>>>> task 0 16597 2 0x0000000c >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: ffff887f40350000 >>>>>> 00000000b98a67bb ffff887f7f503da8 ffffffff810ac8ff >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001c >>>>>> ffffffff81adb680 ffff887f7f503dc0 ffffffff810af239 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: 000000000000001d >>>>>> ffff887f7f503df0 ffffffff810e1cd0 ffff887f7f517b80 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: Call Trace: >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <IRQ> [<ffffffff810ac8ff>] >>>>>> sched_show_task+0xaf/0x110 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810af239>] >>>>>> dump_cpu_task+0x39/0x40 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>] >>>>>> rcu_dump_cpu_stacks+0x80/0xb0 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810e6100>] >>>>>> rcu_check_callbacks+0x540/0x820 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ? >>>>>> account_system_time+0x81/0x110 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ? 
>>>>>> tick_sched_do_timer+0x50/0x50 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810eb599>] >>>>>> update_process_times+0x39/0x60 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810fa815>] >>>>>> tick_sched_handle.isra.17+0x25/0x60 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810faa9d>] >>>>>> tick_sched_timer+0x3d/0x70 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec182>] >>>>>> __hrtimer_run_queues+0x102/0x290 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810ec668>] >>>>>> hrtimer_interrupt+0xa8/0x1a0 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81052c65>] >>>>>> local_apic_timer_interrupt+0x35/0x60 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81723cbd>] >>>>>> smp_apic_timer_interrupt+0x3d/0x50 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81721f77>] >>>>>> apic_timer_interrupt+0x87/0x90 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: <EOI> [<ffffffff810d71be>] >>>>>> ? console_unlock+0x41e/0x4e0 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d757c>] >>>>>> vprintk_emit+0x2fc/0x500 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff810d78ff>] >>>>>> vprintk_default+0x1f/0x30 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff81174dde>] printk+0x5d/0x74 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e71ad>] >>>>>> iscsi_target_locate_portal+0x62d/0x6f0 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e5100>] >>>>>> iscsi_target_login_thread+0x6f0/0xfc0 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff814e4a10>] ? >>>>>> iscsi_target_login_sess_out+0x250/0x250 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ? 
>>>>>> kthread_park+0x60/0x60 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8172150f>] >>>>>> ret_from_fork+0x3f/0x70 >>>>>> Oct 17 16:34:03 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ? >>>>>> kthread_park+0x60/0x60 >>>>>> >>>>>> I don't think this one is related, but it happened a couple of times: >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: INFO: rcu_sched >>>>>> self-detected stall on CPU >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #01119-...: (5999 ticks this >>>>>> GP) idle=727/140000000000001/0 softirq=1346/1346 fqs=4990 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: #011 (t=6000 jiffies g=4295 >>>>>> c=4294 q=0) >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Task dump for CPU 19: >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: kworker/19:1 R running >>>>>> task 0 301 2 0x00000008 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Workqueue: >>>>>> events_power_efficient fb_flashcursor >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: ffff883f6009ca80 >>>>>> 00000000010a7cdd ffff883f7fcc3da8 ffffffff810ac8ff >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000013 >>>>>> ffffffff81adb680 ffff883f7fcc3dc0 ffffffff810af239 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: 0000000000000014 >>>>>> ffff883f7fcc3df0 ffffffff810e1cd0 ffff883f7fcd7b80 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: Call Trace: >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <IRQ> [<ffffffff810ac8ff>] >>>>>> sched_show_task+0xaf/0x110 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810af239>] >>>>>> dump_cpu_task+0x39/0x40 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e1cd0>] >>>>>> rcu_dump_cpu_stacks+0x80/0xb0 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810e6100>] >>>>>> rcu_check_callbacks+0x540/0x820 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810afe11>] ? >>>>>> account_system_time+0x81/0x110 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa60>] ? 
>>>>>> tick_sched_do_timer+0x50/0x50 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810eb599>] >>>>>> update_process_times+0x39/0x60 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810fa815>] >>>>>> tick_sched_handle.isra.17+0x25/0x60 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810faa9d>] >>>>>> tick_sched_timer+0x3d/0x70 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec182>] >>>>>> __hrtimer_run_queues+0x102/0x290 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff810ec668>] >>>>>> hrtimer_interrupt+0xa8/0x1a0 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81052c65>] >>>>>> local_apic_timer_interrupt+0x35/0x60 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81723cbd>] >>>>>> smp_apic_timer_interrupt+0x3d/0x50 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81721f77>] >>>>>> apic_timer_interrupt+0x87/0x90 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: <EOI> [<ffffffff810d71be>] >>>>>> ? console_unlock+0x41e/0x4e0 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff813866ad>] >>>>>> fb_flashcursor+0x5d/0x140 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8138bc00>] ? >>>>>> bit_clear+0x110/0x110 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109740f>] >>>>>> process_one_work+0x14f/0x400 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097c84>] >>>>>> worker_thread+0x114/0x470 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8171cdda>] ? >>>>>> __schedule+0x34a/0x7f0 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff81097b70>] ? >>>>>> rescuer_thread+0x310/0x310 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d7c8>] kthread+0xd8/0xf0 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ? 
>>>>>> kthread_park+0x60/0x60 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8172150f>] >>>>>> ret_from_fork+0x3f/0x70 >>>>>> Oct 17 11:46:52 prv-0-12-sanstack kernel: [<ffffffff8109d6f0>] ? >>>>>> kthread_park+0x60/0x60 >>>>> >>>>> RCU self-detected schedule stalls typically mean some code is >>>>> monopolizing execution on a specific CPU for an extended period of time >>>>> (eg: endless loop), preventing normal RCU grace-period callbacks from >>>>> running in a timely manner. >>>>> >>>>> It's hard to tell without more log context and/or crashdump what was >>>>> going on here. >>>>> -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
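For reference, the negative codes in the "Unexpected ret: -104 send data 360" and "tx_data returned -32, expecting 360" messages quoted above are kernel-style -errno values, which can be decoded quickly (assuming python3 is available on the box):

```shell
# Decode the -errno return values seen in the logs:
# -104 = ECONNRESET (connection reset by peer), -32 = EPIPE (broken pipe),
# i.e. the peer dropped the connection while the target was mid-send.
for code in 104 32; do
    python3 -c "import errno, os; print(-$code, errno.errorcode[$code], os.strerror($code))"
done
```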