Sorry - forgot to mention this is a Red Hat 5.3 environment.

----- Original Message -----
From: martin.montreuil@xxxxxxxxxx
To: stgt@xxxxxxxxxxxxxxx
Sent: Friday, September 25, 2009 10:17:49 AM GMT -05:00 US/Canada Eastern
Subject: tgt V0.9.7 & V0.9.8 - getting tgtd segfault error 4

Under V0.9.7 received:

Sep 24 05:12:57 storageserver kernel: tgtd[31665]: segfault at 0000555e4ee57d90 rip 0000003dc14715a8 rsp 00007fff7f899ce0 error 4

Upgraded to V0.9.8 and:

Sep 25 01:37:02 storageserver kernel: tgtd[31609]: segfault at fffffffffffffff0 rip 0000000000405ae4 rsp 00007fffc7fe3940 error 4

Not repeatable but happens generally within 24 hours. There are 143 disks being served in this configuration and multipathd is in use. This is a new installation.

targets.conf looks like:

<target iqn.2009-06.crrel:storageserver.disks>
    backing-store /dev/mapper/mpath1
    backing-store /dev/mapper/mpath2
    ...
    backing-store /dev/mapper/mpath143
    allow-in-use yes
</target>

Under V0.9.8 we are also seeing something new - BUG: soft lockups (below) just prior to the segfault.

Any suggestions?

Thanks!
Marty

Sep 25 01:32:39 storageserver tgtd: conn_close(130) Forcing release of tx task 0x2aae7d692820 b000003b 1
Sep 25 01:32:57 storageserver kernel: BUG: soft lockup - CPU#1 stuck for 10s! [tgtd:32036]
Sep 25 01:32:57 storageserver kernel: CPU 1:
Sep 25 01:32:57 storageserver kernel: Modules linked in: hangcheck_timer autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp l2cap bluetooth sunrpc bonding cpufreq_ondemand powernow_k8 freq_table mptctl(U) dm_mirror dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sg ixgbe forcedeth i2c_nforce2 i2c_core pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache usb_storage mptfc(U) scsi_transport_fc mptspi(U) scsi_transport_spi shpchp mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Sep 25 01:32:57 storageserver kernel: Pid: 32036, comm: tgtd Tainted: G 2.6.18-128.el5 #1
Sep 25 01:32:57 storageserver kernel: RIP: 0010:[<ffffffff80064cb4>] [<ffffffff80064cb4>] .text.lock.spinlock+0x2/0x30
Sep 25 01:32:57 storageserver kernel: RSP: 0018:ffff81080b2fd880 EFLAGS: 00000286
Sep 25 01:32:57 storageserver kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000c0000100
Sep 25 01:32:57 storageserver kernel: RDX: ffff81081ec53d69 RSI: 0000000000000001 RDI: ffff81081ec53d68
Sep 25 01:32:57 storageserver kernel: RBP: ffff810824fa5870 R08: ffff81080b2fc000 R09: 0000000000000286
Sep 25 01:32:57 storageserver kernel: R10: ffff81041ecb2040 R11: ffff810827e3d080 R12: ffff81081c06a348
Sep 25 01:32:57 storageserver kernel: R13: ffff810822d69a68 R14: ffffffff80063097 R15: ffff81080b2fd8a8
Sep 25 01:32:57 storageserver kernel: FS: 00002aab483a6940(0000) GS:ffff81010e959440(0000) knlGS:00000000e347eb90
Sep 25 01:32:57 storageserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 25 01:32:57 storageserver kernel: CR2: 0000003dc1499a50 CR3: 000000040ec65000 CR4: 00000000000006e0
Sep 25 01:32:57 storageserver kernel:
Sep 25 01:32:57 storageserver kernel: Call Trace:
Sep 25 01:32:57 storageserver kernel: [<ffffffff80021a5f>] page_lock_anon_vma+0x1d/0x26
Sep 25 01:32:57 storageserver kernel: [<ffffffff8003ba76>] page_referenced+0x43/0xe4
Sep 25 01:32:57 storageserver kernel: [<ffffffff800c72b6>] shrink_inactive_list+0x191/0x7f9
Sep 25 01:32:57 storageserver kernel: [<ffffffff800cbded>] page_referenced_one+0x61/0xd0
Sep 25 01:32:57 storageserver kernel: [<ffffffff80064cb7>] .text.lock.spinlock+0x5/0x30
Sep 25 01:32:57 storageserver kernel: [<ffffffff8003baa1>] page_referenced+0x6e/0xe4
Sep 25 01:32:57 storageserver kernel: [<ffffffff80047ab0>] __pagevec_release+0x19/0x22
Sep 25 01:32:57 storageserver kernel: [<ffffffff800c7004>] shrink_active_list+0x416/0x426
Sep 25 01:32:57 storageserver kernel: [<ffffffff80012d02>] shrink_zone+0xf6/0x11c
Sep 25 01:32:57 storageserver kernel: [<ffffffff800c801b>] try_to_free_pages+0x197/0x2b9
Sep 25 01:32:57 storageserver kernel: [<ffffffff8000f271>] __alloc_pages+0x1cb/0x2ce
Sep 25 01:32:57 storageserver kernel: [<ffffffff8000fb8a>] generic_file_buffered_write+0x1b0/0x6d3
Sep 25 01:32:57 storageserver kernel: [<ffffffff80016196>] __generic_file_aio_write_nolock+0x36c/0x3b8
Sep 25 01:32:57 storageserver kernel: [<ffffffff8003dd13>] do_futex+0x282/0xc3f
Sep 25 01:32:57 storageserver kernel: [<ffffffff800c2ce5>] generic_file_aio_write_nolock+0x20/0x6c
Sep 25 01:32:57 storageserver kernel: [<ffffffff800c30b1>] generic_file_write_nolock+0x8f/0xa8
Sep 25 01:32:57 storageserver kernel: [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
Sep 25 01:32:57 storageserver kernel: [<ffffffff80063097>] thread_return+0x62/0xfe
Sep 25 01:32:57 storageserver kernel: [<ffffffff800df545>] blkdev_file_write+0x1a/0x1f
Sep 25 01:32:57 storageserver kernel: [<ffffffff8001659e>] vfs_write+0xce/0x174
Sep 25 01:32:57 storageserver kernel: [<ffffffff80043876>] sys_pwrite64+0x50/0x70
Sep 25 01:32:57 storageserver kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
Sep 25 01:32:57 storageserver kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 25 01:32:57 storageserver kernel:
Sep 25 01:36:29 storageserver kernel: BUG: soft lockup - CPU#2 stuck for 10s! [kswapd0:651]
Sep 25 01:36:29 storageserver kernel: CPU 2:
Sep 25 01:36:29 storageserver kernel: Modules linked in: hangcheck_timer autofs4 ipmi_devintf ipmi_si ipmi_msghandler hidp l2cap bluetooth sunrpc bonding cpufreq_ondemand powernow_k8 freq_table mptctl(U) dm_mirror dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sg ixgbe forcedeth i2c_nforce2 i2c_core pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache usb_storage mptfc(U) scsi_transport_fc mptspi(U) scsi_transport_spi shpchp mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Sep 25 01:36:29 storageserver kernel: Pid: 651, comm: kswapd0 Tainted: G 2.6.18-128.el5 #1
Sep 25 01:36:29 storageserver kernel: RIP: 0010:[<ffffffff80064cb4>] [<ffffffff80064cb4>] .text.lock.spinlock+0x2/0x30
Sep 25 01:36:29 storageserver kernel: RSP: 0018:ffff810827569b58 EFLAGS: 00000286
Sep 25 01:36:29 storageserver kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
Sep 25 01:36:29 storageserver kernel: RDX: ffff81081ec53d69 RSI: 0000000000000001 RDI: ffff81081ec53d68
Sep 25 01:36:29 storageserver kernel: RBP: 000000000000003e R08: ffff810827568000 R09: 0000000000000286
Sep 25 01:36:29 storageserver kernel: R10: ffff81041ecb2040 R11: 0000000000000000 R12: ffff81041ecb2040
Sep 25 01:36:29 storageserver kernel: R13: ffffffff80063097 R14: ffff810827569b80 R15: ffff81041ecb2040
Sep 25 01:36:29 storageserver kernel: FS: 00002ac5e2ad8a10(0000) GS:ffff81010e9591c0(0000) knlGS:00000000e33fdb90
Sep 25 01:36:29 storageserver kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 25 01:36:29 storageserver kernel: CR2: 0000003dc1499a50 CR3: 0000000000201000 CR4: 00000000000006e0
Sep 25 01:36:29 storageserver kernel:
Sep 25 01:36:29 storageserver kernel: Call Trace:
Sep 25 01:36:29 storageserver kernel: [<ffffffff80021a5f>] page_lock_anon_vma+0x1d/0x26
Sep 25 01:36:29 storageserver kernel: [<ffffffff8003ba76>] page_referenced+0x43/0xe4
Sep 25 01:36:29 storageserver kernel: [<ffffffff800c72b6>] shrink_inactive_list+0x191/0x7f9
Sep 25 01:36:29 storageserver kernel: [<ffffffff800cbded>] page_referenced_one+0x61/0xd0
Sep 25 01:36:29 storageserver kernel: [<ffffffff800cbd8e>] page_referenced_one+0x2/0xd0
Sep 25 01:36:29 storageserver kernel: [<ffffffff8003baa1>] page_referenced+0x6e/0xe4
Sep 25 01:36:29 storageserver kernel: [<ffffffff80047ab0>] __pagevec_release+0x19/0x22
Sep 25 01:36:29 storageserver kernel: [<ffffffff800c7004>] shrink_active_list+0x416/0x426
Sep 25 01:36:29 storageserver kernel: [<ffffffff80012d02>] shrink_zone+0xf6/0x11c
Sep 25 01:36:29 storageserver kernel: [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
Sep 25 01:36:29 storageserver kernel: [<ffffffff8005778a>] kswapd+0x337/0x45a
Sep 25 01:36:29 storageserver kernel: [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
Sep 25 01:36:29 storageserver kernel: [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
Sep 25 01:36:29 storageserver kernel: [<ffffffff80057453>] kswapd+0x0/0x45a
Sep 25 01:36:29 storageserver kernel: [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
Sep 25 01:36:29 storageserver kernel: [<ffffffff80032360>] kthread+0xfe/0x132
Sep 25 01:36:29 storageserver kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Sep 25 01:36:29 storageserver kernel: [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
Sep 25 01:36:29 storageserver kernel: [<ffffffff80032262>] kthread+0x0/0x132
Sep 25 01:36:29 storageserver kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Sep 25 01:36:29 storageserver kernel:
Sep 25 01:36:34 storageserver tgtd: conn_close(101) connection closed, 0x2aac62810018 6
Sep 25 01:36:34 storageserver tgtd: conn_close(107) sesson 0x2aaccb010130 1
Sep 25 01:36:34 storageserver tgtd: conn_close(101) connection closed, 0x2aac705106b8 1
Sep 25 01:36:34 storageserver tgtd: conn_close(107) sesson 0x2ab383692e40 1
Sep 25 01:36:34 storageserver tgtd: conn_close(101) connection closed, 0x2aac47610678 1
Sep 25 01:36:34 storageserver tgtd: conn_close(107) sesson 0x2aac476108d0 1
Sep 25 01:36:34 storageserver tgtd: conn_close(101) connection closed, 0x2aaf60a92d88 120
Sep 25 01:36:34 storageserver tgtd: conn_close(107) sesson 0x2aac98110a30 1

--
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html