On Sun, 2011-08-07 at 19:16 +0100, Chris Boot wrote: > Hi all, > > I've been using LIO for a little while, and thought I'd give what's > currently in linux-2.6.git (master) a shot. Unfortunately, I get a BUG > within seconds of an initiator connecting to one of my targets - this > isn't immediate but happens a couple of seconds in. The kernel log is below: > Greetings again Chris, Thank you for your detailed bug reported with the v4.1.0-rc1 code. My comments are below. > [ 109.599886] BUG: unable to handle kernel NULL pointer dereference at > 000000000000000c > [ 109.617486] IP: [<ffffffffa048ca5a>] > transport_allocate_data_tasks+0x1af/0x2cc [target_core_mod] > [ 109.636400] PGD 0 > [ 109.647538] Oops: 0000 [#1] SMP > [ 109.660125] CPU 1 > [ 109.662253] Modules linked in: iscsi_target_mod target_core_pscsi > target_core_file target_core_iblock target_core_mod configfs > ip6table_mangle iptable_mangle xt_DSCP xt_owner tun ip6table_filter > ip6_tables iptable_filter ip_tables cpufreq_stats cpufreq_userspace > cpufreq_conservative ebtable_nat ebtables x_tables cpufreq_powersave > ipmi_watchdog microcode nfsd nfs lockd fscache auth_rpcgss nfs_acl > sunrpc bridge stp ext4 jbd2 crc16 ipmi_si ipmi_devintf ipmi_msghandler > acpi_cpufreq mperf coretemp fuse crc32c_intel aesni_intel cryptd > aes_x86_64 aes_generic dm_crypt kvm_intel kvm snd_pcm snd_timer snd > soundcore snd_page_alloc processor evdev i2c_i801 ftdi_sio i2c_core > joydev usbserial video pcspkr button ext3 jbd mbcache dm_mod sg sd_mod > crc_t10dif usbhid hid ahci libahci 3w_9xxx libata ehci_hcd thermal fan > scsi_mod thermal_sys usbcore e1000e [last unloaded: scsi_wait_scan] > [ 109.674461] > [ 109.674464] Pid: 4703, comm: iscsi_trx Not tainted 3.0.0+ #1 Intel > Corporation S1200BTL/S1200BTL > [ 109.674468] RIP: 0010:[<ffffffffa048ca5a>] [<ffffffffa048ca5a>] > transport_allocate_data_tasks+0x1af/0x2cc [target_core_mod] > [ 109.674478] RSP: 0018:ffff880187ccdcd0 EFLAGS: 00010283 > [ 109.674480] RAX: 0000000000000000 RBX: ffff8801877dd280 RCX: > 0000000000000000 > [ 109.674483] RDX: 0000000000000000 RSI: 0000000000000020 RDI: > ffff88022dbd23e0 > [ 109.674485] RBP: ffff880230327c00 R08: 0000000000000000 R09: > 0000000000000040 > [ 109.674487] R10: ffff88018769cc00 R11: ffff8801877dd040 R12: > 0000000000000000 > [ 109.674490] R13: 0000000000000002 R14: ffff880189884400 R15: > 0000000000000100 > [ 109.674493] FS: 0000000000000000(0000) GS:ffff880236820000(0000) > knlGS:0000000000000000 > [ 109.674496] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 109.674498] CR2: 000000000000000c CR3: 0000000001604000 CR4: > 00000000000426e0 > [ 109.674500] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 109.674503] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 109.674506] Process iscsi_trx (pid: 4703, threadinfo > ffff880187ccc000, task ffff8801edb571e0) > [ 109.674508] Stack: > [ 109.674509] 0000000000000000 ffff880200000000 ffff88022dbd23e0 > 0000000000000100 > [ 109.674513] 0000000100000200 000000000062ddf0 0000004000000020 > ffff8801877dd40c > [ 109.674516] 0000004100000002 ffff8801877dd4a8 0000000000000000 > ffff8801877dd280 > [ 109.674520] Call Trace: > [ 109.674529] [<ffffffffa048f7d8>] ? > transport_generic_new_cmd+0x190/0x2f6 [target_core_mod] > [ 109.674537] [<ffffffffa04c413e>] ? iscsit_allocate_iovecs+0x24/0x4d > [iscsi_target_mod] > [ 109.674545] [<ffffffffa048f9a5>] ? > transport_handle_cdb_direct+0x67/0x8e [target_core_mod] > [ 109.674552] [<ffffffffa04c253b>] ? iscsit_sequence_cmd+0x5f/0xd8 > [iscsi_target_mod] > [ 109.674558] [<ffffffffa04c5b01>] ? > iscsi_target_rx_thread+0x7d2/0x1a71 [iscsi_target_mod] > [ 109.674564] [<ffffffff8100d027>] ? load_TLS+0x7/0xa > [ 109.674567] [<ffffffff8100d60f>] ? __switch_to+0xc1/0x20e > [ 109.674574] [<ffffffffa04c532f>] ? > iscsit_thread_get_cpumask+0x88/0x88 [iscsi_target_mod] > [ 109.674578] [<ffffffff8105e48d>] ? kthread+0x76/0x7e > [ 109.674582] [<ffffffff81334a14>] ? kernel_thread_helper+0x4/0x10 > [ 109.674585] [<ffffffff8105e417>] ? kthread_worker_fn+0x139/0x139 > [ 109.674588] [<ffffffff81334a10>] ? gs_change+0x13/0x13 > [ 109.674590] Code: 10 03 00 00 ff 90 c0 00 00 00 e9 1c 01 00 00 8b 75 > 10 48 89 c7 e8 97 7f d1 e0 8b 55 40 4c 8b 45 08 c7 44 24 30 00 00 00 00 > eb 45 > [ 109.674603] 39 54 24 0c 77 47 4c 89 c7 4c 89 e6 b9 08 00 00 00 f3 a5 4c > [ 109.674610] RIP [<ffffffffa048ca5a>] > transport_allocate_data_tasks+0x1af/0x2cc [target_core_mod] > [ 109.674618] RSP <ffff880187ccdcd0> > [ 109.674620] CR2: 000000000000000c > [ 109.674662] ---[ end trace 263ff2d45aa35d71 ]--- > [ 203.639102] iSCSI Login timeout on Network Portal > 2001:8b0:49:1:21e:67ff:fe14:69f4:3260 > > The crash happens here: > > (gdb) list *transport_allocate_data_tasks+0x1af > 0x11a5a is in transport_allocate_data_tasks > (drivers/target/target_core_transport.c:4142). > 4137 > 4138 task_size = task->task_size; > 4139 > 4140 /* Build new sgl, only up to task_size */ > 4141 for_each_sg(task->task_sg, sg, > task->task_sg_nents, count) { > 4142 if (cmd_sg->length > task_size) > 4143 break; > 4144 > 4145 *sg = *cmd_sg; > 4146 task_size -= cmd_sg->length; > > Unfortunately I'm no kernel expert, so I can't debug any further than this. It looks like we are running into some breakage with the recent v4.1 I/O task generation code changes and a certain SCSI CDB size combination being handled by iscsi-target (grover CC'ed). As this new code is currently lacking debug output, the easiet way to debug this with your current build would be making a wireshark packet capture from the initiator side so we can determine which CDB is actually trigger the bug here.. Proper pr_debug() out messages need to be added for these codepaths to handle this types of cases w/o a recompile, but that is going to take a little more time. ;) Also, the output from the following quick and dirty debug patch would be helpful to pinpoint the issue as well. Thanks for your bug report Chris! --nab diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index e9311d0..38e4435 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -4102,6 +4102,9 @@ static int transport_allocate_data_tasks( WARN_ON(cmd->data_length % sector_size); sectors = DIV_ROUND_UP(cmd->data_length, sector_size); task_count = DIV_ROUND_UP_SECTOR_T(sectors, dev_max_sectors); + + printk("cmd->data_length: %u, sectors: %llu, task_count: %d\n", + cmd->data_length, sectors, task_count); cmd_sg = sgl; for (i = 0; i < task_count; i++) { @@ -4115,6 +4118,9 @@ static int transport_allocate_data_tasks( task->task_lba = lba; task->task_sectors = min(sectors, dev_max_sectors); task->task_size = task->task_sectors * sector_size; + + printk("task[%d]: task_lba: %llu task_sectors: %u task_size: %u\n", + i, task->task_lba, task->task_sectors, task->task_size); cdb = dev->transport->get_cdb(task); BUG_ON(!cdb); @@ -4149,16 +4155,31 @@ static int transport_allocate_data_tasks( sg_init_table(task->task_sg, task->task_sg_nents); + printk("Setup task->task_sg: %p task->task_sg_nents: %d\n", + task->task_sg, task->task_sg_nents); + task_size = task->task_size; /* Build new sgl, only up to task_size */ for_each_sg(task->task_sg, sg, task->task_sg_nents, count) { + printk("sg[%d] Remaining task_size: %d cmd_sg: %p cmd_sg->length: %d\n", + count, task_size, cmd_sg, cmd_sg->length); + if (cmd_sg->length > task_size) break; *sg = *cmd_sg; task_size -= cmd_sg->length; + if (task_size > 0x00ffffff) { + printk("Bogus task_size: %d\n", task_size); + return -EINVAL; + } + cmd_sg = sg_next(cmd_sg); + if (!cmd_sg || !cmd_sg->length) { + printk("Bogus SGL at %p\n", cmd_sg); + return -EINVAL; + } } lba += task->task_sectors; -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html