Re: BUG: unable to handle kernel NULL pointer dereference in transport_allocate_data_tasks+0x1af/0x2cc [target_core_mod]

Chris Boot <bootc@xxxxxxxxx> · Mon, 08 Aug 2011 22:57:41 +0100

On 08/08/11 21:42, Nicholas A. Bellinger wrote:
On Sun, 2011-08-07 at 19:16 +0100, Chris Boot wrote:
Hi all,

I've been using LIO for a little while, and thought I'd give what's
currently in linux-2.6.git (master) a shot. Unfortunately, I get a BUG
within seconds of an initiator connecting to one of my targets - this
isn't immediate but happens a couple of seconds in. The kernel log is below:

Greetings again Chris,

Thank you for your detailed bug reported with the v4.1.0-rc1 code.  My
comments are below.

[  109.599886] BUG: unable to handle kernel NULL pointer dereference at
000000000000000c
[  109.617486] IP: [<ffffffffa048ca5a>]
transport_allocate_data_tasks+0x1af/0x2cc [target_core_mod]
[  109.636400] PGD 0
[  109.647538] Oops: 0000 [#1] SMP
[  109.660125] CPU 1
[  109.662253] Modules linked in: iscsi_target_mod target_core_pscsi
target_core_file target_core_iblock target_core_mod configfs
ip6table_mangle iptable_mangle xt_DSCP xt_owner tun ip6table_filter
ip6_tables iptable_filter ip_tables cpufreq_stats cpufreq_userspace
cpufreq_conservative ebtable_nat ebtables x_tables cpufreq_powersave
ipmi_watchdog microcode nfsd nfs lockd fscache auth_rpcgss nfs_acl
sunrpc bridge stp ext4 jbd2 crc16 ipmi_si ipmi_devintf ipmi_msghandler
acpi_cpufreq mperf coretemp fuse crc32c_intel aesni_intel cryptd
aes_x86_64 aes_generic dm_crypt kvm_intel kvm snd_pcm snd_timer snd
soundcore snd_page_alloc processor evdev i2c_i801 ftdi_sio i2c_core
joydev usbserial video pcspkr button ext3 jbd mbcache dm_mod sg sd_mod
crc_t10dif usbhid hid ahci libahci 3w_9xxx libata ehci_hcd thermal fan
scsi_mod thermal_sys usbcore e1000e [last unloaded: scsi_wait_scan]
[  109.674461]
[  109.674464] Pid: 4703, comm: iscsi_trx Not tainted 3.0.0+ #1 Intel
Corporation S1200BTL/S1200BTL
[  109.674468] RIP: 0010:[<ffffffffa048ca5a>]  [<ffffffffa048ca5a>]
transport_allocate_data_tasks+0x1af/0x2cc [target_core_mod]
[  109.674478] RSP: 0018:ffff880187ccdcd0  EFLAGS: 00010283
[  109.674480] RAX: 0000000000000000 RBX: ffff8801877dd280 RCX:
0000000000000000
[  109.674483] RDX: 0000000000000000 RSI: 0000000000000020 RDI:
ffff88022dbd23e0
[  109.674485] RBP: ffff880230327c00 R08: 0000000000000000 R09:
0000000000000040
[  109.674487] R10: ffff88018769cc00 R11: ffff8801877dd040 R12:
0000000000000000
[  109.674490] R13: 0000000000000002 R14: ffff880189884400 R15:
0000000000000100
[  109.674493] FS:  0000000000000000(0000) GS:ffff880236820000(0000)
knlGS:0000000000000000
[  109.674496] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  109.674498] CR2: 000000000000000c CR3: 0000000001604000 CR4:
00000000000426e0
[  109.674500] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  109.674503] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  109.674506] Process iscsi_trx (pid: 4703, threadinfo
ffff880187ccc000, task ffff8801edb571e0)
[  109.674508] Stack:
[  109.674509]  0000000000000000 ffff880200000000 ffff88022dbd23e0
0000000000000100
[  109.674513]  0000000100000200 000000000062ddf0 0000004000000020
ffff8801877dd40c
[  109.674516]  0000004100000002 ffff8801877dd4a8 0000000000000000
ffff8801877dd280
[  109.674520] Call Trace:
[  109.674529]  [<ffffffffa048f7d8>] ?
transport_generic_new_cmd+0x190/0x2f6 [target_core_mod]
[  109.674537]  [<ffffffffa04c413e>] ? iscsit_allocate_iovecs+0x24/0x4d
[iscsi_target_mod]
[  109.674545]  [<ffffffffa048f9a5>] ?
transport_handle_cdb_direct+0x67/0x8e [target_core_mod]
[  109.674552]  [<ffffffffa04c253b>] ? iscsit_sequence_cmd+0x5f/0xd8
[iscsi_target_mod]
[  109.674558]  [<ffffffffa04c5b01>] ?
iscsi_target_rx_thread+0x7d2/0x1a71 [iscsi_target_mod]
[  109.674564]  [<ffffffff8100d027>] ? load_TLS+0x7/0xa
[  109.674567]  [<ffffffff8100d60f>] ? __switch_to+0xc1/0x20e
[  109.674574]  [<ffffffffa04c532f>] ?
iscsit_thread_get_cpumask+0x88/0x88 [iscsi_target_mod]
[  109.674578]  [<ffffffff8105e48d>] ? kthread+0x76/0x7e
[  109.674582]  [<ffffffff81334a14>] ? kernel_thread_helper+0x4/0x10
[  109.674585]  [<ffffffff8105e417>] ? kthread_worker_fn+0x139/0x139
[  109.674588]  [<ffffffff81334a10>] ? gs_change+0x13/0x13
[  109.674590] Code: 10 03 00 00 ff 90 c0 00 00 00 e9 1c 01 00 00 8b 75
10 48 89 c7 e8 97 7f d1 e0 8b 55 40 4c 8b 45 08 c7 44 24 30 00 00 00 00
eb 45
[  109.674603]  39 54 24 0c 77 47 4c 89 c7 4c 89 e6 b9 08 00 00 00 f3 a5 4c
[  109.674610] RIP  [<ffffffffa048ca5a>]
transport_allocate_data_tasks+0x1af/0x2cc [target_core_mod]
[  109.674618]  RSP<ffff880187ccdcd0>
[  109.674620] CR2: 000000000000000c
[  109.674662] ---[ end trace 263ff2d45aa35d71 ]---
[  203.639102] iSCSI Login timeout on Network Portal
2001:8b0:49:1:21e:67ff:fe14:69f4:3260

The crash happens here:

(gdb) list *transport_allocate_data_tasks+0x1af
0x11a5a is in transport_allocate_data_tasks
(drivers/target/target_core_transport.c:4142).
4137
4138                    task_size = task->task_size;
4139
4140                    /* Build new sgl, only up to task_size */
4141                    for_each_sg(task->task_sg, sg,
task->task_sg_nents, count) {
4142                            if (cmd_sg->length>  task_size)
4143                                    break;
4144
4145                            *sg = *cmd_sg;
4146                            task_size -= cmd_sg->length;

Unfortunately I'm no kernel expert, so I can't debug any further than this.
It looks like we are running into some breakage with the recent v4.1 I/O
task generation code changes and a certain SCSI CDB size combination
being handled by iscsi-target (grover CC'ed).

As this new code is currently lacking debug output, the easiet way to
debug this with your current build would be making a wireshark packet
capture from the initiator side so we can determine which CDB is
actually trigger the bug here..

Proper pr_debug() out messages need to be added for these codepaths to
handle this types of cases w/o a recompile, but that is going to take a
little more time.  ;)

Also, the output from the following quick and dirty debug patch would be
helpful to pinpoint the issue as well.

Thanks for your bug report Chris!

--nab
[snip]

Nicholas,

First of all, I should mention I tried to replicate the BUG on a 
different KVM virtual machine, with the same kernel and accessing the 
same storage with a virtio-block device. I couldn't replicate the bug on 
the VM, though it's very easy to trigger on my iSCSI box and KVM host.

With your patch applied, I get different behaviour but no BUG. Without 
further ado, here are the messages logged on the server:

[  267.452296] bio: create slab <bio-1> at 1
[  278.361145] cmd->data_length: 4096, sectors: 8, task_count: 1
[  278.377229] task[0]: task_lba: 0 task_sectors: 8 task_size: 4096
[  278.393720] Setup task->task_sg: ffff8802253f88e0 task->task_sg_nents: 1
[  278.410983] sg[0] Remaining task_size: 4096 cmd_sg: ffff8802253f88c0 
cmd_sg->length: 4096
[  278.429978] Bogus SGL at           (null)
[  339.991562] cmd->data_length: 4096, sectors: 8, task_count: 1
[  340.007700] task[0]: task_lba: 0 task_sectors: 8 task_size: 4096
[  340.024201] Setup task->task_sg: ffff8802253f8ce0 task->task_sg_nents: 1
[  340.041731] sg[0] Remaining task_size: 4096 cmd_sg: ffff8802253f8c80 
cmd_sg->length: 4096
[  340.061113] Bogus SGL at           (null)
[  401.976353] cmd->data_length: 4096, sectors: 8, task_count: 1
[  401.993067] task[0]: task_lba: 0 task_sectors: 8 task_size: 4096
[  402.010057] Setup task->task_sg: ffff88021f448760 task->task_sg_nents: 1
[  402.027834] sg[0] Remaining task_size: 4096 cmd_sg: ffff8802253f8b40 
cmd_sg->length: 4096
[  402.047406] Bogus SGL at           (null)

Note that this repeats at regular intervals. The target logs in and the 
error triggers as soon as it tries to read the partition table. The 
initiator (in this case Open-iSCSI rather than Windows which I have been 
testing with before) just hangs, I'm assuming it's constantly re-trying 
the read.

Thanks for taking a look!

Chris

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html