Re: Test Failure with Header and Data Digest Enabled

Hi Nicholas,
Did you have a chance to look at the data digest failure?  I did confirm that the list corruption is fixed with one of your recent check-ins.  The digest problem is easy to reproduce with the Emulex iSCSI initiator: it fails almost immediately when a drive is initialized (writes executed) by the OS.


Wireshark traces taken on the target side confirm the computed data CRC32C for the packet reported as failing in /var/log/messages.  I also pulled the data segment out of the trace and verified the dissector's CRC32C computation by hand; it is correct.
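
In case it helps, below is a minimal sketch of the kind of standalone cross-check I mean (not the exact tool I used): a bitwise CRC32C (Castagnoli) over a raw buffer, using the same parameters iSCSI specifies for Header/Data Digest, with the standard 32-bytes-of-zero test vector as a sanity check.  Mind the byte order when comparing the result against the digest field pulled out of the trace.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78,
 * initial value ~0, final complement -- the parameters iSCSI uses
 * for Header and Data Digest.  Feed it the raw DataOut data segment
 * extracted from the capture and compare the result against the
 * digest value the target logs.
 */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78u : 0);
    }
    return ~crc;
}

int main(void)
{
    /* Known-answer check: 32 bytes of zero must give 0x8a9136aa. */
    uint8_t zeros[32] = { 0 };

    printf("crc32c(32 x 0x00) = 0x%08x\n", crc32c(zeros, sizeof(zeros)));
    return 0;
}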

Please let me know what you need from my side to help with this debug effort.

Thanks,
-Joe

________________________________
From: Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx>
To: jrepac@xxxxxxxxx 
Cc: "target-devel@xxxxxxxxxxxxxxx" <target-devel@xxxxxxxxxxxxxxx> 
Sent: Wednesday, December 21, 2011 2:32 PM
Subject: Re: Test Failure with Header and Data Digest Enabled

On Wed, 2011-12-21 at 08:39 -0800, jrepac@xxxxxxxxx wrote:
> A test failure was noted while testing the target on Windows 2008 x64
> R2 with Iometer.  The test setup is as follows:
> 
> Emulex iSCSI Initiator
> LIO target set up as a 9 GB target with a single portal.
> Disk was partitioned as 8 Windows drives.
> 
> Iometer setup - 4 workers / 128 threads per worker / 32 kB transfers,
> 50% read.
> 
> I ran the test without digest enabled and did not note any failures.
> This was tested over several hours.
> 
> 
> Both digests were enabled on the initiator and the test was re-run.
> Kernel messages immediately started appearing on the screen, indicating
> a failure.  I checked /var/log/messages and it appears the DataOut CRC
> check failed, followed by some kind of recovery failure.  The first set
> of failures from /var/log/messages is posted below.  I plan to simplify
> the test to "writes" only, with one worker and fewer threads, and
> investigate further.
> 

Hi Joe,

Thank you for reporting this issue.  I'm going to try to reproduce this
list_del corruption issue by manually triggering a DataOut CRC failure
in the same path below.

However, there may also be an issue with the Emulex iSCSI Initiator, as
the offset + length below (Offset: 22080, Length: 1384) seems very
strange to me..

Also for reference, I'm not currently aware of any issues with the MSFT
software iSCSI initiator with Header + Data Digest enabled..

Would it be possible for you to re-test with the Windows 2008 x64
software initiator and see if you can reproduce it, so we can isolate
the Emulex iSCSI offload piece..?

Thanks,

--nab


> 
> MODE SENSE: unimplemented page/subpage: 0x1c/0x00
> ITT: 0x0066024e, Offset: 22080, Length: 1384, DataSN: 0x00000002, CRC32C DataDigest 0xb454d12f does not match computed 0xab2e5613
> Unable to recover from DataOUT CRC failure while ERL=0, closing session.
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
> Hardware name: VMware Virtual Platform
> list_del corruption. prev->next should be ffff880017b42a88, but was ffff880010e00040
> Modules linked in: binfmt_misc bluetooth rfkill tcp_lp iscsi_target_mod target_core_stgt scsi_tgt target_core_pscsi target_core_file target_core_iblock target_core_mod configfs fuse lockd ppdev i2c_piix4 microcode i2c_core parport_pc parport pcspkr vmxnet3 shpchp vmw_balloon sunrpc uinput vmw_pvscsi floppy [last unloaded: scsi_wait_scan]
> Pid: 3162, comm: kworker/0:2 Not tainted 3.2.0-rc4+ #4
> Call Trace:
>  [<ffffffff810579c2>] warn_slowpath_common+0x83/0x9b
>  [<ffffffffa00f4ace>] ? transport_init_se_cmd+0x104/0x104 [target_core_mod]
>  [<ffffffff81057a7d>] warn_slowpath_fmt+0x46/0x48
>  [<ffffffff814c1ddc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
>  [<ffffffff81237671>] __list_del_entry+0x8d/0x98
>  [<ffffffff8123768a>] list_del+0xe/0x2d
>  [<ffffffffa00f3aaf>] transport_lun_remove_cmd+0x81/0xa1 [target_core_mod]
>  [<ffffffffa00f4dc3>] target_complete_ok_work+0x2f5/0x34e [target_core_mod]
>  [<ffffffff8122154c>] ? cfq_init_queue+0x3f8/0x3f8
>  [<ffffffffa00f4ace>] ? transport_init_se_cmd+0x104/0x104 [target_core_mod]
>  [<ffffffff8106e008>] process_one_work+0x176/0x2a9
>  [<ffffffff8106eb16>] worker_thread+0xda/0x15d
>  [<ffffffff8106ea3c>] ? manage_workers+0x176/0x176
>  [<ffffffff810721ff>] kthread+0x84/0x8c
>  [<ffffffff814ca674>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8107217b>] ? kthread_worker_fn+0x148/0x148
>  [<ffffffff814ca670>] ? gs_change+0x13/0x13
> ---[ end trace ec317e1a71e86303 ]---
> ITT: 0x007a0262, Offset: 30272, Length: 2496, DataSN: 0x00000005, CRC32C DataDigest 0xcb963d4b does not match computed 0x4e2f3e3a
> Unable to recover from DataOUT CRC failure while ERL=0, closing session.
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
> Hardware name: VMware Virtual Platform
> list_del corruption. prev->next should be ffff880017b43988, but was dead000000100100
> Modules linked in: binfmt_misc bluetooth rfkill tcp_lp iscsi_target_mod target_core_stgt scsi_tgt target_core_pscsi target_core_file target_core_iblock target_core_mod configfs fuse lockd ppdev i2c_piix4 microcode i2c_core parport_pc parport pcspkr vmxnet3 shpchp vmw_balloon sunrpc uinput vmw_pvscsi floppy [last unloaded: scsi_wait_scan]
> Pid: 3162, comm: kworker/0:2 Tainted: G        W    3.2.0-rc4+ #4
> Call Trace:
>  [<ffffffff810579c2>] warn_slowpath_common+0x83/0x9b
>  [<ffffffffa00f4ace>] ? transport_init_se_cmd+0x104/0x104 [target_core_mod]
>  [<ffffffff81057a7d>] warn_slowpath_fmt+0x46/0x48
>  [<ffffffff814c1ddc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
>  [<ffffffff81237671>] __list_del_entry+0x8d/0x98
>  [<ffffffff8123768a>] list_del+0xe/0x2d
>  [<ffffffffa00f3aaf>] transport_lun_remove_cmd+0x81/0xa1 [target_core_mod]
>  [<ffffffffa00f4dc3>] target_complete_ok_work+0x2f5/0x34e [target_core_mod]
>  [<ffffffff814c0651>] ? __schedule+0x616/0x644
>  [<ffffffffa00f4ace>] ? transport_init_se_cmd+0x104/0x104 [target_core_mod]
>  [<ffffffff8106e008>] process_one_work+0x176/0x2a9
>  [<ffffffff8106eb16>] worker_thread+0xda/0x15d
>  [<ffffffff8106ea3c>] ? manage_workers+0x176/0x176
>  [<ffffffff810721ff>] kthread+0x84/0x8c
>  [<ffffffff814ca674>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8107217b>] ? kthread_worker_fn+0x148/0x148
>  [<ffffffff814ca670>] ? gs_change+0x13/0x13
> ---[ end trace ec317e1a71e86304 ]---
> ------------[ cut here ]------------