Re: 2.6.36-rc6 BUG at drivers/scsi/scsi_lib.c:1113

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Thu, 30 Sep 2010 18:34:58 -0700

On Thu, 2010-09-30 at 17:10 -0400, George Spelvin wrote:
> Supposedly 2c7d46ec192e4f2b350f67a0e185b9bce646cd6b in Linus' tree
> fixes this bug, but I can trigger it reliably by asking for a check
> of a RAID-1 volume.  ("echo check > /sys/block/md6/md/sync_action")
> 
> This particular volume is a 10-way (!) RAID-1, but I suspect smaller
> numbers will work.
> 
> The same BUG is triggered on a RAID-10 thread for a volume that shares
> the same physical drives.  Note that a check of *that* volume (md7)
> does not trigger the problem.
> 
> 2.6.35 does NOT exhibit this problem.
> 
> Here are two full kernel logs, boot to crash.  There are some local patches,
> but they are quite unrelated.  (Mostly PPS device changes.)
> 
> 
> 18:56:20: klogd 1.5.0#6, log source = /proc/kmsg started.
> 18:56:20: Linux version 2.6.36-rc6-00070-g44c064a ($USER@$HOST) (gcc version 4.4.5 (Debian 4.4.4-17) ) #308 SMP Thu Sep 30 14:14:07 EDT 2010
> 18:56:20: Command line: auto BOOT_IMAGE=2.6 ro root=907 libata.fua=1 tsc_khz=2500210 amd64_edac_mod.ecc_enable_override=1 acpi_enforce_resources=lax k10temp.force=1

Ok, so libata.fua=1 is set here..

<SNIP>

> 20:44:45: ata4.00: SB600 AHCI: limiting to 255 sectors per cmd
> 20:44:45: ata4: EH complete
> 20:44:45: ata5.00: configured for UDMA/133
> 20:44:45: ata5: EH complete
> 20:44:45: ata6.00: configured for UDMA/133
> 20:44:45: Adding 7968124k swap on /dev/md8.  Priority:0 extents:1 across:7968124k 
> 20:44:45: EXT4-fs (md7): re-mounted. Opts: journal_checksum,journal_async_commit,delalloc,auto_da_alloc
> 20:44:45: EXT4-fs (md10): mounted filesystem with ordered data mode. Opts: journal_checksum,journal_async_commit,delalloc,auto_da_alloc
> 20:44:45: alg: No test for digest_null (digest_null-generic)
> 20:44:45: alg: No test for compress_null (compress_null-generic)
> 20:45:14: r8169 0000:03:00.0: eth0: link up
> 20:47:59: md: data-check of RAID array md6
> 20:47:59: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> 20:47:59: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> 20:47:59: md: using 128k window, over a total of 1166912 blocks.
> 20:47:59: ata1.00: WARNING: zero len r/w req

Ok, it looks like MD is sending down a zero-length CDB but still
signaling DMA_FROM_DEVICE or DMA_TO_DEVICE for the struct scsi_cmnd I/O
descriptor being translated to ATA in drivers/ata/libata-scsi.c:
ata_scsi_translate().
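
For anyone following along, the condition behind that warning can be
sketched in userspace roughly as below. This is a simplified model, not
the libata code: struct model_cmnd, enum model_dir and zero_len_rw() are
hypothetical stand-ins for struct scsi_cmnd, its sc_data_direction field
and the scsi_bufflen() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's DMA direction values. */
enum model_dir { MODEL_BIDIRECTIONAL, MODEL_TO_DEVICE, MODEL_FROM_DEVICE, MODEL_NONE };

/* Hypothetical stand-in for struct scsi_cmnd; only the fields used here. */
struct model_cmnd {
    enum model_dir dir;  /* models sc_data_direction */
    size_t bufflen;      /* models scsi_bufflen(cmd) */
};

/*
 * True when the command claims to move data in one direction but carries
 * no payload, i.e. the situation reported as "zero len r/w req".
 */
static bool zero_len_rw(const struct model_cmnd *cmd)
{
    bool moves_data = cmd->dir == MODEL_TO_DEVICE ||
                      cmd->dir == MODEL_FROM_DEVICE;

    return moves_data && cmd->bufflen < 1;
}

/* Small usage example: an empty "write" trips the check, others do not. */
static void zero_len_rw_demo(void)
{
    struct model_cmnd empty_write = { MODEL_TO_DEVICE, 0 };
    struct model_cmnd real_write  = { MODEL_TO_DEVICE, 4096 };
    struct model_cmnd no_data     = { MODEL_NONE, 0 };

    assert(zero_len_rw(&empty_write));
    assert(!zero_len_rw(&real_write));
    assert(!zero_len_rw(&no_data));
}
```

A command with no data direction at all (MODEL_NONE here) passes the
check, which is why the direction flag on an empty request is the
interesting part.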

> 20:47:59: ------------[ cut here ]------------
> 20:47:59: kernel BUG at drivers/scsi/scsi_lib.c:1113!

This BUG_ON() is getting triggered in scsi_setup_fs_cmnd() which is
called for each struct request of type REQ_TYPE_FS:

        /*
         * Filesystem requests must transfer data.
         */
        BUG_ON(!req->nr_phys_segments);

I am guessing this is related to the barrier + FUA changes in .36,
which with libata.fua=1 are triggering drivers/md/raid1.c to send down a
zero-length barrier request in the form of a struct request with the
REQ_WRITE bit set:

From a quick look at drivers/md/md.c:submit_barriers():

	<SNIP>
         bi = bio_alloc(GFP_KERNEL, 0);
         bi->bi_end_io = md_end_barrier;
         bi->bi_private = rdev;
         bi->bi_bdev = rdev->bdev;
         atomic_inc(&mddev->flush_pending);
         submit_bio(WRITE_BARRIER, bi);
	<SNIP>

I wonder if this could possibly be the root culprit..?  That is,
submit_bio(WRITE_BARRIER, ..) setting:

	struct request->rw_flags = REQ_WRITE | REQ_SYNC 

and getting translated into a SYNCHRONIZE_CACHE CDB in drivers/scsi,
which sets

	struct scsi_cmnd->sc_data_direction = DMA_TO_DEVICE;

with scsi_bufflen(sc) returning a zero length, and finally hitting the
BUG_ON(!req->nr_phys_segments) in scsi_setup_fs_cmnd() shown above.
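
To make the suspected chain concrete, here is a minimal userspace
sketch of it. It is only a model of the guess above, not kernel code:
the model_* types, the MODEL_REQ_* flag values and the one-segment-per-vec
mapping are all assumptions made for illustration (the real block layer
may merge physically contiguous segments).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real REQ_* bits differ. */
#define MODEL_REQ_WRITE (1u << 0)
#define MODEL_REQ_SYNC  (1u << 1)

/* Hypothetical stand-in for struct bio: vec count and rw flags only. */
struct model_bio {
    unsigned short vcnt;
    unsigned int flags;
};

/* Hypothetical stand-in for struct request. */
struct model_request {
    unsigned int flags;
    unsigned short nr_phys_segments;
};

/* Simplification: each bio vec becomes exactly one physical segment. */
static struct model_request model_make_request(const struct model_bio *bio)
{
    struct model_request req = {
        .flags = bio->flags,
        .nr_phys_segments = bio->vcnt,
    };

    return req;
}

/* The predicate inside BUG_ON(!req->nr_phys_segments). */
static bool model_fs_cmnd_would_bug(const struct model_request *req)
{
    return !req->nr_phys_segments;
}

/* Usage: an empty barrier-style bio versus an ordinary one-vec write. */
static void model_barrier_demo(void)
{
    struct model_bio barrier = { 0, MODEL_REQ_WRITE | MODEL_REQ_SYNC };
    struct model_bio write   = { 1, MODEL_REQ_WRITE };
    struct model_request r0 = model_make_request(&barrier);
    struct model_request r1 = model_make_request(&write);

    assert(model_fs_cmnd_would_bug(&r0));   /* zero segments: would BUG */
    assert(!model_fs_cmnd_would_bug(&r1));  /* carries data: fine */
}
```

In this model the empty bio from bio_alloc(GFP_KERNEL, 0) keeps its write
flag but produces a request with no segments, which is the combination
the BUG_ON() rejects for filesystem requests.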

I have not been able to pinpoint the actual breakage just yet, but
hopefully Jens and Tejun can have a look soon and add their comments so
this can be resolved for .36-rc7.

Thanks for reporting!

--nab

> 20:47:59: invalid opcode: 0000 [#1] SMP 
> 20:47:59: last sysfs file: /sys/devices/virtual/block/md6/md/sync_action
> 20:47:59: CPU 2 
> 20:47:59: Modules linked in: ctr twofish_generic twofish_x86_64 twofish_common serpent xcbc sha512_generic sha256_generic crypto_null ipg 8250_pci 8250_pnp 8250 serial_core
> 20:47:59: 
> 20:47:59: Pid: 9537, comm: md6_resync Not tainted 2.6.36-rc6-00070-g44c064a #308 MS-7376/MS-7376
> 20:47:59: RIP: 0010:[<ffffffff812ab746>]  [<ffffffff812ab746>] scsi_setup_fs_cmnd+0x4e/0xbb
> 20:47:59: RSP: 0018:ffff880218de3ad0  EFLAGS: 00010046
> 20:47:59: RAX: 0000000000000000 RBX: ffff88021a8b4b88 RCX: ffff88021f3c2320
> 20:47:59: RDX: 0000000000000000 RSI: ffff88021a8b4b88 RDI: ffff88021f513000
> 20:47:59: RBP: ffff880218de3ae0 R08: 00000000000003e8 R09: ffff880218de3ae0
> 20:47:59: R10: 0000000000000000 R11: ffff880218de3ae0 R12: ffff88021f513000
> 20:47:59: R13: ffff880218de3b58 R14: 0000000000000000 R15: ffff88021f512c00
> 20:47:59: FS:  00007fdf8b89f700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
> 20:47:59: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> 20:47:59: CR2: 0000003a10c1e8f0 CR3: 000000000164a000 CR4: 00000000000006e0
> 20:47:59: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> 20:47:59: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 20:47:59: Process md6_resync (pid: 9537, threadinfo ffff880218de2000, task ffff88021c53ac00)
> 20:47:59: Stack:
> 20:47:59:  ffff88021a8b4b88 ffff88021f513000 ffff880218de3b40 ffffffff812b171d
> 20:47:59: <0> ffff88021f604068 ffff88021f6040b8 000000001f604158 ffff88021f3c2320
> 20:47:59: <0> ffff880218de3b40 ffff88021a8b4b88 ffff88021f3c2320 ffff880218de3b58
> 20:47:59: Call Trace:
> 20:47:59:  [<ffffffff812b171d>] sd_prep_fn+0x241/0x868
> 20:47:59:  [<ffffffff81160487>] blk_peek_request+0xb3/0x177
> 20:47:59:  [<ffffffff812aae96>] scsi_request_fn+0x84/0x424
> 20:47:59:  [<ffffffff81160d9c>] __generic_unplug_device+0x32/0x37
> 20:47:59:  [<ffffffff81160dcc>] generic_unplug_device+0x2b/0x3a
> 20:47:59:  [<ffffffff8115f0de>] blk_unplug+0x12/0x14
> 20:47:59:  [<ffffffff81306b31>] unplug_slaves+0x69/0x9f
> 20:47:59:  [<ffffffff81306b7f>] raid1_unplug+0x18/0x28
> 20:47:59:  [<ffffffff8115f0de>] blk_unplug+0x12/0x14
> 20:47:59:  [<ffffffff8131aa72>] md_unplug+0x1d/0x33
> 20:47:59:  [<ffffffff8131b3d4>] md_do_sync+0x94c/0xba6
> 20:47:59:  [<ffffffff8102f88f>] ? finish_task_switch+0x34/0x74
> 20:47:59:  [<ffffffff8131ba3a>] md_thread+0xf6/0x114
> 20:47:59:  [<ffffffff8131b944>] ? md_thread+0x0/0x114
> 20:47:59:  [<ffffffff8131b944>] ? md_thread+0x0/0x114
> 20:47:59:  [<ffffffff81049e4f>] kthread+0x7d/0x85
> 20:47:59:  [<ffffffff81002c94>] kernel_thread_helper+0x4/0x10
> 20:47:59:  [<ffffffff81049dd2>] ? kthread+0x0/0x85
> 20:47:59:  [<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10
> 20:47:59: Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 48 48 85 c0 74 0c 48 89 de 4c 89 e7 ff d0 85 c0 75 72 66 83 bb c0 00 00 00 00 75 04 <0f> 0b eb fe 48 8b 93 c8 00 00 00 48 85 d2 75 21 be 20 00 00 00 
> 20:47:59: RIP  [<ffffffff812ab746>] scsi_setup_fs_cmnd+0x4e/0xbb
> 20:47:59:  RSP <ffff880218de3ad0>
> 20:47:59: ---[ end trace 1e250084e0863623 ]---
> 20:48:01: ------------[ cut here ]------------
> 20:48:01: kernel BUG at drivers/scsi/scsi_lib.c:1113!
> 20:48:01: invalid opcode: 0000 [#2] SMP 
> 20:48:01: last sysfs file: /sys/devices/virtual/block/md6/md/sync_action
> 20:48:01: CPU 3 
> 20:48:01: Modules linked in: ctr twofish_generic twofish_x86_64 twofish_common serpent xcbc sha512_generic sha256_generic crypto_null ipg 8250_pci 8250_pnp 8250 serial_core
> 20:48:01: 
> 20:48:01: Pid: 947, comm: md7_raid10 Tainted: G      D     2.6.36-rc6-00070-g44c064a #308 MS-7376/MS-7376
> 20:48:01: RIP: 0010:[<ffffffff812ab746>]  [<ffffffff812ab746>] scsi_setup_fs_cmnd+0x4e/0xbb
> 20:48:01: RSP: 0018:ffff88021f4b1930  EFLAGS: 00010046
> 20:48:01: RAX: 0000000000000000 RBX: ffff88021a8b4f60 RCX: 0000000000000000
> 20:48:01: RDX: 0000000000000000 RSI: ffff88021a8b4f60 RDI: ffff88021f3ca400
> 20:48:01: RBP: ffff88021f4b1940 R08: 0000000000052800 R09: 0000000000000000
> 20:48:01: R10: ffff88021f4b19d0 R11: ffff88021f4b19f0 R12: ffff88021f3ca400
> 20:48:01: R13: ffff88021f4b19b8 R14: 0000000000000000 R15: ffff88021f3cbc00
> 20:48:01: FS:  00007fce18ce5720(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
> 20:48:01: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> 20:48:01: CR2: 0000003a10c6e5a0 CR3: 000000021b00c000 CR4: 00000000000006e0
> 20:48:01: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> 20:48:01: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 20:48:01: Process md7_raid10 (pid: 947, threadinfo ffff88021f4b0000, task ffff88021f384200)
> 20:48:01: Stack:
> 20:48:01:  ffff88021a8b4f60 ffff88021f3ca400 ffff88021f4b19a0 ffffffff812b171d
> 20:48:01: <0> ffff88021f4b1970 ffffffff8115eb7e 000000001f597008 ffff88021e821a58
> 20:48:01: <0> ffff88021f4b1990 ffff88021a8b4f60 ffff88021e821a58 ffff88021f4b19b8
> 20:48:01: Call Trace:
> 20:48:01:  [<ffffffff812b171d>] sd_prep_fn+0x241/0x868
> 20:48:01:  [<ffffffff8115eb7e>] ? elv_rb_del+0x30/0x49
> 20:48:01:  [<ffffffff81160487>] blk_peek_request+0xb3/0x177
> 20:48:01:  [<ffffffff812aae96>] scsi_request_fn+0x84/0x424
> 20:48:01:  [<ffffffff8103f279>] ? del_timer+0x7f/0x89
> 20:48:01:  [<ffffffff81160c18>] __blk_run_queue+0x3f/0x72
> 20:48:01:  [<ffffffff8115e831>] elv_insert+0x80/0x1a8
> 20:48:01:  [<ffffffff8115e9f1>] __elv_add_request+0x98/0x9f
> 20:48:01:  [<ffffffff8116174e>] __make_request+0x34a/0x3cd
> 20:48:01:  [<ffffffff8115feda>] generic_make_request+0x19a/0x204
> 20:48:01:  [<ffffffff8116000b>] submit_bio+0xc7/0xd0
> 20:48:01:  [<ffffffff810bb519>] ? bio_clone+0x39/0x44
> 20:48:01:  [<ffffffff8131be2b>] md_super_write+0xab/0xba
> 20:48:01:  [<ffffffff81322c77>] write_page+0x161/0x2c9
> 20:48:01:  [<ffffffff8102ad35>] ? dequeue_task_fair+0x201/0x210
> 20:48:01:  [<ffffffff81322eb6>] bitmap_update_sb+0xd7/0xdc
> 20:48:01:  [<ffffffff8131c210>] md_update_sb+0x1f6/0x2c9
> 20:48:01:  [<ffffffff8131dc20>] md_check_recovery+0x1b7/0x43f
> 20:48:01:  [<ffffffff8130af28>] raid10d+0x34/0x804
> 20:48:01:  [<ffffffff8103f1ce>] ? try_to_del_timer_sync+0x83/0x8f
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
