Re: kernel BUG at drivers/scsi/scsi_lib.c:1101! observed during md5sum for one file on (RAID4->RAID0) device

On Thu, 2015-07-30 at 05:03 -0400, Yi Zhang wrote:
> Hi SCSI/RAID maintainers,
> 
> During RAID testing with 4.2.0-rc3, I hit the kernel BUG below. The test log, environment, and reproduction steps follow.
> 
> Log:
> [  306.741662] md: bind<sdb1>
> [  306.750865] md: bind<sdc1>
> [  306.753993] md: bind<sdd1>
> [  306.764475] md: bind<sde1>
> [  306.786156] md: bind<sdf1>
> [  306.789362] md: bind<sdh1>
> [  306.792555] md: bind<sdg1>
> [  306.868166] raid6: sse2x1   gen() 10589 MB/s
> [  306.889143] raid6: sse2x1   xor()  8218 MB/s
> [  306.910121] raid6: sse2x2   gen() 13453 MB/s
> [  306.931102] raid6: sse2x2   xor()  8990 MB/s
> [  306.952079] raid6: sse2x4   gen() 15539 MB/s
> [  306.973063] raid6: sse2x4   xor() 10771 MB/s
> [  306.994039] raid6: avx2x1   gen() 20582 MB/s
> [  307.015017] raid6: avx2x2   gen() 24019 MB/s
> [  307.035998] raid6: avx2x4   gen() 27824 MB/s
> [  307.040755] raid6: using algorithm avx2x4 gen() 27824 MB/s
> [  307.046869] raid6: using avx2x2 recovery algorithm
> [  307.058793] async_tx: api initialized (async)
> [  307.075428] xor: automatically using best checksumming function:
> [  307.091942]    avx       : 32008.000 MB/sec
> [  307.147662] md: raid6 personality registered for level 6
> [  307.153584] md: raid5 personality registered for level 5
> [  307.159505] md: raid4 personality registered for level 4
> [  307.165698] md/raid:md0: device sdf1 operational as raid disk 4
> [  307.172300] md/raid:md0: device sde1 operational as raid disk 3
> [  307.178899] md/raid:md0: device sdd1 operational as raid disk 2
> [  307.185497] md/raid:md0: device sdc1 operational as raid disk 1
> [  307.192093] md/raid:md0: device sdb1 operational as raid disk 0
> [  307.199052] md/raid:md0: allocated 6482kB
> [  307.203573] md/raid:md0: raid level 4 active with 5 out of 6 devices, algorithm 0
> [  307.211958] md0: detected capacity change from 0 to 53645148160
> [  307.218658] md: recovery of RAID array md0
> [  307.223226] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  307.229729] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> [  307.240427] md: using 128k window, over a total of 10477568k.
> [  374.670951] md: md0: recovery done.
> [  375.722806] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
> [  447.553364] md: unbind<sdh1>
> [  447.559905] md: export_rdev(sdh1)
> [  447.572684] md: cannot remove active disk sdg1 from md0 ...
> [  447.578909] md/raid:md0: Disk failure on sdg1, disabling device.
> [  447.578909] md/raid:md0: Operation continuing on 5 devices.
> [  447.594850] md: unbind<sdg1>
> [  447.601834] md: export_rdev(sdg1)
> [  447.615446] md: raid0 personality registered for level 0
> [  447.629275] md/raid0:md0: md_size is 104775680 sectors.
> [  447.635094] md: RAID0 configuration for md0 - 1 zone
> [  447.640627] md: zone0=[sdb1/sdc1/sdd1/sde1/sdf1]
> [  447.645833]       zone-offset=         0KB, device-offset=         0KB, size=  52387840KB
> [  447.654949] 
> [  447.739443] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
> [  447.749258] bio too big device sde1 (768 > 512)

This is the actual error: a 768-sector bio was submitted to sde1, whose
queue accepts at most 512 sectors.  It looks like an md problem (md list
copied).
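
To see which members are affected and by how much, the messages can be
tallied per device.  A minimal sketch, assuming the log text is piped in
on stdin; the function names are hypothetical, and the KiB figures assume
the 512-byte sectors the block layer uses for these counts:

```shell
# Extract "device bio_sectors limit_sectors" triples from kernel log
# text on stdin. Lines that are not "bio too big" messages are dropped.
bio_too_big_triples() {
    sed -n 's/.*bio too big device \([^ ]*\) (\([0-9][0-9]*\) > \([0-9][0-9]*\)).*/\1 \2 \3/p'
}

# Print, per device, the largest offending bio and the reported limit,
# in sectors and in KiB (sectors / 2, assuming 512-byte sectors).
bio_too_big_summary() {
    bio_too_big_triples | awk '
        {
            if (($2 + 0) > (max[$1] + 0)) max[$1] = $2 + 0
            limit[$1] = $3 + 0
        }
        END {
            for (d in max)
                printf "%s max_bio=%d sectors (%d KiB) limit=%d sectors (%d KiB)\n",
                       d, max[d], max[d] / 2, limit[d], limit[d] / 2
        }
    ' | sort
}
```

Usage would be something like `dmesg | bio_too_big_summary`.  In the log
here the largest bio is 1024 sectors, i.e. 512 KiB, which matches the
--chunk 512 the array was created with.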

> [  447.754824] bio too big device sdf1 (1024 > 512)
> [  447.759989] bio too big device sdb1 (768 > 512)
> [  447.771102] bio too big device sdc1 (1024 > 512)
> [  447.776276] bio too big device sdd1 (1024 > 512)
> [  447.781459] bio too big device sde1 (1024 > 512)
> [  447.786635] bio too big device sdf1 (768 > 512)
> [  447.811156] bio too big device sdb1 (1024 > 512)
> [  447.816329] bio too big device sdc1 (1024 > 512)
> [  447.821513] bio too big device sdd1 (1024 > 512)
> [  447.826681] bio too big device sde1 (768 > 512)
> [  447.886106] bio too big device sdf1 (1024 > 512)
> [  447.891269] bio too big device sdb1 (1024 > 512)
> [  447.896452] bio too big device sdc1 (1024 > 512)
> [  447.901628] bio too big device sdd1 (768 > 512)
> [  447.930647] bio too big device sde1 (1024 > 512)
> [  447.935820] bio too big device sdf1 (1024 > 512)
> [  447.941003] bio too big device sdb1 (1024 > 512)
> [  447.946179] bio too big device sdc1 (768 > 512)
> [  447.976196] bio too big device sdd1 (1024 > 512)
> [  447.981367] bio too big device sde1 (1024 > 512)
> [  447.986549] bio too big device sdf1 (1024 > 512)
> [  447.991728] bio too big device sdb1 (768 > 512)
> [  448.033614] bio too big device sdc1 (1024 > 512)
> [  448.038786] bio too big device sdd1 (1024 > 512)
> [  448.043968] bio too big device sde1 (1024 > 512)
> [  448.049145] bio too big device sdf1 (768 > 512)
> [  448.083273] bio too big device sdb1 (1024 > 512)
> [  448.088444] bio too big device sdc1 (1024 > 512)
> [  448.093626] bio too big device sdd1 (1024 > 512)
> [  448.098804] bio too big device sde1 (768 > 512)
> [  448.128357] bio too big device sdf1 (1024 > 512)
> [  448.133536] bio too big device sdb1 (1024 > 512)
> [  448.138720] bio too big device sdc1 (1024 > 512)
> [  448.143897] bio too big device sdd1 (768 > 512)
> [  448.173456] bio too big device sde1 (1024 > 512)
> [  448.178627] bio too big device sdf1 (1024 > 512)
> [  448.183811] bio too big device sdb1 (1024 > 512)
> [  448.188985] bio too big device sdc1 (768 > 512)
> [  448.231050] bio too big device sdd1 (1024 > 512)
> [  448.236221] bio too big device sde1 (1024 > 512)
> [  448.241405] bio too big device sdf1 (1024 > 512)
> [  448.246583] bio too big device sdb1 (768 > 512)
> [  448.282548] bio too big device sdc1 (1024 > 512)
> [  448.287719] bio too big device sdd1 (1024 > 512)
> [  448.292904] bio too big device sde1 (1024 > 512)
> [  448.298082] bio too big device sdf1 (768 > 512)
> [  448.328300] bio too big device sdb1 (1024 > 512)
> [  448.333471] bio too big device sdc1 (1024 > 512)
> [  448.338654] bio too big device sdd1 (1024 > 512)
> [  448.343830] bio too big device sde1 (768 > 512)
> [  448.374081] bio too big device sdf1 (1024 > 512)
> [  448.379250] bio too big device sdb1 (1024 > 512)
> [  448.384433] bio too big device sdc1 (1024 > 512)
> [  448.389609] bio too big device sdd1 (768 > 512)
> [  448.394690] ------------[ cut here ]------------
> [  448.399832] kernel BUG at drivers/scsi/scsi_lib.c:1095!

This BUG_ON is in scsi_init_sgtable():

        BUG_ON(count > sdb->table.nents);

It merely enforces with a BUG_ON what the "bio too big" warnings above
were already complaining about.
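
If it helps whoever picks this up on the md side, the limits the array
advertises can be compared with those of its members through sysfs.  A
rough sketch, assuming the standard queue attributes max_sectors_kb,
max_hw_sectors_kb and max_segments under /sys/block/<dev>/queue;
print_queue_limits and the SYSFS_ROOT override are hypothetical names,
the latter only there so the function can be pointed at a fake tree:

```shell
# Print selected request-queue limits for each named whole-disk device.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
print_queue_limits() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$@"; do
        q="$root/block/$dev/queue"
        if [ ! -d "$q" ]; then
            # Report missing devices on stderr and keep going.
            echo "$dev: no queue directory at $q" >&2
            continue
        fi
        printf '%s' "$dev"
        for attr in max_sectors_kb max_hw_sectors_kb max_segments; do
            if [ -r "$q/$attr" ]; then
                printf ' %s=%s' "$attr" "$(cat "$q/$attr")"
            fi
        done
        printf '\n'
    done
}
```

For this setup the call would be along the lines of
`print_queue_limits md0 sdb sdc sdd sde sdf`.  Partitions such as sdb1
share the queue of their parent disk, so the whole-disk names are the
ones to pass.  If md0 reports larger values than a member after the
RAID4 to RAID0 conversion, that would be consistent with the warnings,
though this is only a diagnostic aid and not a confirmed root cause.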

James

> [  448.405653] invalid opcode: 0000 [#1] SMP 
> [  448.410232] Modules linked in: raid0 ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor asyd
> [  448.491371] CPU: 1 PID: 11918 Comm: md5sum Not tainted 4.2.0-rc3 #2
> [  448.498354] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.2.10 03/09/2015
> [  448.506791] task: ffff880461f28000 ti: ffff880462e08000 task.ti: ffff880462e08000
> [  448.515130] RIP: 0010:[<ffffffff8146aaf2>]  [<ffffffff8146aaf2>] scsi_init_sgtable+0x72/0x80
> [  448.524548] RSP: 0018:ffff880462e0b8f8  EFLAGS: 00010002
> [  448.530465] RAX: 0000000000000003 RBX: ffff8803fc03f980 RCX: 0000000000001000
> [  448.538417] RDX: 0000000000000000 RSI: ffff8803fbb78040 RDI: 0000000000000000
> [  448.546369] RBP: ffff880462e0b918 R08: ffff8803fbb78040 R09: 0000000000000000
> [  448.554320] R10: 00000000000001f0 R11: ffffea000feede00 R12: ffff8803fba3b860
> [  448.562272] R13: 0000000000000000 R14: ffff880461edc000 R15: ffff8803fc03f980
> [  448.570224] FS:  00007f41ce7cc740(0000) GS:ffff88046d240000(0000) knlGS:0000000000000000
> [  448.579242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  448.585644] CR2: 0000000000e0226f CR3: 0000000467aa3000 CR4: 00000000001406e0
> [  448.593597] Stack:
> [  448.595834]  ffff880462e0b918 ffff8803fba3b780 ffff88046072a200 ffff880461edc000
> [  448.604113]  ffff880462e0b968 ffffffff8146ab4a ffff88046072aaf8 ffff8803fbb78000
> [  448.612392]  ffff8803fbb78000 ffff8803fc03f980 ffff88046072a260 ffff88046064ec00
> [  448.620669] Call Trace:
> [  448.623393]  [<ffffffff8146ab4a>] scsi_init_io+0x4a/0x1c0
> [  448.629410]  [<ffffffffa004ed67>] sd_setup_read_write_cmnd+0x47/0xa40 [sd_mod]
> [  448.637460]  [<ffffffff81462a8b>] ? scsi_host_alloc_command+0x4b/0xc0
> [  448.644638]  [<ffffffffa00527d7>] sd_init_command+0x27/0xa0 [sd_mod]
> [  448.651720]  [<ffffffff8146adb1>] scsi_setup_cmnd+0xf1/0x160
> [  448.658026]  [<ffffffff8146af71>] scsi_prep_fn+0xd1/0x170
> [  448.664042]  [<ffffffff81309dac>] ? deadline_dispatch_requests+0xac/0x160
> [  448.671609]  [<ffffffff812ed683>] blk_peek_request+0x153/0x260
> [  448.678110]  [<ffffffff8146ca7f>] scsi_request_fn+0x3f/0x610
> [  448.684416]  [<ffffffff812e8c57>] __blk_run_queue+0x37/0x50
> [  448.690626]  [<ffffffff812e8cee>] queue_unplugged+0x2e/0xa0
> [  448.696836]  [<ffffffff812eda65>] blk_flush_plug_list+0x1b5/0x200
> [  448.703626]  [<ffffffff812ede14>] blk_finish_plug+0x34/0x50
> [  448.709836]  [<ffffffff8118cdfd>] __do_page_cache_readahead+0x1cd/0x240
> [  448.717207]  [<ffffffff8118cfb5>] ondemand_readahead+0x145/0x270
> [  448.723903]  [<ffffffff812209ba>] ? inode_congested+0xaa/0x110
> [  448.730402]  [<ffffffff8118d14c>] page_cache_async_readahead+0x6c/0x70
> [  448.737677]  [<ffffffff811811d3>] generic_file_read_iter+0x3c3/0x5e0
> [  448.744760]  [<ffffffff811f7569>] __vfs_read+0xc9/0x100
> [  448.750582]  [<ffffffff811f7b86>] vfs_read+0x86/0x130
> [  448.756211]  [<ffffffff811f8a15>] SyS_read+0x55/0xc0
> [  448.761742]  [<ffffffff81681b2e>] entry_SYSCALL_64_fastpath+0x12/0x71
> [  448.768920] Code: ff 41 3b 44 24 08 77 23 41 89 44 24 08 8b 43 5c 41 89 44 24 10 48 83 c4 08 44 89 e8 5b 41 5c 41 5d 5d  
> [  448.790490] RIP  [<ffffffff8146aaf2>] scsi_init_sgtable+0x72/0x80
> [  448.797287]  RSP <ffff880462e0b8f8>
> [  448.801171] ---[ end trace fa7203c8f83678c8 ]---
> [  448.853171] Kernel panic - not syncing: Fatal exception
> [  448.859020] Kernel Offset: disabled
> [  448.862904] drm_kms_helper: panic occurred, switching back to text console
> [  448.920805] ---[ end Kernel panic - not syncing: Fatal exception
> [  448.927513] ------------[ cut here ]------------
> [  448.932661] WARNING: CPU: 1 PID: 11918 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
> [  448.943423] Modules linked in: raid0 ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor asyd
> [  449.024578] CPU: 1 PID: 11918 Comm: md5sum Tainted: G      D         4.2.0-rc3 #2
> [  449.032918] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.2.10 03/09/2015
> [  449.041353]  0000000000000000 00000000429195bb ffff88046d243d68 ffffffff8167acdd
> [  449.049635]  0000000000000000 0000000000000000 ffff88046d243da8 ffffffff81081a4a
> [  449.057917]  ffff88046d243da8 0000000000000000 ffff88046d216780 0000000000000001
> [  449.066198] Call Trace:
> [  449.068918]  <IRQ>  [<ffffffff8167acdd>] dump_stack+0x45/0x57
> [  449.075336]  [<ffffffff81081a4a>] warn_slowpath_common+0x8a/0xc0
> [  449.082030]  [<ffffffff81081b7a>] warn_slowpath_null+0x1a/0x20
> [  449.088530]  [<ffffffff8104d56d>] native_smp_send_reschedule+0x5d/0x60
> [  449.095805]  [<ffffffff810be8e5>] trigger_load_balance+0x145/0x1f0
> [  449.102693]  [<ffffffff810ad486>] scheduler_tick+0xa6/0xe0
> [  449.108807]  [<ffffffff810f9bb0>] ? tick_sched_do_timer+0x50/0x50
> [  449.115599]  [<ffffffff810ea651>] update_process_times+0x51/0x60
> [  449.122293]  [<ffffffff810f9965>] tick_sched_handle.isra.17+0x25/0x60
> [  449.129471]  [<ffffffff810f9bf4>] tick_sched_timer+0x44/0x80
> [  449.135779]  [<ffffffff810eb1e3>] __hrtimer_run_queues+0xf3/0x220
> [  449.142570]  [<ffffffff810eb648>] hrtimer_interrupt+0xa8/0x1a0
> [  449.149069]  [<ffffffff810500b9>] local_apic_timer_interrupt+0x39/0x60
> [  449.156345]  [<ffffffff81684835>] smp_apic_timer_interrupt+0x45/0x60
> [  449.163427]  [<ffffffff816829cb>] apic_timer_interrupt+0x6b/0x70
> [  449.170118]  <EOI>  [<ffffffff816755e3>] ? panic+0x1cc/0x20d
> [  449.176435]  [<ffffffff816755dc>] ? panic+0x1c5/0x20d
> [  449.182065]  [<ffffffff81019428>] oops_end+0xc8/0xe0
> [  449.187595]  [<ffffffff8101994b>] die+0x4b/0x70
> [  449.192643]  [<ffffffff81015e6d>] do_trap+0x13d/0x150
> [  449.198272]  [<ffffffff81016338>] do_error_trap+0xa8/0x170
> [  449.204386]  [<ffffffff8146aaf2>] ? scsi_init_sgtable+0x72/0x80
> [  449.210983]  [<ffffffff811821b5>] ? mempool_alloc_slab+0x15/0x20
> [  449.217675]  [<ffffffff811822f9>] ? mempool_alloc+0x69/0x170
> [  449.223980]  [<ffffffff81016850>] do_invalid_op+0x20/0x30
> [  449.229996]  [<ffffffff8168348e>] invalid_op+0x1e/0x30
> [  449.235721]  [<ffffffff8146aaf2>] ? scsi_init_sgtable+0x72/0x80
> [  449.242317]  [<ffffffff8146aac8>] ? scsi_init_sgtable+0x48/0x80
> [  449.248912]  [<ffffffff8146ab4a>] scsi_init_io+0x4a/0x1c0
> [  449.254930]  [<ffffffffa004ed67>] sd_setup_read_write_cmnd+0x47/0xa40 [sd_mod]
> [  449.262979]  [<ffffffff81462a8b>] ? scsi_host_alloc_command+0x4b/0xc0
> [  449.270157]  [<ffffffffa00527d7>] sd_init_command+0x27/0xa0 [sd_mod]
> [  449.277239]  [<ffffffff8146adb1>] scsi_setup_cmnd+0xf1/0x160
> [  449.283544]  [<ffffffff8146af71>] scsi_prep_fn+0xd1/0x170
> [  449.289561]  [<ffffffff81309dac>] ? deadline_dispatch_requests+0xac/0x160
> [  449.297128]  [<ffffffff812ed683>] blk_peek_request+0x153/0x260
> [  449.303628]  [<ffffffff8146ca7f>] scsi_request_fn+0x3f/0x610
> [  449.309933]  [<ffffffff812e8c57>] __blk_run_queue+0x37/0x50
> [  449.316142]  [<ffffffff812e8cee>] queue_unplugged+0x2e/0xa0
> [  449.322351]  [<ffffffff812eda65>] blk_flush_plug_list+0x1b5/0x200
> [  449.329142]  [<ffffffff812ede14>] blk_finish_plug+0x34/0x50
> [  449.335351]  [<ffffffff8118cdfd>] __do_page_cache_readahead+0x1cd/0x240
> [  449.342722]  [<ffffffff8118cfb5>] ondemand_readahead+0x145/0x270
> [  449.349416]  [<ffffffff812209ba>] ? inode_congested+0xaa/0x110
> [  449.355916]  [<ffffffff8118d14c>] page_cache_async_readahead+0x6c/0x70
> [  449.363190]  [<ffffffff811811d3>] generic_file_read_iter+0x3c3/0x5e0
> [  449.370273]  [<ffffffff811f7569>] __vfs_read+0xc9/0x100
> [  449.376094]  [<ffffffff811f7b86>] vfs_read+0x86/0x130
> [  449.381723]  [<ffffffff811f8a15>] SyS_read+0x55/0xc0
> [  449.387254]  [<ffffffff81681b2e>] entry_SYSCALL_64_fastpath+0x12/0x71
> [  449.394432] ---[ end trace fa7203c8f83678c9 ]---
> 
> 
> Environment: 4.2.0-rc3
> [root@storageqe-09 ~]# lsblk 
> NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sdb                           8:16   0 931.5G  0 disk 
> └─sdb1                        8:17   0    10G  0 part 
> sdc                           8:32   0 931.5G  0 disk 
> └─sdc1                        8:33   0    10G  0 part 
> sdd                           8:48   0 931.5G  0 disk 
> └─sdd1                        8:49   0    10G  0 part 
> sde                           8:64   0 931.5G  0 disk 
> └─sde1                        8:65   0    10G  0 part 
> sdf                           8:80   0 931.5G  0 disk 
> └─sdf1                        8:81   0    10G  0 part 
> sdg                           8:96   0   3.7T  0 disk 
> └─sdg1                        8:97   0    10G  0 part 
> sdh                           8:112  0   3.7T  0 disk 
> └─sdh1                        8:113  0    10G  0 part
>  
> Reproduce-steps:
> while [ 1 ]
> do
> mdadm --create --run /dev/md0 --level 4 --metadata 1.2 --raid-devices 6 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 --spare-devices 1 /dev/sdh1 --chunk 512
> mdadm --wait /dev/md0
> mkfs -t ext4 /dev/md0
> mkdir -p /mnt/md_test
> mount /dev/md0 /mnt/md_test
> dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
> md5sum /mnt/md_test/testfile > md5.old
> umount /dev/md0
> mdadm --grow -l0 /dev/md0  --backup-file=tmp0
> mdadm --wait /dev/md0
> mount /dev/md0 /mnt/md_test
> md5sum /mnt/md_test/testfile >md5.new                # kernel BUG at drivers/scsi/scsi_lib.c:1101!
> umount /dev/md0
> mdadm -Ss
> mdadm --zero-superblock /dev/sd[bcdefgh]1
> done
> 
> 
> Best Regards,
>  Yi Zhang
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
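
The reproducer records md5.old and md5.new but does not compare them.  A
small helper along these lines could do that on iterations where the
second md5sum survives; compare_md5_files is a hypothetical name, and it
only assumes the usual md5sum output format with the digest in the first
field:

```shell
# Compare the digest field (first column) of two md5sum output files.
# Prints "match" and returns 0 when they agree, otherwise prints a
# MISMATCH line and returns 1. An empty digest never counts as a match.
compare_md5_files() {
    old=$(awk '{ print $1; exit }' "$1")
    new=$(awk '{ print $1; exit }' "$2")
    if [ -n "$old" ] && [ "$old" = "$new" ]; then
        echo "match"
        return 0
    fi
    echo "MISMATCH: old=$old new=$new"
    return 1
}
```

Inside the loop this could be called as
`compare_md5_files md5.old md5.new || break`, so the test stops at the
first iteration where the data read back from the RAID0 array differs.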





