2013/3/6 Dave Cundiff <syshackmin@xxxxxxxxx>
>
> Hi all,
>
> It appears the RAID 5/10 discard support does not work in the mainline kernel.
>
> I've been trying to backport it to a RHEL 6 kernel without success. I
> finally managed to set up a mainline dev box and discovered it doesn't
> work on it either!
>
> I'm now testing on a stock 3.8.2 kernel. The drives I'm using are
> Samsung 840 Pros hanging off an LSI 9211-8i. No backplane and each
> drive has a dedicated channel. No RAID on the LSI, it's just an HBA.
>
> As for RAID5, that just explodes on a BUG.
>
> This RAID5:
> mdadm -C /dev/md126 -n6 -l5 --assume-clean /dev/sda3 /dev/sdb3
> /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
>
> outputs 2 sets of printks
>
> granularity: 65535
> alignment: 42966
> max_discard_sectors: 8388607
> max_discard_sectors: 8388480
> granularity: 65535
> alignment: 42966
> max_discard_sectors: 8388607
> max_discard_sectors: 8388480
>
> and then dies on a BUG
>
> ------------[ cut here ]------------
> kernel BUG at drivers/scsi/scsi_lib.c:1028!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: raid456 async_raid6_recov async_pq raid6_pq
> async_xor xor async_memcpy async_tx xt_REDIRECT ipt_MASQUERADE
> iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_DSCP iptable_mangle iptable_filter nf_conntrack_ftp
> nf_conntrack_irc xt_TCPMSS xt_owner xt_mac xt_length xt_ecn xt_LOG
> xt_recent xt_limit xt_multiport xt_conntrack ipt_ULOG ipt_REJECT
> ip_tables sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> nf_conntrack ip6table_filter ip6_tables ext3 jbd dm_mod gpio_ich
> iTCO_wdt iTCO_vendor_support coretemp hwmon acpi_cpufreq freq_table
> mperf kvm_intel kvm microcode serio_raw pcspkr i2c_i801 lpc_ich
> snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm
> snd_timer snd soundcore snd_page_alloc ioatdma dca i7core_edac
> edac_core sg ext4 mbcache jbd2 raid1 raid10 sd_mod crc_t10dif
> crc32c_intel pata_acpi ata_generic ata_piix e1000e mpt2sas
> scsi_transport_sas raid_class mgag200 ttm drm_kms_helper be2iscsi
> bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio
> libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi
> CPU 7
> Pid: 6993, comm: md127_raid5 Not tainted 3.8.2-1.el6.x86_64 #2
> Supermicro X8DTL/X8DTL
> RIP: 0010:[<ffffffff813fe5e2>] [<ffffffff813fe5e2>] scsi_init_sgtable+0x62/0x70
> RSP: 0018:ffff88032d9e5a98 EFLAGS: 00010006
> RAX: 000000000000007f RBX: ffff88062bbd0d90 RCX: ffff88032ccc1808
> RDX: ffff8805618ed080 RSI: ffffea000b202540 RDI: 0000000000000000
> RBP: ffff88032d9e5aa8 R08: 0000160000000000 R09: 000000032df23000
> R10: 000000032dc18000 R11: 0000000000000000 R12: ffff88062bbf1518
> R13: 0000000000000000 R14: 0000000000000020 R15: 000000000007f000
> FS: 0000000000000000(0000) GS:ffff88063fc60000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000002024360 CR3: 000000032ed69000 CR4: 00000000000007e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process md127_raid5 (pid: 6993, threadinfo ffff88032d9e4000, task
> ffff88032c30e040)
> Stack:
> ffff88062bbf14c0 ffff88062bbd0d90 ffff88032d9e5af8 ffffffff813fe89d
> ffff88032cdbe800 0000000000000086 ffff88032d9e5af8 ffff88062bbd0d90
> ffff88062bbf14c0 0000000000000000 ffff88032cdbe800 000000000007f000
> Call Trace:
> [<ffffffff813fe89d>] scsi_init_io+0x3d/0x170
> [<ffffffff813feb44>] scsi_setup_blk_pc_cmnd+0x94/0x180
> [<ffffffffa023d1f2>] sd_setup_discard_cmnd+0x182/0x270 [sd_mod]
> [<ffffffffa023d378>] sd_prep_fn+0x98/0xbd0 [sd_mod]
> [<ffffffff8129ae00>] ? list_sort+0x1b0/0x3c0
> [<ffffffff8126ba1e>] blk_peek_request+0xce/0x220
> [<ffffffff813fddd0>] scsi_request_fn+0x60/0x540
> [<ffffffff8126a5e7>] __blk_run_queue+0x37/0x50
> [<ffffffff8126abae>] queue_unplugged+0x4e/0xb0
> [<ffffffff8126bcf6>] blk_flush_plug_list+0x156/0x230
> [<ffffffff8126bde8>] blk_finish_plug+0x18/0x50
> [<ffffffffa067b602>] raid5d+0x282/0x2a0 [raid456]
> [<ffffffff8149d1f7>] md_thread+0x117/0x150
> [<ffffffff8107bfd0>] ? wake_up_bit+0x40/0x40
> [<ffffffff8149d0e0>] ? md_rdev_init+0x110/0x110
> [<ffffffff8107b73e>] kthread+0xce/0xe0
> [<ffffffff8107b670>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff815dbeec>] ret_from_fork+0x7c/0xb0
> [<ffffffff8107b670>] ? kthread_freezable_should_stop+0x70/0x70
> Code: 49 8b 14 24 e8 f0 31 e7 ff 41 3b 44 24 08 77 1b 41 89 44 24 08
> 8b 43 54 41 89 44 24 10 31 c0 5b 41 5c c9 c3 b8 02 00 00 00 eb f4 <0f>
> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66
> RIP [<ffffffff813fe5e2>] scsi_init_sgtable+0x62/0x70
> RSP <ffff88032d9e5a98>
> ---[ end trace 5aea2a41495b91fc ]---
> Kernel panic - not syncing: Fatal exception
>
> That BUG is in
>
> /*
> * Next, walk the list, and fill in the addresses and sizes of
> * each segment.
> */
> count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> BUG_ON(count > sdb->table.nents);
> sdb->table.nents = count;
> sdb->length = blk_rq_bytes(req);
> return BLKPREP_OK;
>
> WAAAY over my head.
>
> So at this point I'm unsure how to continue. My total time in kernel
> code numbers in hours (maybe days). :)
>
> My backport to RHEL works if I increase the chunk size to 65536 as
> well. I could go with that, but I'm fairly certain such huge chunks
> would cause an I/O issue even on a crazy fast SSD array.
>

Hi Dave and all,

May I ask about the status of this problem? I am trying to backport
discard support to kernel 3.4 but am hitting almost the same kernel BUG
and error message.

Thanks a lot.

Regards,
Kevin