md raid10 Oops on recent kernels

Hi all,

I'm running md RAID on top of LVM on some servers (since the EVMS project
has proven to be dead), but on kernel versions 3.4 and 3.5 there is a
problem with raid10.
It can be reproduced on current Debian Wheezy (installed from scratch with
the 7.0beta1 installer) with the v3.5 kernel package taken
from the experimental repository.

Creating the array, the initial sync (after "dd ... of=/dev/md/rtest_a") and
--assemble give no errors,
but then any direct I/O on the md device causes an oops (dd without
iflag=direct does not).
Strangely, V4L capture with the uvcvideo driver also freezes after the
first oops (and resumes only after mdadm --stop on the problematic array).
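
For comparison, the two kinds of read described above could be issued roughly as follows. This is only a sketch, not the exact commands from the test: the helper name md_read_cmd, the block size and the count are assumptions chosen for illustration; only iflag=direct and the device path come from the description. It prints the dd command lines instead of running them, since the direct variant is expected to oops an affected kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the dd command for a buffered or direct read.
# bs=1M and count=16 are illustrative values, not taken from the report.
md_read_cmd() {
    dev=$1
    mode=$2
    case "$mode" in
        direct)   echo "dd if=$dev of=/dev/null bs=1M count=16 iflag=direct" ;;
        buffered) echo "dd if=$dev of=/dev/null bs=1M count=16" ;;
        *)        echo "usage: md_read_cmd DEVICE direct|buffered" >&2; return 1 ;;
    esac
}

md_read_cmd /dev/md/rtest_a buffered   # reported to work
md_read_cmd /dev/md/rtest_a direct     # reported to trigger the oops
```

Piping the printed line to sh on a disposable test machine would run it for real.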

Recent LVM2 has built-in RAID (implemented with the md driver), but
unfortunately raid10 is not supported, so it can't replace the current
setup.

Is this a bug in the MD driver or in some other part of the kernel? Will it
affect other RAID setups in the future (like the old one with raid0 layered
over raid1)?


------------------------------------------------------------

Tested on a KVM guest, so hardware seems to be irrelevant.
Config: 1.5 GB memory, 2 vCPUs, 5 virtio disks


*** Short summary of commands:
vgcreate gurion_vg_jnt /dev/vdb6 /dev/vdc6 /dev/vdd6 /dev/vde6
lvcreate -n rtest_a_c1r -l 129 gurion_vg_jnt /dev/vdb6
...
lvcreate -n rtest_a_c4r -l 129 gurion_vg_jnt /dev/vde6
mdadm --create /dev/md/rtest_a --verbose --metadata=1.2 \
  --level=raid10 --raid-devices=4 --name=rtest_a \
  --chunk=1024 --bitmap=internal \
  /dev/gurion_vg_jnt/rtest_a_c1r /dev/gurion_vg_jnt/rtest_a_c2r \
  /dev/gurion_vg_jnt/rtest_a_c3r /dev/gurion_vg_jnt/rtest_a_c4r
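
The lvcreate steps in the summary appear to follow one pattern, one LV per PV. A sketch under that assumption is shown here: the pairing of the middle LVs with the middle PVs is inferred from the vgcreate and mdadm lines and is not stated in the summary, and the helper name print_lvcreate_cmds is hypothetical. It only prints the commands, so nothing is created.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one lvcreate per physical volume.
# Assumption: LV rtest_a_cNr is placed on the N-th PV in vgcreate order.
print_lvcreate_cmds() {
    vg=$1
    shift
    n=1
    for pv in "$@"; do
        echo "lvcreate -n rtest_a_c${n}r -l 129 $vg $pv"
        n=$((n + 1))
    done
}

print_lvcreate_cmds gurion_vg_jnt /dev/vdb6 /dev/vdc6 /dev/vdd6 /dev/vde6
```

Pinning each LV to its own PV keeps the four raid10 members on separate disks, which seems to be the intent of the explicit PV argument.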


Linux version 3.5-trunk-amd64 (Debian 3.5-1~experimental.1)
(debian-kernel@xxxxxxxxxxxxxxxx) (gcc version 4.6.3 (Debian 4.6.3-1) )
#1 SMP Thu Aug 2 17:16:27 UTC 2012

ii  linux-image-3.5-trunk-amd64                  3.5-1~experimental.1
ii  mdadm                                        3.2.5-1

(the oops was captured after running "mdadm --assemble /dev/md/rtest_a" and then "lvs")
----------
 BUG: unable to handle kernel paging request at ffffffff00000001
 IP: [<ffffffff00000001>] 0xffffffff00000000
 PGD 160d067 PUD 0
 Oops: 0010 [#1] SMP
 CPU 0
 Modules linked in: appletalk ipx p8023 p8022 psnap llc rose netrom
ax25 iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack iptable_filter ip_tables x_tables nfsd nfs
nfs_acl auth_rpcgss fscache lockd sunrpc loop crc32c_intel
ghash_clmulni_intel processor aesni_intel aes_x86_64 i2c_piix4
aes_generic cryptd thermal_sys button snd_pcm i2c_core snd_page_alloc
snd_timer snd soundcore psmouse pcspkr serio_raw evdev microcode
virtio_balloon ext4 crc16 jbd2 mbcache dm_mod raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor xor async_tx
raid6_pq raid1 raid0 multipath linear md_mod sr_mod cdrom ata_generic
virtio_net floppy virtio_blk ata_piix uhci_hcd ehci_hcd libata
scsi_mod virtio_pci virtio_ring virtio usbcore usb_common [last
unloaded: scsi_wait_scan]

 Pid: 11591, comm: lvs Not tainted 3.5-trunk-amd64 #1 Bochs Bochs
 RIP: 0010:[<ffffffff00000001>]  [<ffffffff00000001>] 0xffffffff00000000
 RSP: 0018:ffff88005a601a58  EFLAGS: 00010292
 RAX: 0000000000100000 RBX: ffff88005cc34c80 RCX: ffff88005d334440
 RDX: 0000000000000000 RSI: ffff88005a601a68 RDI: ffff88005b3d1c00
 RBP: 0000000000000000 R08: ffffffffa017e99c R09: 0000000000000001
 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
 R13: ffff88005cc34d00 R14: ffffea00010d7d60 R15: 0000000000000000
 FS:  00007fd8fcef77a0(0000) GS:ffff88005f200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffff00000001 CR3: 000000005f836000 CR4: 00000000000407f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process lvs (pid: 11591, threadinfo ffff88005a600000, task ffff88005f8ae040)
 Stack:
  ffff880054ad0c80 ffffffff81126dec ffff880057065900 0000000000000400
  ffffea0000000000 0000000000000000 ffff88005a601b80 ffff8800575ded40
  ffff88005a601c20 0000000000000000 0000000000000000 ffffffff811299b5
 Call Trace:
  [<ffffffff81126dec>] ? bio_alloc+0xe/0x1e
  [<ffffffff811299b5>] ? dio_bio_add_page+0x16/0x4c
  [<ffffffff81129a51>] ? dio_send_cur_page+0x66/0xa4
  [<ffffffff8112a4dc>] ? do_blockdev_direct_IO+0x8cb/0xa81
  [<ffffffff8125ed7e>] ? kobj_lookup+0xf6/0x12e
  [<ffffffff811a13c7>] ? disk_map_sector_rcu+0x5d/0x5d
  [<ffffffff811a2d9f>] ? disk_clear_events+0x3f/0xe4
  [<ffffffff8112873a>] ? blkdev_max_block+0x2b/0x2b
  [<ffffffff81128000>] ? blkdev_direct_IO+0x4e/0x53
  [<ffffffff8112873a>] ? blkdev_max_block+0x2b/0x2b
  [<ffffffff810bbf07>] ? generic_file_aio_read+0xeb/0x5b5
  [<ffffffff811103fd>] ? dput+0x26/0xf4
  [<ffffffff81115b87>] ? mntput_no_expire+0x2a/0x134
  [<ffffffff8110b3fc>] ? do_last+0x67d/0x717
  [<ffffffff810ffe44>] ? do_sync_read+0xb4/0xec
  [<ffffffff8110051e>] ? vfs_read+0x9f/0xe6
  [<ffffffff811005aa>] ? sys_read+0x45/0x6b
  [<ffffffff81364779>] ? system_call_fastpath+0x16/0x1b
 Code:  Bad RIP value.
 RIP  [<ffffffff00000001>] 0xffffffff00000000
  RSP <ffff88005a601a58>
 CR2: ffffffff00000001
 ---[ end trace b86c49ca25a6cdb2 ]---
----------