reiserfsck segfaulting on md readonly raid6 array, dmesg shows "kernel BUG at drivers/md/md.c:5790"

Durval Menezes <durval.menezes@xxxxxxxxx> · Mon, 1 Apr 2013 17:26:22 -0300

Hello folks,

First a little background: I'm in the process of recovering a 5-disk RAID6
array where 3 devices failed :-/ What happened is that one device died,
then we inserted a new device and during rebuild two others were kicked
from the array, separated by a few minutes, due to them having bad sectors
too and taking too long to return failure to md (TLER was not set). This
was on an EL4-based system running kernel 2.6.27.

I've rebooted from a recovery CD (Gentoo mini with kernel 2.6.29), then
managed to reassemble the array with the two intact disks and one of the
kicked-out ones. I then set it to readonly (mdadm --readonly /dev/md0) for
safety while checking everything out, and then checked it with vgscan,
which found all three LVM volumes (good sign, and IMO demonstrates that my
data could have survived). Then I set those volumes active (with vgchange
-a y) and tried to run "reiserfsck --check" on the first of them, with the
following result:

     reiserfsck --check /dev/VolGroup00/Main
         [...]
         Replaying journal..
         Trans replayed: mountid 47, transid 11403219, desc 197, len 1, commit 199, next trans offset 182
         Segmentation fault
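
For reference, the checking sequence described above can be sketched as a small script. It is a dry run by default and only prints each step; /dev/md0 and the volume path are taken from this message, while the `run` helper and the `check_steps` name are purely illustrative, and the array is assumed to be assembled already.

```shell
# Dry run by default: each step is printed instead of executed.
# Set DRY_RUN=0 to really run the commands (root required).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_steps() {
    md_dev="${1:-/dev/md0}"             # array device (assumed)
    lv="${2:-/dev/VolGroup00/Main}"     # first logical volume

    # Mark the already-assembled array read-only.
    run mdadm --readonly "$md_dev" || return 1
    # Scan for LVM volume groups on top of it.
    run vgscan || return 1
    # Activate the logical volumes that were found.
    run vgchange -a y || return 1
    # Check the filesystem; this is the step that segfaulted here.
    run reiserfsck --check "$lv"
}
```

Calling `check_steps` with no arguments prints the four commands in order; nothing touches the disks unless DRY_RUN=0 is set.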

I then checked dmesg and got the "kernel BUG at drivers/md/md.c" message
block copied below.

I wonder whether this is related to the fsync bug on md0 arrays recently
reported here on the list (it makes sense for reiserfsck to call fsync
after each critical recovery point, even though that makes little sense if
the filesystem is in read-only mode... but in any case, IMHO the request
should simply have been ignored).
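
Before running any tool that may write, it can help to confirm which state md reports for the array. A minimal sketch, assuming the array is md0 and that the kernel exposes `md/array_state` under sysfs (an assumption for this 2.6.29 rescue kernel); the function names are illustrative only:

```shell
# Print the state md reports for an array. The default sysfs path is an
# assumption; pass a different file as $1 for another array.
md_array_state() {
    state_file="${1:-/sys/block/md0/md/array_state}"
    if [ -r "$state_file" ]; then
        cat "$state_file"
    else
        echo "unknown"
    fi
}

# Succeed only when the array reports itself as read-only.
md_is_readonly() {
    [ "$(md_array_state "$1")" = "readonly" ]
}
```

For example, `md_is_readonly && echo "md0 is read-only"` gives a quick check before starting reiserfsck, since flushing dirty buffers to a read-only array looks like the path that hit the BUG reported in this message.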

Also, what would you suggest in order to recover from this? Should I just
reset the array to readwrite mode and hope for the best? Hope I don't need
a new kernel for recovery, because it will not be viable to upgrade to a
more recent kernel, nor change from reiserfs to something else in the
middle of this (especially while recovering my data).

Thanks in advance,
-- 
   Durval.

------------[ cut here ]------------
kernel BUG at drivers/md/md.c:5790!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/block/sda/sda2/uevent
Modules linked in: video backlight output ac battery button fan thermal
processor thermal_sys e100 e1000e rtc tg3 libphy e1000 fuse jfs raid10
raid456 async_memcpy async_xor
 xor async_tx raid1 raid0 dm_bbr dm_snapshot dm_mirror dm_region_hash
dm_log dm_mod scsi_wait_scan sbp2 ohci1394 ieee1394 sl811_hcd usbhid
ohci_hcd uhci_hcd usb_storage ehci_hcd
usbcore lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid
aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc
scsi_transport_fc scsi_tgt
 mptspi mptscsih mptbase atp870u dc395x sim710 53c700 qla1280 dmx3191d
sym53c8xx qlogicfas408 gdth aha1740 advansys initio BusLogic arcmsr aic7xxx
aic79xx scsi_transport_spi
 sg pdc_adma sata_inic162x sata_mv ata_piix ahci sata_qstor sata_vsc
sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil
sata_promise pata_sl82c105 pata_cs5535
pata_cs5530 pata_cs5520 pata_via pata_jmicron pata_marvell pata_sis
pata_netcell pata_sc1200 pata_pdc202xx_old pata_triflex pata_atiixp
pata_opti pata_amd pata_ali pata_it8213
pata_isapnp pata_pcmcia pcmcia firmware_class pcmcia_core
pata_ns87415 pata_ns87410 pata_serverworks pata_artop pata_it821x
pata_optidma pata_hpt3x2n pata_hpt3x3
pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680
pata_radisys pata_pdc2027x pata_mpiix libata

Pid: 23506, comm: reiserfsck Not tainted (2.6.29-gentoo-r5 #1) S3210SH
EIP: 0060:[<c03739b0>] EFLAGS: 00010246 CPU: 0
EIP is at md_write_start+0x1b/0x13c
EAX: 00000001 EBX: f6e9b800 ECX: f3a72240 EDX: f3a72240
ESI: 0138ea88 EDI: 0138ea88 EBP: f3a72240 ESP: f3cadcfc
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process reiserfsck (pid: 23506, ti=f3cac000 task=f6d92280 task.ti=f3cac000)
Stack:
 00000000 00001000 c0174bf2 f578ccc0 f578ccc0 f67d8068 0138ea88 0138ea88
 f3a72240 f805c8ef f5a0d48c 00001000 f69e5384 f3a72240 f6e9b800 f6134c80
 f7e79080 c1675da0 00000000 00271d21 00000001 00000000 00000000 0138e908
Call Trace:
 [<c0174bf2>] set_bh_page+0x4e/0x56
 [<f805c8ef>] make_request+0x48/0x5fd [raid456]
 [<c02c0456>] generic_make_request+0x28a/0x2cd
 [<c0179f4a>] blkdev_write_end+0x30/0x38
 [<c013fd88>] mempool_alloc+0x27/0xcb
 [<c02c0526>] submit_bio+0x8d/0x95
 [<c0177f03>] bio_alloc_bioset+0x1e/0xf2
 [<c017498e>] submit_bh+0xc7/0xe3
 [<c0176e61>] __block_write_full_page+0x20c/0x2e1
 [<c0178d3e>] blkdev_get_block+0x0/0xc0
 [<c0176ff7>] block_write_full_page+0xc1/0xca
 [<c0178d3e>] blkdev_get_block+0x0/0xc0
 [<c0142c1c>] __writepage+0x8/0x22
 [<c0143233>] write_cache_pages+0x1ae/0x29e
 [<c0142c14>] __writepage+0x0/0x22
 [<c013f875>] generic_file_aio_write_nolock+0x3b/0x84
 [<c0143323>] generic_writepages+0x0/0x21
 [<c014333d>] generic_writepages+0x1a/0x21
 [<c0143364>] do_writepages+0x20/0x30
 [<c013e894>] __filemap_fdatawrite_range+0x54/0x60
 [<c013f76e>] filemap_fdatawrite+0x12/0x16
 [<c0173b15>] vfs_fsync+0x40/0x85
 [<c0173b79>] do_fsync+0x1f/0x2e
 [<c0102c42>] syscall_call+0x7/0xb
Code: f0 80 8a 34 01 00 00 20 83 c4 1c 5b 5e 5f 5d c3 55 57 56 53 89 c3 83
ec 14 f6 42 14 01 0f 84 21 01 00 00 8b 40 1c 83 f8 01 75 04 <0f> 0b eb fe
31 ff 83 f8 02 75 30 c7 43 1c 00 00 00 00 8d 83 34
EIP: [<c03739b0>] md_write_start+0x1b/0x13c SS:ESP 0068:f3cadcfc
---[ end trace 6d3a980df51f2517 ]---
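
The key facts in the trace above can be pulled out mechanically. A minimal shell sketch, assuming the oops text is piped in (for example from dmesg); the function names `oops_symbol` and `trap_opcode` are made up for illustration:

```shell
# Print the function name from the "EIP is at <symbol>+0x<off>/0x<len>" line.
oops_symbol() {
    sed -n 's/^EIP is at \([A-Za-z0-9_]*\)+0x.*/\1/p'
}

# Print the byte the CPU trapped on (shown as <xx> in the Code: dump)
# together with the byte that follows it.
trap_opcode() {
    sed -n 's/.*<\([0-9a-f][0-9a-f]\)> \([0-9a-f][0-9a-f]\).*/\1 \2/p'
}
```

Fed this trace, `oops_symbol` prints md_write_start and `trap_opcode` prints 0f 0b. The byte pair 0f 0b is the x86 ud2 instruction, which is what BUG() compiles to, so the "invalid opcode" looks like a deliberate assertion in md_write_start and not memory corruption. The bytes just before it (8b 40 1c 83 f8 01 75 04) read as: load a field, compare it with 1, skip the trap if different. EAX is 00000001 in the register dump, which would fit a write reaching md while the array is flagged read-only. That last reading is an inference from the disassembly and has not been checked against the md.c source.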