On Friday November 10, klimov@xxxxxxxxxxx wrote:
> Hello Linux RAID,
>
> One of our servers using per-partition mirroring has a
> frequently-failing partition, hdc11 below.
>
> When it is deemed failing, the server usually crashes
> with a stacktrace like the one below. This seems strange, because
> the other submirror, hda11, is alive and well, and this
> should all be transparent through the RAID layer? That is
> what it is for?
>
> After the reboot I usually succeed in hot-adding hdc11
> back to the mirror, although several times it was not
> marked dead at all and rebuilt by itself after the reboot.
> This also seems incorrect: if it died, it should be
> marked so (perhaps in the metadata on the live mirror)?
>
> Overall, uncool (although mirroring has saved us many
> times, thanks!)
>
> --snip--
> [87392.564004] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> [87392.572790] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
> [87392.582454] ide: failed opcode was: unknown
> [87392.635961] ide1: reset: success
> [87397.528687] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> [87397.537607] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
> [87397.547335] ide: failed opcode was: unknown
> [87397.551897] end_request: I/O error, dev hdc, sector 176315718
> [87398.520820] raid1: Disk failure on hdc11, disabling device.
> [87398.520826] 	Operation continuing on 1 devices
> [87398.531579] blk: request botched
                 ^^^^^^^^^^^^^^^^^^^^

That looks bad. Possibly a bug in the IDE controller driver or elsewhere
in the block layer.

Jens: What might cause that?
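For reference, the hot-add the poster describes is done with mdadm in manage mode against the per-partition mirror. A sketch, assuming (hypothetically; the post never names it) that the hda11/hdc11 pair is assembled as /dev/md11:

```
# /dev/md11 is an ASSUMED name for the mirror built from hda11 + hdc11.
mdadm /dev/md11 --remove /dev/hdc11   # clear the faulted member, if still listed
mdadm /dev/md11 --add /dev/hdc11      # hot-add it back; raid1 resync starts
cat /proc/mdstat                      # watch the rebuild progress
```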
--snip--
> [87403.678603] Call Trace:
> [87403.681462]  [<c0103bba>] show_stack_log_lvl+0x8d/0xaa
> [87403.686911]  [<c0103ddc>] show_registers+0x1b0/0x221
> [87403.692306]  [<c0103ffc>] die+0x124/0x1ee
> [87403.696558]  [<c0104165>] do_trap+0x9f/0xa1
> [87403.700988]  [<c0104427>] do_invalid_op+0xa7/0xb1
> [87403.706012]  [<c0103871>] error_code+0x39/0x40
> [87403.710794]  [<c0180e0a>] mpage_end_io_read+0x5e/0x72
> [87403.716154]  [<c0164af9>] bio_endio+0x56/0x7b
> [87403.720798]  [<c0256778>] __end_that_request_first+0x1e0/0x301
> [87403.726985]  [<c02568a4>] end_that_request_first+0xb/0xd
> [87403.732699]  [<c02bd73c>] __ide_end_request+0x54/0xe1
> [87403.738214]  [<c02bd807>] ide_end_request+0x3e/0x5c
> [87403.743382]  [<c02c35df>] task_error+0x5b/0x97
> [87403.748113]  [<c02c36fa>] task_in_intr+0x6e/0xa2
> [87403.753120]  [<c02bf19e>] ide_intr+0xaf/0x12c
> [87403.757815]  [<c013e5a7>] handle_IRQ_event+0x23/0x57
> [87403.763135]  [<c013e66f>] __do_IRQ+0x94/0xfd
> [87403.767802]  [<c0105192>] do_IRQ+0x32/0x68

That doesn't look like raid was involved. If it were, you would expect to
see raid1_end_write_request or raid1_end_read_request in that trace.

Do you have any other partitions of hdc in use but not on raid?

Which partition is sector 176315718 in?

NeilBrown
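Neil's last question can be answered mechanically: an absolute LBA belongs to the partition whose range satisfies start <= sector < start + size, with starts and sizes in 512-byte sectors as reported by `fdisk -lu /dev/hdc` or by /sys/block/hdc/hdcN/start and .../size. A minimal sketch; the partition table below is entirely made up for illustration and is not the poster's real hdc layout:

```python
# HYPOTHETICAL partition table: {name: (start_sector, size_in_sectors)}.
# On a real system, read these values from `fdisk -lu` or sysfs.
PARTITIONS = {
    "hdc1":  (63,          20_000_000),
    "hdc10": (150_000_063, 20_000_000),
    "hdc11": (170_000_063, 20_000_000),
}

def partition_of(sector, table):
    """Return the partition whose [start, start + size) range holds sector."""
    for name, (start, size) in table.items():
        if start <= sector < start + size:
            return name
    return None  # sector lies outside every partition

print(partition_of(176315718, PARTITIONS))  # -> hdc11 with this made-up table
```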