Re: Re[2]: RAID1 submirror failure causes reboot?

On Monday November 13, jens.axboe@xxxxxxxxxx wrote:
> 
> It doesn't sound at all unreasonable. It's most likely either a bug in
> the ide driver, or a "bad" bio being passed to the block layer (and
> later on to the request and driver). By "bad" I mean one that isn't
> entirely consistent, which could be a bug in eg md. 

I just noticed (while tracking raid6 problems...) that bio_clone calls
bio_phys_segments and bio_hw_segments (why does it do both?).
That in turn calls blk_recount_segments, which does its calculations
based on
     ->bi_bdev.
However, immediately after calling bio_clone, raid1 changes bi_bdev, thus
creating a potential inconsistency in the bio.  Would this sort of
inconsistency cause this problem?
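
To make the concern concrete, here is a toy userspace model of it.  This
is only a sketch: the toy_* types and functions are hypothetical
stand-ins, not the kernel's structures, and the "segment count" is just
a ceiling division by an assumed per-device segment limit.  It shows a
count cached at clone time going stale once the clone is pointed at a
device with different limits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct toy_dev {
    unsigned int max_seg_bytes;   /* largest segment this device accepts */
};

struct toy_bio {
    const struct toy_dev *dev;    /* plays the role of ->bi_bdev */
    unsigned int total_bytes;     /* payload size */
    unsigned int cached_segments; /* plays the role of the cached count */
};

/* Segments needed on the bio's current device (ceiling division). */
static unsigned int toy_count_segments(const struct toy_bio *b)
{
    assert(b->dev != NULL && b->dev->max_seg_bytes > 0);
    return (b->total_bytes + b->dev->max_seg_bytes - 1)
           / b->dev->max_seg_bytes;
}

/* Clone and eagerly cache the count against the source's device. */
static struct toy_bio toy_clone(const struct toy_bio *src)
{
    struct toy_bio c = *src;
    c.cached_segments = toy_count_segments(&c);
    return c;
}

/* Retarget without recounting: the pattern suspected in raid1. */
static void toy_retarget(struct toy_bio *b, const struct toy_dev *d)
{
    b->dev = d;
}

/* Nonzero if the cached count still matches the current device. */
static int toy_is_consistent(const struct toy_bio *b)
{
    return b->cached_segments == toy_count_segments(b);
}
```

In this model, cloning a 64 KiB bio on a device with a 64 KiB limit
caches a count of 1; after toy_retarget() to a device with a 4 KiB limit
the cached value no longer matches what a recount would give.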

> 
> Agree, that would be a good plan to enable. Other questions: are you
> seeing timeouts at any point? The ide timeout code has some request/bio
> "resetting" code which might be worrisome.

Jim could probably answer this with more authority, but there aren't
obvious timeouts from the logs he posted.  A representative sample is:
[87338.675891] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87338.685143] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87338.694791] ide: failed opcode was: unknown
[87343.557424] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87343.566388] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87343.576105] ide: failed opcode was: unknown
[87348.472226] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87348.481170] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87348.490843] ide: failed opcode was: unknown
[87353.387028] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87353.395735] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87353.405500] ide: failed opcode was: unknown
[87353.461342] ide1: reset: success
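
For anyone reading along, the status and error bytes in those lines can
be decoded by hand.  The sketch below assumes the standard ATA register
bit layout; the names for bits that do not appear in the log above are
my own labels, not necessarily the strings the ide driver prints.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct bit_name {
    unsigned char mask;
    const char *name;
};

/* ATA status register, most significant bit first. */
static const struct bit_name status_bits[] = {
    { 0x80, "Busy" },         { 0x40, "DriveReady" },
    { 0x20, "DeviceFault" },  { 0x10, "SeekComplete" },
    { 0x08, "DataRequest" },  { 0x04, "Corrected" },
    { 0x02, "Index" },        { 0x01, "Error" },
};

/* ATA error register, most significant bit first. */
static const struct bit_name error_bits[] = {
    { 0x80, "BadSector" },            { 0x40, "Uncorrectable" },
    { 0x20, "MediaChanged" },         { 0x10, "SectorIdNotFound" },
    { 0x08, "MediaChangeRequested" }, { 0x04, "Aborted" },
    { 0x02, "TrackZeroNotFound" },    { 0x01, "AddrMarkNotFound" },
};

/* Write the space-separated names of all set bits into out. */
static void decode(unsigned char v, const struct bit_name *tab, size_t n,
                   char *out, size_t len)
{
    size_t i;

    assert(len > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (!(v & tab[i].mask))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", len - strlen(out) - 1);
        strncat(out, tab[i].name, len - strlen(out) - 1);
    }
}

static void decode_status(unsigned char v, char *out, size_t len)
{
    decode(v, status_bits, sizeof(status_bits) / sizeof(status_bits[0]),
           out, len);
}

static void decode_error(unsigned char v, char *out, size_t len)
{
    decode(v, error_bits, sizeof(error_bits) / sizeof(error_bits[0]),
           out, len);
}
```

Feeding it status=0x59 yields the DriveReady, SeekComplete, DataRequest
and Error bits, and error=0x01 yields AddrMarkNotFound, matching the
log.  The Busy bit (0x80) is clear in every sample.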

Thanks,
NeilBrown