Re: I/O errors without errors from underlying device

>>>>> "Arkadiusz" == Arkadiusz Miśkiewicz <arekm@xxxxxxxx> writes:

Arkadiusz> On Monday 07 of December 2015, John Stoffel wrote:
Arkadiusz> 4.3.0 kernel, raid6 array:
>> 
>> I think there's a bug with block merges in 4.3.x and in 4.4-rc3 and
>> earlier.  I ran into it over the weekend: v4.2.6 was stable, but
>> anything newer would lock up and crash on me.

Arkadiusz> Well, no crashes here.

That's good.  It was hard(er) to hit when I wasn't running KVM VMs at
the same time on the server, and I was running strictly RAID1 disks,
so it's hard to know whether you're seeing the same problem.

>> So first step would be to make sure you get and test v4.4-rc4.

Arkadiusz> Do you know which commit there?

Try this, from the master lkml git repository:

    2873d32ff493ecbfb7d2c7f56812ab941dda42f4




>> 
Arkadiusz> md7 : active raid6 sdg[10] sdad1[9] sdac1[8] sdag1[7] sdaf1[6] sdae1[5] sdaj1[4] sdai1[3] sdah1[2] sdn1[1]
Arkadiusz>       31255089152 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
Arkadiusz>       bitmap: 1/30 pages [4KB], 65536KB chunk
>> 
Arkadiusz> The array had a weird failure where many disks went into the
Arkadiusz> failed state, but removing and re-adding these disks "fixed"
Arkadiusz> it (it turns out that did not really fix it).
>> 
Arkadiusz> Unfortunately now some reads fail:
>> 
Arkadiusz> pread(4, 0x1483a00, 4096, 16003680464896) = -1 EIO (Input/output error)
>> 
Arkadiusz> To reproduce, I used xfs_io:
Arkadiusz> xfs_io -d -c "pread 16003680464896 4096" /dev/md7
Arkadiusz> pread64: Input/output error
Arkadiusz> which does pread exactly as shown above.
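
If you want to see how far the unreadable area extends, the same
pread can be repeated over a range of offsets.  A rough Python sketch
follows; probe_range and its parameters are made-up names for
illustration, and it does plain buffered reads rather than the
O_DIRECT reads that "xfs_io -d" performs, so the page cache and
readahead may change what you see.  Only EIO is treated as a bad
block; any other error is re-raised.

```python
import errno
import os


def probe_range(path, start, length, block=4096):
    """Return the offsets in [start, start + length) whose read fails with EIO.

    Hypothetical helper: reads `block` bytes at every `block`-sized step.
    Buffered I/O only, so results may differ from an O_DIRECT read.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for off in range(start, start + length, block):
            try:
                os.pread(fd, block, off)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # unrelated failure, do not hide it
                bad.append(off)
    finally:
        os.close(fd)
    return bad


# Example call (needs read access to the device; not executed here):
#   probe_range("/dev/md7", 16003680464896 - 512 * 1024, 1024 * 1024)
```

It opens the device read-only, so it should be harmless to run while
the array is assembled.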
>> 
Arkadiusz> write also fails for that area:
Arkadiusz> xfs_io -d -c "pwrite 16003680464896 4096" /dev/md7
Arkadiusz> pwrite64: Input/output error
>> 
Arkadiusz> Note that nothing is written in dmesg when that happens.
>> 
Arkadiusz> I've tried various offsets and sizes of pread, and at some
Arkadiusz> point this was logged:
Arkadiusz> [  848.988518] Buffer I/O error on dev md7, logical block 3907148544, async page read
>> 
Arkadiusz> but no error from underlying devices.
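
It may help to relate the logged block number to the pread offset.
The sketch below is only arithmetic on the numbers quoted above.  It
assumes the "logical block" in the kernel message is counted in 4096
byte units (my assumption, not something the log states) and takes
the 512k chunk size and the 10-disk raid6 layout from the mdstat
output.  It does not try to work out which member disk holds the
chunk.

```python
CHUNK_BYTES = 512 * 1024  # "512k chunk" from the mdstat line
BLOCK_BYTES = 4096        # assumption: unit of "logical block" in the log
DATA_DISKS = 10 - 2       # raid6 with 10 members: 8 data chunks per stripe


def locate(offset):
    """Map a byte offset on the md device to block, chunk and stripe numbers."""
    block = offset // BLOCK_BYTES
    chunk = offset // CHUNK_BYTES
    first_block_of_chunk = chunk * (CHUNK_BYTES // BLOCK_BYTES)
    stripe, chunk_in_stripe = divmod(chunk, DATA_DISKS)
    return {
        "block": block,
        "chunk": chunk,
        "first_block_of_chunk": first_block_of_chunk,
        "stripe": stripe,
        "chunk_in_stripe": chunk_in_stripe,
    }


info = locate(16003680464896)  # offset from the failing pread
print(info)
```

With that offset, the pread falls in 4096-byte block 3907148551,
which is in chunk 30524598, and the first block of that chunk is
3907148544.  So if the 4096-byte assumption holds, the logged block
is the start of the same 512k chunk that the failing pread hits.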
>> 
Arkadiusz> List of bad blocks:
Arkadiusz> http://sprunge.us/XSWI
>> 
Arkadiusz> What can I do now?
>> 
Arkadiusz> (losing data from those few sectors is acceptable if the rest
Arkadiusz> will be readable)
>> 
Arkadiusz> Thanks,
Arkadiusz> --
Arkadiusz> Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


