Re: RAID6: "Bad block number requested"

Bryan Gurney <bgurney@xxxxxxxxxx> · Tue, 12 Jun 2018 08:53:02 -0400

On Mon, Jun 11, 2018 at 6:09 PM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 2018-06-11 at 17:56 -0400, Bryan Gurney wrote:
>> On Mon, Jun 11, 2018 at 1:00 PM, Anthony Youngman
>> <anthony@xxxxxxxxxxxxxxx> wrote:
>> > On 11/06/18 16:06, James Bottomley wrote:
>> > > Well, this is the problem: a 4k logical (presumably 4k physical)
>> > > drive cannot be addressed in block sectors that are not divisible
>> > > by 8.  This type of drive configuration is very unusual (although
>> > > it was something we tested years ago before the industry realised
>> > > it had to ship drives with 4k physical but 512 byte logical
>> > > sectors because of the legacy problem).
>> >
>> > I understood these drives were now becoming much more common,
>> > especially enterprise-grade drives. I know there were problems
>> > switching from 512/512 drives to 512/4096, but as you say I thought
>> > they were pretty much addressed.
>>
>> As soon as I saw the model number "HGST HUH721010AL", and did a
>> search, I said, "Oh, it's _this_ drive."
>>
>> The HGST Ultrastar He10 has both "512e Format" and "4K Native Format"
>> part numbers, so it's easy to buy the wrong type of drive (e.g.
>> accidentally buy a 4K Native drive, and then run into some obscure
>> I/O failures).
>>
>> FYI, in my experience, when an application sends an I/O smaller
>> than 4096 bytes to a block device with 4096-byte blocks, the usual
>> error code returned by the driver is EINVAL (or "Invalid
>> argument"), so see if there's a log message citing that error code.
>
> We've done the work to make this function.  However, it was a while ago
> and I don't believe anyone tests regularly now (particularly with the
> corner cases) so errors can creep back into the stack.

Ah, okay.  I was thinking more about how hard the error itself is to
track down: the program performing the I/O operation may report it in
a way that makes it look as though a command was given an invalid
argument.

(At least that's how I discovered this: I was wondering why a command
that should have worked was failing with "invalid argument", and a
blktrace run revealed that a read of less than 4096 bytes was being
attempted and was failing with EINVAL.)
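
For anyone chasing a similar "invalid argument", here is a rough
sketch (Python) of the check I would do by hand: read the device's
logical block size from sysfs and see whether a given request is a
multiple of it.  The helper names and the "sdX" device name are made
up for illustration; the queue/logical_block_size sysfs attribute is
the standard one.  Treating a misaligned request as "likely EINVAL" is
only the usual behaviour described above, not a guarantee.

```python
import errno
import os


def read_logical_block_size(dev_name, sysfs_root="/sys/block"):
    """Read the logical block size (in bytes) of a block device from sysfs.

    dev_name is the kernel name of the disk, e.g. "sdX" (placeholder).
    """
    path = os.path.join(sysfs_root, dev_name, "queue", "logical_block_size")
    with open(path) as f:
        return int(f.read().strip())


def is_block_aligned(offset, length, logical_block_size):
    """True if both the offset and the length are multiples of the block size."""
    if logical_block_size <= 0:
        raise ValueError("logical_block_size must be positive")
    return offset % logical_block_size == 0 and length % logical_block_size == 0


def describe_request(offset, length, logical_block_size):
    """Return a short human-readable verdict for one I/O request."""
    if is_block_aligned(offset, length, logical_block_size):
        return "aligned"
    # Assumption: the driver rejects the request with EINVAL, as described
    # in this thread; other layers may report the failure differently.
    return "misaligned, likely %s" % errno.errorcode[errno.EINVAL]


if __name__ == "__main__":
    # A 2048-byte read at offset 0 against a 4096-byte-sector device.
    print(describe_request(0, 2048, 4096))
```

On a real system the third argument would come from
read_logical_block_size("sdX") rather than being hard-coded.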

>> > I think it must be a couple of years ago now though, that I heard
>> > (on LWN) enterprise drives were apparently switching over to
>> > 4096/4096. With NO 512 emulation fall-back.
>>
>> Some drive manufacturers seem to be more eager than others, but
>> there's still work to be done.  For example, try this with a 4K-
>> native drive:
>>
>> 1. Write an ISO image to the drive with the command "dd
>> if=isofile.iso of=/dev/testdevice bs=4096 oflag=direct"
>>
>> 2. Create a test directory (for example, "/mnt/testdir"), then
>> attempt to mount the device with "mount /dev/testdevice /mnt/testdir"
>
> This is a textbook case of something that can never work: The
> requirement for a 4k drive is that the stack must be aligned, meaning
> 4k or multiple of 4k block size all the way up and down.  The isofs
> you're copying only has a 2k block size.  You get the same failure with
> any filesystem block size that is not a multiple of 4k.  Fortunately most modern
> filesystems have had 4k, or multiple thereof, block sizes for a while
> now, so you're unlikely to see this on your old ext4 devices but, in
> principle, it could happen.
>
> James

Then I hope that drive manufacturers don't start making 4K-native USB
flash drives; otherwise, we'll have a confusing situation on our
hands.
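
To make the mismatch concrete, here is a small sketch (Python, with
hypothetical helper names) of where a 2048-byte isofs block lands on a
device with 4096-byte logical blocks, using the iso_blknum=17 from the
log message quoted below.  It only does the arithmetic; it says
nothing about how the kernel itself computes the block number.

```python
ISO_BLOCK_SIZE = 2048   # isofs block size discussed in this thread
DEV_BLOCK_SIZE = 4096   # logical block size of a 4K-native drive


def stack_is_aligned(fs_block_size, dev_block_size):
    """True if the filesystem block size is a multiple of the device block size."""
    return fs_block_size % dev_block_size == 0


def locate_fs_block(blknum, fs_block_size, dev_block_size):
    """Map a filesystem block number onto the device.

    Returns (byte_offset, device_block, offset_within_device_block).
    """
    byte_offset = blknum * fs_block_size
    dev_block, offset_in_block = divmod(byte_offset, dev_block_size)
    return byte_offset, dev_block, offset_in_block


if __name__ == "__main__":
    print(stack_is_aligned(ISO_BLOCK_SIZE, DEV_BLOCK_SIZE))    # False
    # iso block 17 starts at byte 34816, halfway into device block 8.
    print(locate_fs_block(17, ISO_BLOCK_SIZE, DEV_BLOCK_SIZE))  # (34816, 8, 2048)
    # The same filesystem on a 512-byte-sector device is fine.
    print(stack_is_aligned(ISO_BLOCK_SIZE, 512))                # True
```

A 2048-byte block can never cover a whole 4096-byte device block, so
even the isofs blocks that happen to start on a 4096-byte boundary
would still need a sub-block read.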

Bryan

>
>> When I tried it on RHEL 7.5, I saw this: "kernel: isofs_fill_super:
>> bread failed, dev=testdevice, iso_blknum=17, block=-2147483648"
>>
>> Note that ISO filesystems have a 2048-byte block size (maximum), but
>> in this test, it's stored on a block device with a block size of 4096
>> bytes.
>>
>> There may be more issues out there, but they have to be found first.
>> And finding the issues is difficult, due to the obscurity of the
>> error messages seen.
>>
>>
>> Thanks,
>>
>> Bryan
>>
>