On Wed, 28 Jan 2015 10:29:46 -0500 Nate Dailey <nate.dailey@xxxxxxxxxxx> wrote:

> I'm writing about something that appears to be an issue with raid1's
> narrow_write_error, particular to non-512-byte-sector disks. Here's what
> I'm doing:
>
> - 2 disk raid1, 4K disks, each connected to a different SAS HBA
> - mount a filesystem on the raid1, run a test that writes to it
> - remove one of the SAS HBAs (echo 1 >
> /sys/bus/pci/devices/0000\:45\:00.0/remove)
>
> At this point, writes fail and narrow_write_error breaks them up and
> retries, one sector at a time. But these are 512-byte sectors, and sd
> doesn't like it:
>
> [ 2645.310517] sd 3:0:1:0: [sde] Bad block number requested
> [ 2645.310610] sd 3:0:1:0: [sde] Bad block number requested
> [ 2645.310690] sd 3:0:1:0: [sde] Bad block number requested
> ...
>
> There appears to be no real harm done, but there can be a huge number of
> these messages in the log.
>
> I can avoid this by disabling bad block tracking, but it looks like
> maybe the superblock's bblog_shift is intended to address this exact
> issue. However, I don't see a way to change it. Presumably this is
> something mdadm should be setting up? I don't see bblog_shift ever set
> to anything other than 0.
>
> This is on a RHEL 7.1 kernel, version 3.10.0-221.el7. I took a look at
> upstream sd and md changes and nothing jumps out at me that would have
> affected this (but I have not tested to see if the bad block messages do
> or do not happen on an upstream kernel).
>
> I'd appreciate any advice re: how to handle this. Thanks!

Thanks for the report.

narrow_write_error() should use bdev_logical_block_size() and round up to
that.
Possibly mdadm should get the same information and set bblog_shift
accordingly when creating a bad block log.

I've made a note to fix that, but I'm happy to review patches too :-)

thanks,
NeilBrown
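
For illustration only, here is a minimal userspace sketch of the rounding suggested above. It is not the kernel code and not a tested patch. The helper names retry_block_sectors() and first_chunk_sectors() are hypothetical, and the logical block size is passed in as a plain number where the kernel would obtain it from bdev_logical_block_size(). The sketch assumes md counts in 512-byte sectors and that the bad block granularity is 1 << bblog_shift such sectors.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* md counts in 512-byte sectors */

/*
 * Hypothetical helper: size of one retry chunk, in 512-byte sectors.
 * Starts from the bad block log granularity (1 << bblog_shift) and
 * rounds it up to a whole number of device logical blocks.
 */
static uint64_t retry_block_sectors(unsigned int bblog_shift,
                                    unsigned int logical_block_size)
{
    uint64_t bb_sectors = 1ULL << bblog_shift;
    uint64_t lbs_sectors = logical_block_size >> SECTOR_SHIFT;

    assert(lbs_sectors > 0); /* logical block must be at least 512 bytes */

    /* Round bb_sectors up to the next multiple of lbs_sectors. */
    return ((bb_sectors + lbs_sectors - 1) / lbs_sectors) * lbs_sectors;
}

/*
 * Hypothetical helper: length of the first retry chunk, chosen so that
 * it ends on a block_sectors boundary (or at the end of the request,
 * whichever comes first). Later chunks then start aligned.
 */
static uint64_t first_chunk_sectors(uint64_t sector, uint64_t sectors_left,
                                    uint64_t block_sectors)
{
    uint64_t to_boundary = block_sectors - (sector % block_sectors);

    return to_boundary < sectors_left ? to_boundary : sectors_left;
}
```

With bblog_shift = 0 and a 4096-byte logical block, retry_block_sectors() gives 8, so each retry would cover one whole 4K block rather than a single 512-byte sector, which is the request size sd was rejecting in the log above.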