Re: RAID6: "Bad block number requested"

Douglas Gilbert <dgilbert@xxxxxxxxxxxx> · Mon, 11 Jun 2018 11:18:59 -0400

On 2018-06-11 11:06 AM, James Bottomley wrote:
On Mon, 2018-06-11 at 16:24 +0200, Sebastian Hegler wrote:
Dear all,

First off: sorry for cross-posting. I don't know if this is a RAID
issue or a SCSI issue, so I'll just ask y'all.


For a RAID6 capacity upgrade (to higher-capacity drives), we bought some
10TB disks:
==================
Apr 17 11:16:05 kuiper kernel: [12795386.862031] scsi 6:0:36:0: Direct-Access     ATA      HGST HUH721010AL T21D PQ: 0 ANSI: 6
Apr 17 11:16:05 kuiper kernel: [12795386.919904] scsi 6:0:36:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Apr 17 11:16:05 kuiper kernel: [12795386.974186] sd 6:0:36:0: [sdl] 2441609216 4096-byte logical blocks: (10.0 TB/9.10 TiB)

Well, this is the problem: a drive with 4k logical (and presumably 4k
physical) blocks cannot be addressed at 512-byte sector offsets that are
not divisible by 8.  This type of drive configuration is very unusual
(although it was something we tested years ago, before the industry
realised it had to ship drives with 4k physical but 512-byte logical
sectors because of the legacy problem).
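To make the arithmetic concrete: the Linux block layer counts in 512-byte sectors, so on a drive with 4096-byte logical blocks each logical block spans 4096 / 512 = 8 sectors. A request maps onto whole logical blocks only if both its starting sector and its length are multiples of 8. The following is a minimal userspace sketch of that check, modelled loosely on the test in sd_setup_read_write_cmnd(); the helper name and macros are hypothetical, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* 512-byte sectors, as counted by the block layer. */
#define SECTOR_SIZE        512u
/* Logical block size reported by the drive in the log above. */
#define LOGICAL_BLOCK_SIZE 4096u
/* 4096 / 512 = 8, hence the "& 7" and ">> 3" in sd.c. */
#define SECTORS_PER_BLOCK  (LOGICAL_BLOCK_SIZE / SECTOR_SIZE)

/*
 * Hypothetical helper (not a kernel function): convert a request
 * expressed in 512-byte sectors into 4096-byte logical blocks.
 * Returns false when the start sector or the length is not a
 * multiple of 8 sectors, i.e. the request cannot be expressed in
 * whole logical blocks; this is the case in which sd.c logs
 * "Bad block number requested".
 */
static bool sectors_to_blocks(uint64_t sector, uint32_t nr_sectors,
                              uint64_t *lba, uint32_t *nr_blocks)
{
    if ((sector % SECTORS_PER_BLOCK) != 0 ||
        (nr_sectors % SECTORS_PER_BLOCK) != 0)
        return false;
    *lba = sector / SECTORS_PER_BLOCK;           /* same as sector >> 3 */
    *nr_blocks = nr_sectors / SECTORS_PER_BLOCK; /* same as nr_sectors >> 3 */
    return true;
}
```

For example, a request of 8 sectors starting at sector 16 maps to LBA 2 and one logical block, whereas a request starting at sector 12, or one only 4 sectors (2 KiB) long, is rejected.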

Apr 17 11:16:05 kuiper kernel: [12795386.998016] sd 6:0:36:0: [sdl] Write Protect is off
Apr 17 11:16:05 kuiper kernel: [12795387.000625] sd 6:0:36:0: Attached scsi generic sg12 type 0
Apr 17 11:16:05 kuiper kernel: [12795387.035341] sd 6:0:36:0: [sdl] Mode Sense: 7f 00 10 08
Apr 17 11:16:05 kuiper kernel: [12795387.035679] sd 6:0:36:0: [sdl] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr 17 11:16:05 kuiper kernel: [12795387.098315] sd 6:0:36:0: [sdl] Attached SCSI disk
==================

RAID add and rebuild operations went fine. However, some minutes
after the rebuild completed, several hundred of these error messages
started to appear:
==================
Apr 20 03:37:29 kuiper kernel: [13027072.454811] sd 6:0:36:0: [sdl] Bad block number requested

This means that somehow, something sent a request that was not 4k
aligned (or whose length was not a multiple of 4k).  SCSI here is just
the messenger.  However, if you apply this patch, it will capture a
stack trace of whatever higher up the stack triggered this, which may
help us in debugging.  We may also want to see the actual values of
block and blk_rq_sectors(rq), but let's begin with the stack trace.
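WARN_ON_ONCE() is what keeps this patch usable with several hundred errors in the log: the kernel macro dumps a stack trace the first time its condition is true at that call site and stays quiet afterwards. The sketch below is only a userspace analogue of that behaviour, not the kernel macro; the function name and the counter are hypothetical, and it prints the offending block and length values instead of a backtrace.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually emitted (kept for illustration/testing). */
static int warnings_printed;

/*
 * Hypothetical analogue of the patched error path: every misaligned
 * request is rejected, but only the first one is reported, much as
 * WARN_ON_ONCE(1) produces a single stack trace rather than hundreds.
 * "sector" and "nr_sectors" play the roles of block and
 * blk_rq_sectors(rq), both in 512-byte units.
 * Returns true if the request can be expressed in 4k logical blocks.
 */
static bool check_4k_request(unsigned long long sector, unsigned int nr_sectors)
{
    /* One flag for this check, like the hidden static flag that
     * WARN_ON_ONCE() keeps for each call site. */
    static bool warned;

    if ((sector & 7) || (nr_sectors & 7)) {
        if (!warned) {
            warned = true;
            warnings_printed++;
            fprintf(stderr,
                    "bad request: sector=%llu nr_sectors=%u (reported once)\n",
                    sector, nr_sectors);
        }
        return false;
    }
    return true;
}
```

Calling it with (16, 8) succeeds silently; calling it with (12, 8), (16, 4) and (3, 5) rejects all three but prints only the first.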

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9421d9877730..ac865e048533 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
  		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
  			scmd_printk(KERN_ERR, SCpnt,
  				    "Bad block number requested\n");

Not a very informative error message. How about a quasi-SCSI one, such as:
    Logical Block out of range, due to different block sizes

Doug Gilbert

+			WARN_ON_ONCE(1);
  			goto out;
  		} else {
  			block = block >> 3;
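Along the lines of Doug's suggestion, a more descriptive message could combine his wording with the values James mentions. The following is a minimal sketch under that assumption: the function name and the exact message format are hypothetical, not anything in sd.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical formatter for a more descriptive version of
 * "Bad block number requested": the wording suggested in the thread
 * plus the offending start sector, length (both in 512-byte sectors)
 * and the device's logical block size. Returns what snprintf()
 * returns: the length the full message needs, even if buf was too
 * small and the output was truncated.
 */
static int format_bad_block_msg(char *buf, size_t buflen,
                                unsigned long long sector,
                                unsigned int nr_sectors,
                                unsigned int logical_block_size)
{
    return snprintf(buf, buflen,
                    "Logical Block out of range, due to different block sizes "
                    "(sector %llu, %u sectors; logical block size %u bytes)",
                    sector, nr_sectors, logical_block_size);
}
```

With sector 12, 8 sectors and a 4096-byte logical block size, this produces "Logical Block out of range, due to different block sizes (sector 12, 8 sectors; logical block size 4096 bytes)", which tells the reader both what was asked for and why it could not be honoured.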