On Mon, 2018-06-11 at 16:24 +0200, Sebastian Hegler wrote:
> Dear all,
> 
> First off: sorry for cross-posting. I don't know if this is a RAID
> issue or a SCSI issue, so I'll just ask y'all.
> 
> 
> For a RAID6 capacity upgrade (higher capacity drives), we bought some
> 10TB disks:
> ==================
> Apr 17 11:16:05 kuiper kernel: [12795386.862031] scsi 6:0:36:0:
> Direct-Access ATA HGST HUH721010AL T21D PQ: 0 ANSI: 6
> Apr 17 11:16:05 kuiper kernel: [12795386.919904] scsi 6:0:36:0:
> atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
> Apr 17 11:16:05 kuiper kernel: [12795386.974186] sd 6:0:36:0: [sdl]
> 2441609216 4096-byte logical blocks: (10.0 TB/9.10 TiB)

Well, this is the problem: a 4k logical (presumably 4k physical) drive
cannot be addressed in (512-byte) block sectors that are not divisible
by 8. This type of drive configuration is very unusual (although it was
something we tested years ago, before the industry realised it had to
ship drives with 4k physical but 512 byte logical sectors because of
the legacy problem).

> Apr 17 11:16:05 kuiper kernel: [12795386.998016] sd 6:0:36:0: [sdl]
> Write Protect is off
> Apr 17 11:16:05 kuiper kernel: [12795387.000625] sd 6:0:36:0:
> Attached scsi generic sg12 type 0
> Apr 17 11:16:05 kuiper kernel: [12795387.035341] sd 6:0:36:0: [sdl]
> Mode Sense: 7f 00 10 08
> Apr 17 11:16:05 kuiper kernel: [12795387.035679] sd 6:0:36:0: [sdl]
> Write cache: enabled, read cache: enabled, supports DPO and FUA
> Apr 17 11:16:05 kuiper kernel: [12795387.098315] sd 6:0:36:0: [sdl]
> Attached SCSI disk
> ==================
> 
> RAID add and rebuild operations went fine. However, some minutes
> after rebuild completion, several hundreds of these error messages
> started to appear:
> ==================
> Apr 20 03:37:29 kuiper kernel: [13027072.454811] sd 6:0:36:0: [sdl]
> Bad block number requested

This means that somehow, something sent a request that was not 4k
aligned or not a multiple of 4k in size. SCSI here is just the
messenger.
However, if you apply this patch, it will capture the stack trace of
whatever above it triggered this, which may help us in debugging. We
may also want to see what the values of block and blk_rq_sectors(rq)
actually are, but let's begin with the stack trace.

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9421d9877730..ac865e048533 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
 			scmd_printk(KERN_ERR, SCpnt,
 				    "Bad block number requested\n");
+			WARN_ON_ONCE(1);
 			goto out;
 		} else {
 			block = block >> 3;
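
For anyone following along, the test that the hunk above instruments
can be sketched in userspace roughly as follows. This is only an
illustration, not kernel code: the helper names request_is_4k_aligned()
and sector_to_4k_block() are made up here, and it assumes the request is
expressed in 512-byte sectors, as in the (block & 7) check in the diff.

```c
#include <stdint.h>

/* Number of 512-byte sectors in one 4096-byte logical block. */
#define SECTORS_PER_4K_BLOCK 8u

/*
 * Hypothetical helper (not part of sd.c): returns 1 when a request given
 * in 512-byte sectors starts on a 4096-byte boundary and covers a whole
 * number of 4096-byte blocks, 0 otherwise. Mirrors the logic of
 * (block & 7) || (blk_rq_sectors(rq) & 7) with the result inverted.
 */
static int request_is_4k_aligned(uint64_t start_sector, uint32_t nr_sectors)
{
	return (start_sector & (SECTORS_PER_4K_BLOCK - 1)) == 0 &&
	       (nr_sectors & (SECTORS_PER_4K_BLOCK - 1)) == 0;
}

/*
 * Hypothetical helper: converts an aligned 512-byte sector number into a
 * 4096-byte block number, i.e. the "block >> 3" step in the diff.
 */
static uint64_t sector_to_4k_block(uint64_t sector)
{
	return sector >> 3;
}
```

So a request starting at sector 16 for 8 sectors passes and maps to
4096-byte block 2, whereas a start of 17, or a length of 4 sectors,
would take the "Bad block number requested" path.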