Re: Panic: qla2xxx will panic the systems when sending sg_write_same -T --lba=1 to a device that has no protection

On Tue, 2023-06-27 at 12:29 -0400, Laurence Oberman wrote:
> Hello
> 
> A customer discovered this on a RHEL 8.8 kernel but the issue also
> exists upstream with the current code in 6.4 for example.
> 
> [  177.143279]  ? qla2xxx_dif_start_scsi_mq+0xcd8/0xce0 [qla2xxx]
> [  177.149165]  ? internal_add_timer+0x42/0x70
> [  177.153372]  qla2xxx_mqueuecommand+0x207/0x2b0 [qla2xxx]
> [  177.158730]  scsi_queue_rq+0x2b7/0xc00
> [  177.162501]  blk_mq_dispatch_rq_list+0x3ea/0x7e0
> 
> Simple reproducer to a LUN with no protection
> sg_write_same -T --lba=1 /dev/sdxx  (or mpath)
> 
> With the device having no protection we end up with
> SCSI_PROT_NORMAL as the prot_op, so the switch falls through to the
> default BUG()
> 
> switch (scsi_get_prot_op(GET_CMD_SP(sp))) {
>         case SCSI_PROT_READ_INSERT:
>         case SCSI_PROT_WRITE_STRIP:
>                 total_bytes = data_bytes;
>                 data_bytes += dif_bytes;              
>                 break;
> 
>         case SCSI_PROT_READ_STRIP:
>         case SCSI_PROT_WRITE_INSERT:                                 
>         case SCSI_PROT_READ_PASS:
>         case SCSI_PROT_WRITE_PASS:
>                 total_bytes = data_bytes + dif_bytes;  
>                 break;
>         default:
>                 BUG();
>         }
> 
> 
> I also had David Jeffery look at this and his comment was
> 
> In this particular case, it looks like the issue is just with qla2xxx,
> regardless of the hardware. The scsi_disk being sent the command had no
> dif protection enabled and there was no dix data.
> 
> crash> struct scsi_disk.protection_type 0xff34947432176800
>   protection_type = 0 '\000',
> 
> crash> px ((struct scsi_cmnd *)0xff3494740b759138)->prot_sdb[0]
> $7 = {
>   table = {
>     sgl = 0xff3494740b7595a8,
>     nents = 0x0,
>     orig_nents = 0x0
>   },
>   length = 0x0,
>   resid = 0x0
> }
> 
> So for a WRITE_SAME_32 the prot_op was always going to be
> SCSI_PROT_NORMAL. qla2xxx should not crash when passed such a command
> and state.
> 
> 
> KDUMP   
> Linux
> segstorage3
> 6.4.0+
> 
> [  176.960932] ------------[ cut here ]------------
> [  176.965582] kernel BUG at drivers/scsi/qla2xxx/qla_iocb.c:1459!
> [  176.971540] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [  176.976795] CPU: 10 PID: 16058 Comm: sg_write_same Kdump: loaded
> Tainted: G S                 6.4.0+ #1
> [  176.986240] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> Gen10, BIOS U30 05/17/2022
> [  176.994812] RIP: 0010:qla2xxx_dif_start_scsi_mq+0xcd8/0xce0
> [qla2xxx]
> [  177.001337] Code: ff ff 48 8b 7c 24 40 0f b7 bf 4c 01 00 00 e9 73
> f6
> ff ff 83 3d 68 a0 de ff 01 0f 8e 7b fd ff ff e9 6f fd ff ff e8 b8 7f
> 07
> ce <0f> 0b 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
> 90
> [  177.020217] RSP: 0018:ffffa1c44f86b9e0 EFLAGS: 00010046
> [  177.025470] RAX: 0000000000000008 RBX: ffff961087e29000 RCX:
> 0000000000000000
> [  177.032644] RDX: 0000000000000000 RSI: ffff9617c9e09460 RDI:
> 0000000000000200
> [  177.039818] RBP: ffff9617c9e09588 R08: ffff9617c9e09460 R09:
> 0000000000000200
> [  177.046992] R10: ffff96107800e880 R11: 0000000000000000 R12:
> 00000000000010c0
> [  177.054165] R13: ffff96107800e880 R14: ffff961064c52180 R15:
> ffff961066f8de00
> [  177.061337] FS:  00007f41eef7e740(0000) GS:ffff961f4d800000(0000)
> knlGS:0000000000000000
> [  177.069471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  177.075246] CR2: 000055e1e2591bd8 CR3: 00000008823b2005 CR4:
> 00000000007706e0
> [  177.082420] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  177.089594] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [  177.096768] PKRU: 55555554
> [  177.099487] Call Trace:
> [  177.101944]  <TASK>
> [  177.104052]  ? __die_body+0x1e/0x60
> [  177.107560]  ? die+0x3c/0x60
> [  177.110454]  ? do_trap+0xe6/0x110
> [  177.113786]  ? qla2xxx_dif_start_scsi_mq+0xcd8/0xce0 [qla2xxx]
> [  177.119674]  ? do_error_trap+0x65/0x80
> [  177.123442]  ? qla2xxx_dif_start_scsi_mq+0xcd8/0xce0 [qla2xxx]
> [  177.129328]  ? exc_invalid_op+0x50/0x70
> [  177.133184]  ? qla2xxx_dif_start_scsi_mq+0xcd8/0xce0 [qla2xxx]
> [  177.139071]  ? asm_exc_invalid_op+0x1a/0x20
> [  177.143279]  ? qla2xxx_dif_start_scsi_mq+0xcd8/0xce0 [qla2xxx]
> [  177.149165]  ? internal_add_timer+0x42/0x70
> [  177.153372]  qla2xxx_mqueuecommand+0x207/0x2b0 [qla2xxx]
> [  177.158730]  scsi_queue_rq+0x2b7/0xc00
> [  177.162501]  blk_mq_dispatch_rq_list+0x3ea/0x7e0
> [  177.167143]  __blk_mq_sched_dispatch_requests+0xac/0x670
> [  177.172485]  ? blk_rq_map_user_iov+0x2ae/0x690
> [  177.176952]  ? blk_mq_request_bypass_insert+0x74/0xa0
> [  177.182031]  blk_mq_sched_dispatch_requests+0x37/0x70
> [  177.187110]  blk_mq_run_hw_queue+0x183/0x1b0
> [  177.191402]  blk_execute_rq+0x103/0x230
> [  177.195257]  sg_io+0x17f/0x360
> [  177.198327]  scsi_ioctl_sg_io+0x69/0x90
> [  177.202182]  scsi_ioctl+0x4c6/0x890
> [  177.205688]  ? scsi_block_when_processing_errors+0x26/0xd0
> [  177.211204]  ? multipath_prepare_ioctl+0x50/0x130 [dm_multipath]
> [  177.217247]  dm_blk_ioctl+0x72/0x120 [dm_mod]
> [  177.221637]  blkdev_ioctl+0x1c2/0x280
> [  177.225320]  __x64_sys_ioctl+0x90/0xd0
> [  177.229089]  do_syscall_64+0x3b/0x90
> [  177.232683]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  177.237762] RIP: 0033:0x7f41ee4397cb
> [  177.241355] Code: 73 01 c3 48 8b 0d bd 56 38 00 f7 d8 64 89 01 48
> 83
> c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00
> 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 56 38 00 f7 d8 64 89 01
> 48
> [  177.260234] RSP: 002b:00007ffe44cf3578 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  177.267846] RAX: ffffffffffffffda RBX: 000055e1e25909a0 RCX:
> 00007f41ee4397cb
> [  177.275018] RDX: 00007ffe44cf3580 RSI: 0000000000002285 RDI:
> 0000000000000003
> [  177.282191] RBP: 0000000000000003 R08: 0000000000000040 R09:
> 000055e1e2590a50
> [  177.289363] R10: 0000000000000000 R11: 0000000000000246 R12:
> 0000000000000000
> [  177.296535] R13: 00007ffe44cf3638 R14: 000055e1e25909a0 R15:
> 00007ffe44cf3890

Hello Nilesh,
This is not a final patch and will need cleanup, but it is something I
came up with that prevents the panic. You probably have better ideas.
I have not signed it off as it's just a suggestion.


    [PATCH] scsi: qla2xxx: avoid a panic due to BUG() if
     a command is sent to a device that has no protection.
    
    If a device does not have protection, qla2xxx ends up
    hitting the default BUG() and the system panics.
    This is because SCSI_PROT_NORMAL is not matched by any case,
    so the default BUG() is reached.
    This patch adds a case that avoids the BUG() and prints a WARN.
    
    
diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index b9b3e6f80ea9..3fca7c7b7a92 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -1443,6 +1443,12 @@ qla24xx_build_scsi_crc_2_iocbs(srb_t *sp, struct cmd_type_crc_2 *cmd_pkt,
        dif_bytes = (data_bytes / blk_size) * 8;
 
        switch (scsi_get_prot_op(GET_CMD_SP(sp))) {
+       case SCSI_PROT_NORMAL:
+               total_bytes = data_bytes;
+               WARN(1, "device has no protection, command sent expecting\
+                        DIF or DIX protection with proto_op=%d",
+                       cmd->prot_op);
+               break;
        case SCSI_PROT_READ_INSERT:
        case SCSI_PROT_WRITE_STRIP:
                total_bytes = data_bytes;
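
To make the byte accounting in that switch easier to follow outside the
driver, here is a small user-space sketch of the same logic with the
SCSI_PROT_NORMAL case handled. It is only a model under my own
assumptions: the enum, struct and function names (model_prot_op,
model_counts, model_crc2_counts) and the return-code convention are
hypothetical and do not exist in qla2xxx, and the fprintf() merely
stands in for the WARN().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical local stand-in for the prot_op values named in the switch. */
enum model_prot_op {
	MODEL_PROT_NORMAL = 0,
	MODEL_PROT_READ_INSERT,
	MODEL_PROT_WRITE_STRIP,
	MODEL_PROT_READ_STRIP,
	MODEL_PROT_WRITE_INSERT,
	MODEL_PROT_READ_PASS,
	MODEL_PROT_WRITE_PASS,
};

/* Hypothetical container for the two counters the switch updates. */
struct model_counts {
	uint32_t data_bytes;
	uint32_t total_bytes;
};

/*
 * Returns 0 for a protected operation, 1 when the command carries no
 * protection (the case that used to reach BUG()), -1 on invalid input.
 */
int model_crc2_counts(enum model_prot_op op, uint32_t data_bytes,
		      uint32_t blk_size, struct model_counts *out)
{
	uint32_t dif_bytes;

	assert(out != NULL);
	if (blk_size == 0)
		return -1;

	/* 8 bytes of protection information per logical block. */
	dif_bytes = (data_bytes / blk_size) * 8;
	out->data_bytes = data_bytes;

	switch (op) {
	case MODEL_PROT_NORMAL:
		/* No DIF/DIX: warn and treat it as a plain transfer. */
		out->total_bytes = data_bytes;
		fprintf(stderr, "model: no protection, prot_op=%d\n", (int)op);
		return 1;
	case MODEL_PROT_READ_INSERT:
	case MODEL_PROT_WRITE_STRIP:
		out->total_bytes = data_bytes;
		out->data_bytes = data_bytes + dif_bytes;
		return 0;
	case MODEL_PROT_READ_STRIP:
	case MODEL_PROT_WRITE_INSERT:
	case MODEL_PROT_READ_PASS:
	case MODEL_PROT_WRITE_PASS:
		out->total_bytes = data_bytes + dif_bytes;
		return 0;
	default:
		return -1;
	}
}
```

For a 4096-byte transfer with 512-byte blocks this gives dif_bytes = 64.
Whether the real driver should carry on with the command as the patch
does, or reject it with an error, is for the maintainers to decide.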



Result of the reproducer with the patch applied:

[root@segstorage3 ~]# sg_write_same -T --lba=1 /dev/mapper/mpathz1
Write same: transport: Host_status=0x07 [DID_ERROR]
Driver_status=0x00 [DRIVER_OK]

Write same(32): Sense category: -1, try '-v' option for more
information
Some error occurred, try again with '-v' or '-vv' for more information

segstorage3 login: [  785.431935] ------------[ cut here ]------------
[  785.436586] device has no protection, command sent
expecting			DIF or DIX protection with proto_op=0
[  785.436635] WARNING: CPU: 39 PID: 20588 at
drivers/scsi/qla2xxx/qla_iocb.c:1450
qla2xxx_dif_start_scsi_mq+0x4b4/0xd40 [qla2xxx]

[  785.534337] CPU: 39 PID: 20588 Comm: sg_write_same Kdump: loaded
Tainted: G S      W          6.4.0+ #1
[  785.543782] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
Gen10, BIOS U30 05/17/2022
[  785.552353] RIP: 0010:qla2xxx_dif_start_scsi_mq+0x4b4/0xd40
[qla2xxx]
[  785.558853] Code: b6 b0 98 00 00 00 48 c7 c7 e0 e9 79 c0 44 89 5c 24
74 89 44 24 70 44 89 4c 24 6c 4c 89 54 24 60 4c 89 44 24 50 e8 dc 67 9e
c0 <0f> 0b 48 8b b5 98 00 00 00 44 8b 4c 24 6c 4c 8b 44 24 50 4c 8b 54
[  785.577731] RSP: 0018:ffffc9000d527988 EFLAGS: 00010086
[  785.582985] RAX: 0000000000000000 RBX: ffff8881160c6000 RCX:
0000000000000027
[  785.590160] RDX: 0000000000000027 RSI: 00000000ffdfffff RDI:
ffff88900dae0848
[  785.597333] RBP: ffff88884be0f948 R08: 0000000000000000 R09:
c0000000ffdfffff
[  785.604506] R10: 0000000000000001 R11: ffffc9000d527820 R12:
0000000000001c68
[  785.611680] R13: ffff8881539d8d80 R14: ffff88813370eb40 R15:
ffff888105910800
[  785.618853] FS:  00007fd53861a740(0000) GS:ffff88900dac0000(0000)
knlGS:0000000000000000
[  785.626987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  785.632762] CR2: 0000556d74bfabd8 CR3: 000000087696c003 CR4:
00000000007706e0
[  785.639936] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  785.647110] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  785.654283] PKRU: 55555554
[  785.657002] Call Trace:
[  785.659461]  <TASK>
[  785.661571]  ? __warn+0x85/0x140
[  785.664819]  ? qla2xxx_dif_start_scsi_mq+0x4b4/0xd40 [qla2xxx]
[  785.670709]  ? report_bug+0xfc/0x1e0
[  785.674306]  ? handle_bug+0x3f/0x70
[  785.677815]  ? exc_invalid_op+0x17/0x70
[  785.681669]  ? asm_exc_invalid_op+0x1a/0x20
[  785.685880]  ? qla2xxx_dif_start_scsi_mq+0x4b4/0xd40 [qla2xxx]
[  785.691767]  ? qla2xxx_dif_start_scsi_mq+0x4b4/0xd40 [qla2xxx]
[  785.697651]  qla2xxx_mqueuecommand+0x207/0x2b0 [qla2xxx]
[  785.703007]  scsi_queue_rq+0x2b7/0xc00
[  785.706781]  blk_mq_dispatch_rq_list+0x3ea/0x7e0
[  785.711426]  __blk_mq_sched_dispatch_requests+0xac/0x670
[  785.716770]  ? blk_rq_map_user_iov+0x2ae/0x690
[  785.721238]  ? blk_mq_request_bypass_insert+0x74/0xa0
[  785.726317]  blk_mq_sched_dispatch_requests+0x37/0x70
[  785.731395]  blk_mq_run_hw_queue+0x183/0x1b0
[  785.735688]  blk_execute_rq+0x103/0x230
[  785.739545]  sg_io+0x17f/0x360
[  785.742614]  scsi_ioctl_sg_io+0x69/0x90
[  785.746470]  scsi_ioctl+0x4c6/0x890
[  785.749974]  ? scsi_block_when_processing_errors+0x26/0xd0
[  785.755489]  ? multipath_prepare_ioctl+0x50/0x130 [dm_multipath]
[  785.761531]  dm_blk_ioctl+0x72/0x120 [dm_mod]
[  785.765925]  dm_blk_ioctl+0x72/0x120 [dm_mod]
[  785.770312]  blkdev_ioctl+0x1c2/0x280
[  785.773995]  __x64_sys_ioctl+0x90/0xd0
[  785.777767]  do_syscall_64+0x3b/0x90
[  785.781360]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  785.786440] RIP: 0033:0x7fd537a397cb
[  785.790034] Code: 73 01 c3 48 8b 0d bd 56 38 00 f7 d8 64 89 01 48 83
c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 56 38 00 f7 d8 64 89 01 48
[  785.808912] RSP: 002b:00007ffdef6ef068 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  785.816524] RAX: ffffffffffffffda RBX: 0000556d74bf99a0 RCX:
00007fd537a397cb
[  785.823699] RDX: 00007ffdef6ef070 RSI: 0000000000002285 RDI:
0000000000000003
[  785.830873] RBP: 0000000000000003 R08: 0000000000000040 R09:
0000556d74bf9a50
[  785.838046] R10: 0000000000000000 R11: 0000000000000246 R12:
0000000000000000
[  785.845221] R13: 00007ffdef6ef128 R14: 0000556d74bf99a0 R15:
00007ffdef6ef380
[  785.852396]  </TASK>
[  785.854590] ---[ end trace 0000000000000000 ]---




