IO error on DIF/DIX supported array

Hi Martin,
We have observed I/O failures with upstream code on a 3PAR array that supports DIF/DIX. The error is seen only when I/O is issued to DM devices; no error is observed when I/O is issued directly to /dev/sdX.
I added some debug prints to understand the problem and found that the SCSI_PROT_IP_CHECKSUM flag is not set in scmnd->prot_flags. Ideally it should be set, because BIP_IP_CHECKSUM should be set in the bio's bip_flags.

--------------------<START: IO to /dev/sdc>----------------
[Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
-------------------<END: IO to /dev/sdc>-----------------

----------------<START: IO to dm-10>---------------------
[Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
[Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fa15fa490 bip_flags=0x0
[Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
[Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff978f0752c180 bip_flags=0x11
[Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fc87fef10 bip_flags=0x0
[Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
[Mon Feb 6 17:55:13 2023] Buffer I/O error on dev dm-10, logical block 0, async page read
-----------------<END: IO to dm-10>------------------------
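
As a reading aid for the bip_flags values in the logs above, here is a small user-space sketch that decodes them and mimics the IP-checksum decision the debug prints instrument. The enum values are assumed to match enum bip_flags in include/linux/bio.h around v6.2; model_prot_flags, print_bip_flags and MODEL_PROT_IP_CHECKSUM are hypothetical names used only for illustration, not kernel symbols.

```c
#include <stdio.h>

/* Assumption: mirrors enum bip_flags in include/linux/bio.h (v6.2-era). */
enum {
	BIP_BLOCK_INTEGRITY  = 1 << 0, /* block layer owns integrity data */
	BIP_MAPPED_INTEGRITY = 1 << 1, /* ref tag has been remapped */
	BIP_CTRL_NOCHECK     = 1 << 2, /* disable HBA integrity checking */
	BIP_DISK_NOCHECK     = 1 << 3, /* disable disk integrity checking */
	BIP_IP_CHECKSUM      = 1 << 4, /* IP checksum */
};

/* Hypothetical stand-in for SCSI_PROT_IP_CHECKSUM; not the kernel's value. */
#define MODEL_PROT_IP_CHECKSUM 0x1u

/*
 * Models only the IP-checksum branch of the protection setup:
 * the protection flag is raised only if the bio carries BIP_IP_CHECKSUM.
 */
unsigned int model_prot_flags(unsigned int bip_flags)
{
	unsigned int prot_flags = 0;

	if (bip_flags & BIP_IP_CHECKSUM)
		prot_flags |= MODEL_PROT_IP_CHECKSUM;
	return prot_flags;
}

/* Print the names of the bits set in a bip_flags value. */
void print_bip_flags(unsigned int bip_flags)
{
	static const char *const names[] = {
		"BIP_BLOCK_INTEGRITY", "BIP_MAPPED_INTEGRITY",
		"BIP_CTRL_NOCHECK", "BIP_DISK_NOCHECK", "BIP_IP_CHECKSUM",
	};
	unsigned int i;

	printf("bip_flags=0x%x:", bip_flags);
	for (i = 0; i < 5; i++)
		if (bip_flags & (1u << i))
			printf(" %s", names[i]);
	printf("\n");
}
```

Under these assumptions, 0x11 decodes to BIP_BLOCK_INTEGRITY plus BIP_IP_CHECKSUM and model_prot_flags(0x11) is non-zero, while 0x0 yields no protection flag, which corresponds to the "else" path seen for dm-10.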

I noticed that the bio pointer changes when I/O is done through a DM device. I added more debug prints in bio_clone and bio_integrity_clone and concluded that bip_flags is not copied in the bio_integrity_clone routine.

--------------------
[Tue Feb  7 14:15:47 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff891ecc5fa840 bip_flags=0x11
[Tue Feb  7 14:15:47 2023] SK: __bio_clone: bio=ffff891ed97b5990 bio_src=ffff891ecc5fa840
[Tue Feb  7 14:15:47 2023] SK: bio_integrity_clone: bip=ffff891ecc5fd500 bip_src=ffff891ecc5fcb40 bip_flags=0x0 src_bip_flags=0x11
[Tue Feb  7 14:15:47 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff891ed97b5990 bip_flags=0x0
[Tue Feb  7 14:15:47 2023] dm-3: guard tag error at sector 0 (rcvd 0000, want ffff)
[Tue Feb  7 14:15:47 2023] Buffer I/O error on dev dm-3, logical block 0, async page read
----------------------------------

If I add the following change to copy the flags, a BUG_ON in slub.c is reported:
------------------<code>-------------
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3f5685c00e36..07e7443c7be3 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -418,6 +418,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,

        bip->bip_vcnt = bip_src->bip_vcnt;
        bip->bip_iter = bip_src->bip_iter;
+       bip->bip_flags = bip_src->bip_flags;

        return 0;
 }
----------------<code>---------------

------------------<BUG_ON>--------------
[  751.838432] kernel BUG at mm/slub.c:435!
[  751.838440] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  751.838443] CPU: 49 PID: 981 Comm: kworker/49:1H Kdump: loaded Not tainted 6.2.0-rc1+ #14
[  751.838447] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.5.6 10/06/2021
[  751.838448] Workqueue: kintegrityd bio_integrity_verify_fn
[  751.838458] RIP: 0010:__slab_free+0x1ae/0x300
[  751.838467] Code: 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 d8 fb ff ff 48 83 c4 60 4c 89 f7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 62 3b 00 00 <0f> 0b 80 4c 24 4b 80 e9 ea fe ff ff 4c 89 fa 4d 89 d7 4c 8b 54 24
[  751.838469] RSP: 0018:ffffbb674fcf7dd0 EFLAGS: 00010246
[  751.838472] RAX: ffff9c320d3546e0 RBX: ffff9c325302e480 RCX: 000000008040003f
[  751.838473] RDX: ffffffc10e1546c0 RSI: ffffdfb30434d500 RDI: ffff9c3200042500
[  751.838475] RBP: ffff9c3200042500 R08: 0000000000000001 R09: ffffffffb4fbf08a
[  751.838476] R10: ffffbb674fcf7ca0 R11: ffffffffb65e4ac8 R12: ffffdfb30434d500
[  751.838477] R13: ffff9c320d3546c0 R14: ffff9c320d3546c0 R15: ffff9c320d3546c0
[  751.838479] FS:  0000000000000000(0000) GS:ffff9c70ff840000(0000) knlGS:0000000000000000
[  751.838481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  751.838482] CR2: 00007fe84efedb00 CR3: 000000015472a000 CR4: 0000000000350ee0
[  751.838484] Call Trace:
[  751.838485]  <TASK>
[  751.838487]  ? bio_integrity_process+0x14f/0x1c0
[  751.838494]  ? __pfx_t10_pi_type1_verify_ip+0x10/0x10 [t10_pi]
[  751.838501]  bio_integrity_free+0xaa/0xb0
[  751.838504]  bio_integrity_verify_fn+0x40/0x50
[  751.838507]  process_one_work+0x1e5/0x3b0
[  751.838513]  ? __pfx_worker_thread+0x10/0x10
[  751.838515]  worker_thread+0x50/0x3a0
[  751.838518]  ? __pfx_worker_thread+0x10/0x10
[  751.838520]  kthread+0xd9/0x100
[  751.838525]  ? __pfx_kthread+0x10/0x10
[  751.838528]  ret_from_fork+0x2c/0x50
[  751.838535]  </TASK>
----------------------<BUG_ON>---------------
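
One possible reading of this trace, offered as an assumption and not confirmed anywhere in this report: if bio_integrity_free releases the protection buffer whenever BIP_BLOCK_INTEGRITY is set, and the clone shares that buffer with the source bio, then copying bip_flags verbatim would make both the clone and the original free it. The user-space sketch below only models that ownership rule. model_bip, model_clone, model_free and frees_after_clone are hypothetical names, and the flag values are assumed to match include/linux/bio.h around v6.2.

```c
/* Assumption: values match enum bip_flags in include/linux/bio.h (v6.2-era). */
enum { BIP_BLOCK_INTEGRITY = 1 << 0, BIP_IP_CHECKSUM = 1 << 4 };

/* Hypothetical, heavily reduced model of an integrity payload. */
struct model_bip {
	unsigned int flags;
	int *buf_frees;	/* counts frees of the shared protection buffer */
};

/* Clone the payload: the buffer is shared, flags are filtered by keep_mask. */
void model_clone(struct model_bip *dst, const struct model_bip *src,
		 unsigned int keep_mask)
{
	dst->buf_frees = src->buf_frees;
	dst->flags = src->flags & keep_mask;
}

/* Assumed ownership rule: whoever carries BIP_BLOCK_INTEGRITY frees the buffer. */
void model_free(struct model_bip *bip)
{
	if (bip->flags & BIP_BLOCK_INTEGRITY)
		(*bip->buf_frees)++;
}

/*
 * Clone a payload, free the clone and then the original, and return how
 * often the shared buffer was freed. The clone's flags are reported
 * through clone_flags.
 */
int frees_after_clone(unsigned int src_flags, unsigned int keep_mask,
		      unsigned int *clone_flags)
{
	int frees = 0;
	struct model_bip src = { src_flags, &frees };
	struct model_bip clone;

	model_clone(&clone, &src, keep_mask);
	*clone_flags = clone.flags;
	model_free(&clone);
	model_free(&src);
	return frees;
}
```

In this model, a keep_mask of 0 (no flags copied, as in the unpatched clone) frees the buffer once but drops BIP_IP_CHECKSUM; a keep_mask of ~0u (the diff above) frees it twice; masking out only BIP_BLOCK_INTEGRITY frees it once and keeps the checksum flag on the clone. Whether the kernel should behave this way is exactly what the queries below ask.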

Queries:
1) Is there a specific reason for not copying bip_flags in the bio_integrity_clone function?
2) If bip_flags needs to be copied, is there something else that needs to be done to avoid the BUG_ON?
3) If not, what would be the right fix for the I/O error caused by the SCSI_PROT_IP_CHECKSUM flag not being set?


Thanks,
~Saurav 




