Hi Martin,

Any inputs on this one?

Thanks,
~Saurav

> -----Original Message-----
> From: Saurav Kashyap
> Sent: Tuesday, February 7, 2023 4:50 PM
> To: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
> Cc: linux-scsi <linux-scsi@xxxxxxxxxxxxxxx>; Girish Basrur <gbasrur@xxxxxxxxxxx>
> Subject: IO error on DIF/DIX supported array
>
> Hi Martin,
> We have observed an IO failure on a 3PAR array that supports DIF/DIX with
> upstream code. The error is seen only when IOs are issued to DM devices; no
> error is observed if IO is done on /dev/sdX.
> I added some prints to understand the problem and found that the
> SCSI_PROT_IP_CHECKSUM flag is not set in scmnd->prot_flags. Ideally it
> should be set, since BIP_IP_CHECKSUM is set in bio_integrity_prep.
>
> --------------------<START: IO to /dev/sdc>----------------
> [Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:54:56 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:54:56 2023] SK: sd_setup_protect_cmnd setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> -------------------<END: IO to /dev/sdc>-----------------
>
> ----------------<START: IO to dm-10>---------------------
> [Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff976f8d19c300 bip_flags=0x11
> [Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fa15fa490 bip_flags=0x0
> [Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
> [Mon Feb 6 17:55:13 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff978f0752c180 bip_flags=0x11
> [Mon Feb 6 17:55:13 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff976fc87fef10 bip_flags=0x0
> [Mon Feb 6 17:55:13 2023] dm-10: guard tag error at sector 0 (rcvd 0000, want ffff)
> [Mon Feb 6 17:55:13 2023] Buffer I/O error on dev dm-10, logical block 0, async page read
> -----------------<END: IO to dm-10>------------------------
>
> Note that the bio pointer changes when IO is done through the dm device.
> I added more debug prints in bio_clone and bio_integrity_clone and
> concluded that bip_flags are not copied in the bio_integrity_clone
> routine.
>
> --------------------
> [Tue Feb 7 14:15:47 2023] SK: bio_integrity_prep setting IP_CHECKSUM bio=ffff891ecc5fa840 bip_flags=0x11
> [Tue Feb 7 14:15:47 2023] SK: __bio_clone: bio=ffff891ed97b5990 bio_src=ffff891ecc5fa840
> [Tue Feb 7 14:15:47 2023] SK: bio_integrity_clone: bip=ffff891ecc5fd500 bip_src=ffff891ecc5fcb40 bip_flags=0x0 src_bip_flags=0x11
> [Tue Feb 7 14:15:47 2023] SK: sd_setup_protect_cmnd else IP_CHECKSUM bio=ffff891ed97b5990 bip_flags=0x0
> [Tue Feb 7 14:15:47 2023] dm-3: guard tag error at sector 0 (rcvd 0000, want ffff)
> [Tue Feb 7 14:15:47 2023] Buffer I/O error on dev dm-3, logical block 0, async page read
> ----------------------------------
>
> If I add the change below to copy the flags, the following BUG_ON in slub.c
> is reported:
> ------------------<code>-------------
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 3f5685c00e36..07e7443c7be3 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -418,6 +418,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
>
>  	bip->bip_vcnt = bip_src->bip_vcnt;
>  	bip->bip_iter = bip_src->bip_iter;
> +	bip->bip_flags = bip_src->bip_flags;
>
>  	return 0;
>  }
> ----------------<code>---------------
>
> ------------------<BUG_ON>--------------
> [ 751.838432] kernel BUG at mm/slub.c:435!
> [ 751.838440] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 751.838443] CPU: 49 PID: 981 Comm: kworker/49:1H Kdump: loaded Not tainted 6.2.0-rc1+ #14
> [ 751.838447] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.5.6 10/06/2021
> [ 751.838448] Workqueue: kintegrityd bio_integrity_verify_fn
> [ 751.838458] RIP: 0010:__slab_free+0x1ae/0x300
> [ 751.838467] Code: 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 d8 fb ff ff 48 83 c4 60 4c 89 f7 5b 5d 41 5c 41 5d 41 5e 41 5f e9 62 3b 00 00 <0f> 0b 80 4c 24 4b 80 e9 ea fe ff ff 4c 89 fa 4d 89 d7 4c 8b 54 24
> [ 751.838469] RSP: 0018:ffffbb674fcf7dd0 EFLAGS: 00010246
> [ 751.838472] RAX: ffff9c320d3546e0 RBX: ffff9c325302e480 RCX: 000000008040003f
> [ 751.838473] RDX: ffffffc10e1546c0 RSI: ffffdfb30434d500 RDI: ffff9c3200042500
> [ 751.838475] RBP: ffff9c3200042500 R08: 0000000000000001 R09: ffffffffb4fbf08a
> [ 751.838476] R10: ffffbb674fcf7ca0 R11: ffffffffb65e4ac8 R12: ffffdfb30434d500
> [ 751.838477] R13: ffff9c320d3546c0 R14: ffff9c320d3546c0 R15: ffff9c320d3546c0
> [ 751.838479] FS: 0000000000000000(0000) GS:ffff9c70ff840000(0000) knlGS:0000000000000000
> [ 751.838481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 751.838482] CR2: 00007fe84efedb00 CR3: 000000015472a000 CR4: 0000000000350ee0
> [ 751.838484] Call Trace:
> [ 751.838485] <TASK>
> [ 751.838487] ? bio_integrity_process+0x14f/0x1c0
> [ 751.838494] ? __pfx_t10_pi_type1_verify_ip+0x10/0x10 [t10_pi]
> [ 751.838501] bio_integrity_free+0xaa/0xb0
> [ 751.838504] bio_integrity_verify_fn+0x40/0x50
> [ 751.838507] process_one_work+0x1e5/0x3b0
> [ 751.838513] ? __pfx_worker_thread+0x10/0x10
> [ 751.838515] worker_thread+0x50/0x3a0
> [ 751.838518] ? __pfx_worker_thread+0x10/0x10
> [ 751.838520] kthread+0xd9/0x100
> [ 751.838525] ? __pfx_kthread+0x10/0x10
> [ 751.838528] ret_from_fork+0x2c/0x50
> [ 751.838535] </TASK>
> ----------------------<BUG_ON>---------------
>
> Queries
> 1) Is there a specific reason for not copying the bip_flags in the
> bio_integrity_clone function?
> 2) If bip_flags needs to be copied, is there something else that needs to
> be done to take care of the BUG_ON?
> 3) If not, what would be the right fix for the IO error caused by the
> SCSI_PROT_IP_CHECKSUM flag not being set?
>
>
> Thanks,
> ~Saurav
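
One guess regarding query 2, in case it helps the discussion. If I read enum bip_flags correctly, the 0x11 seen in the logs decodes to BIP_BLOCK_INTEGRITY | BIP_IP_CHECKSUM, and as far as I can tell bio_integrity_free frees the integrity buffer when BIP_BLOCK_INTEGRITY is set. Copying every flag might therefore make the clone look like it owns the parent's buffer, which could explain the BUG_ON in the free path. The user-space sketch below only illustrates the bit arithmetic of masking out the ownership bit while keeping BIP_IP_CHECKSUM. The enum values are hand-copied from include/linux/bio.h and are an assumption to be verified against the tree; clone_flags and wants_ip_checksum are hypothetical helper names, not kernel functions, and this is not a tested kernel patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * ASSUMPTION: values hand-copied from enum bip_flags in
 * include/linux/bio.h (6.2-rc1 era); verify against your tree.
 */
enum bip_flags {
	BIP_BLOCK_INTEGRITY	= 1 << 0, /* block layer owns integrity data */
	BIP_MAPPED_INTEGRITY	= 1 << 1, /* ref tag has been remapped */
	BIP_CTRL_NOCHECK	= 1 << 2, /* disable HBA integrity checking */
	BIP_DISK_NOCHECK	= 1 << 3, /* disable disk integrity checking */
	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
};

/*
 * Hypothetical helper (not a kernel function): the flags a clone would
 * carry if everything except the buffer-ownership bit were copied.
 */
static unsigned int clone_flags(unsigned int src_flags)
{
	unsigned int flags = src_flags & ~(unsigned int)BIP_BLOCK_INTEGRITY;

	/* The clone must never claim ownership of the parent's buffer. */
	assert(!(flags & BIP_BLOCK_INTEGRITY));
	return flags;
}

/* Hypothetical helper: true when BIP_IP_CHECKSUM is present in flags. */
static bool wants_ip_checksum(unsigned int flags)
{
	return (flags & BIP_IP_CHECKSUM) != 0;
}
```

With the value from the logs, clone_flags(0x11) gives 0x10, so wants_ip_checksum() stays true for the clone while the ownership bit is dropped. Whether masking like this is the right approach inside bio_integrity_clone is exactly what I would like your opinion on.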