Hello All,

I was asked to create a raid0 device on top of NVMe drives formatted with a 4k block size and DPS2 enabled. This configuration fails when bio_integrity_prep() is called before the bio is split.

Some research pointed me to two fixes related to my problem. The first was

commit f36ea50ca0043e7b1204feaf1d2ba6bd68c08d36
Author: Wen Xiong <wenxiong@xxxxxxxxxxxxxxxxxx>
Date:   Wed May 10 08:54:11 2017 -0500

    blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op

    When formatting NVMe to 512B/4K + T10 DIf/DIX, dd with split op returns
    "Input/output error". Looks block layer split the bio after calling
    bio_integrity_prep(bio). This patch fixes the issue.

and the second was

commit e4dc9a4c31fe10d1751c542702afc85be8a5c56a
Author: Israel Rukshin <israelr@xxxxxxxxxxxx>
Date:   Wed Dec 11 17:36:02 2019 +0200

    scsi: target/iblock: Fix protection error with blocks greater than 512B
    …

Neither approach is acceptable in my case, so I continued investigating. A block I/O trace showed three functions being called in my case: t10_pi_generate, bio_integrity_advance and t10_pi_type1_prepare. Looking at the code, t10_pi_generate generates a ref_tag based on the "virtual" block number (in 512-byte units), and t10_pi_type1_prepare converts it to the real ref_tag (the device block number), but sometimes the tag is not remapped.
Looking at bio_integrity_advance:

	void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
	{
		struct bio_integrity_payload *bip = bio_integrity(bio);
		struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
		unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

		bip->bip_iter.bi_sector += bytes_done >> 9;
		bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
	}

It shifts the iterator in 512-byte units, which looks right. But compare that with t10_pi_generate:

	static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
			csum_fn *fn, enum t10_dif_type type)
	{
		unsigned int i;

		for (i = 0 ; i < iter->data_size ; i += iter->interval) {
			struct t10_pi_tuple *pi = iter->prot_buf;

			iter->seed++;		<<<
		}

		return BLK_STS_OK;
	}

t10_pi_generate and t10_pi_type1_prepare increment the seed by just 1 per integrity interval, which is 4k in my case. So any bio_integrity_advance call moves the iterator outside the generated sequence, and t10_pi_type1_prepare cannot find a matching virtual sector for the remapping.

Changing the increment from 1 to a value derived from the real integrity interval size solves the problem completely. The attached patch passed my own testing on raid0 with a 4k block size.
Attachment:
t10-pi-fix-intergrity-iterator.patch
Alex