For multiple split bios, if one of the bio is fail, the whole should return error to application. But we found there is a race between bio_integrity_verify_fn and bio complete, which return io success to application after one of the bio fail. The race as following: split bio(READ) kworker nvme_complete_rq blk_update_request //split error=0 bio_endio bio_integrity_endio queue_work(kintegrityd_wq, &bip->bip_work); bio_integrity_verify_fn bio_endio //split bio __bio_chain_endio if (!parent->bi_status) <interrupt entry> nvme_irq blk_update_request //parent error=7 req_bio_endio bio->bi_status = 7 //parent bio <interrupt exit> parent->bi_status = 0 parent->bi_end_io() // return bi_status=0 The bio has been split as two: split and parent. When split bio completed, it depends on kworker to do endio, while bio_integrity_verify_fn have been interrupted by parent bio complete irq handler. Then, parent bio->bi_status which have been set in irq handler will overwrite by kworker. In fact, even without the above race, we also need to conside the concurrency beteen mulitple split bio complete and update the same parent bi_status. Normally, multiple split bios will be issued to the same hctx and complete from the same irq vector. But if we have updated queue map between multiple split bios, these bios may complete on different hw queue and different irq vector. Then the concurrency update parent bi_status may cause the final status error. Signed-off-by: Yufen Yu <yuyufen@xxxxxxxxxx> --- block/bio.c | 3 +++ block/blk-core.c | 4 ++++ include/linux/blk_types.h | 1 + 3 files changed, 8 insertions(+) diff --git a/block/bio.c b/block/bio.c index 26b7f721cda8..74863aaebf53 100644 --- a/block/bio.c +++ b/block/bio.c @@ -276,9 +276,12 @@ EXPORT_SYMBOL(bio_reset); static struct bio *__bio_chain_endio(struct bio *bio) { struct bio *parent = bio->bi_private; + unsigned long flags; + spin_lock_irqsave(&parent->bi_lock, flags); if (!parent->bi_status) parent->bi_status = bio->bi_status; + spin_unlock_irqrestore(&parent->bi_lock, flags); bio_put(bio); return parent; } diff --git a/block/blk-core.c b/block/blk-core.c index fc60ff208497..362653e937ff 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -241,8 +241,12 @@ static void print_req_error(struct request *req, blk_status_t status, static void req_bio_endio(struct request *rq, struct bio *bio, unsigned int nbytes, blk_status_t error) { + unsigned long flags; + + spin_lock_irqsave(&bio->bi_lock, flags); if (error) bio->bi_status = error; + spin_unlock_irqrestore(&bio->bi_lock, flags); if (unlikely(rq->rq_flags & RQF_QUIET)) bio_set_flag(bio, BIO_QUIET); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index db026b6ec15a..c3d50ad99510 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -231,6 +231,7 @@ struct bio { unsigned short bi_ioprio; unsigned short bi_write_hint; blk_status_t bi_status; + spinlock_t bi_lock; atomic_t __bi_remaining; struct bvec_iter bi_iter; -- 2.25.4