On 3/31/21 5:53 AM, Yufen Yu wrote:
> For multiple split bios, if one of the bios fails, the whole I/O
> should return an error to the application. But we found there is a
> race between bio_integrity_verify_fn and bio completion, which
> returns I/O success to the application after one of the bios has
> failed. The race is as follows:
>
> split bio(READ)                    kworker
>
> nvme_complete_rq
> blk_update_request //split error=0
>   bio_endio
>     bio_integrity_endio
>       queue_work(kintegrityd_wq, &bip->bip_work);
>
>                                    bio_integrity_verify_fn
>                                    bio_endio //split bio
>                                     __bio_chain_endio
>                                       if (!parent->bi_status)
>
>                                    <interrupt entry>
>                                    nvme_irq
>                                      blk_update_request //parent error=7
>                                        req_bio_endio
>                                          bio->bi_status = 7 //parent bio
>                                    <interrupt exit>
>
>                                       parent->bi_status = 0
>                                    parent->bi_end_io() // return bi_status=0
>
> The bio has been split in two: split and parent. When the split
> bio completes, it depends on the kworker to do endio, but
> bio_integrity_verify_fn is interrupted by the irq handler that
> completes the parent bio. The parent bio->bi_status that was set
> in the irq handler is then overwritten by the kworker.
>
> In fact, even without the above race, we also need to consider
> the concurrency between multiple split bios completing and updating
> the same parent bi_status. Normally, multiple split bios will
> be issued to the same hctx and complete from the same irq
> vector. But if the queue map has been updated between multiple
> split bios, these bios may complete on different hw queues and
> different irq vectors. The concurrent updates to the parent
> bi_status may then leave the final status wrong.

Applied, thanks.

-- 
Jens Axboe
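
For readers following the quoted race, here is a minimal user-space sketch of the lost update, written with C11 atomics. It is not kernel code and is not claimed to be the change made by the applied patch: struct model_bio, endio_unconditional(), endio_error_only() and the irq_status parameter are all hypothetical names invented for this illustration. The "interrupt" is modeled deterministically as a store that lands between the check of the parent status and the write to it.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct bio; only the status field is modeled. */
struct model_bio {
    _Atomic int status;   /* 0 means success, nonzero is an error code */
};

/*
 * Unconditional propagation, as in the quoted trace: if the parent looked
 * clean at check time, the child's status is stored into it. irq_status
 * models an interrupt that sets the parent's status between the check and
 * the store (0 means no interrupt arrives in that window).
 */
static int endio_unconditional(struct model_bio *parent, int child_status,
                               int irq_status)
{
    if (atomic_load(&parent->status) == 0) {
        if (irq_status)
            atomic_store(&parent->status, irq_status); /* <interrupt> */
        atomic_store(&parent->status, child_status);   /* may erase the error */
    }
    return atomic_load(&parent->status);
}

/*
 * Error-only propagation (one possible approach, assumed for illustration):
 * a successful child never writes to the parent, and a failing child records
 * its error only if the parent is still 0, using a compare-exchange so that
 * an error stored earlier is never replaced.
 */
static int endio_error_only(struct model_bio *parent, int child_status,
                            int irq_status)
{
    if (irq_status)
        atomic_store(&parent->status, irq_status);     /* <interrupt> */
    if (child_status) {
        int expected = 0;
        atomic_compare_exchange_strong(&parent->status, &expected,
                                       child_status);
    }
    return atomic_load(&parent->status);
}
```

With a parent that starts at 0, a successful child (child_status 0) and an interrupt that reports error 7, endio_unconditional() leaves the parent at 0, which matches the trace where the application sees success. endio_error_only() leaves it at 7, because a successful child has nothing to write.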