I am perhaps not understanding the intricacies here, or not seeing a barrier protecting it, so forgive me if I'm off base. I think reading parent->bi_status here is unsafe. Consider the following sequence of events on two threads. Thread 0 Thread 1 In __bio_chain_endio: In __bio_chain_endio: [A] Child 0 reads parent->bi_status, no error. Child bio 1 reads parent, no error seen It sets parent->bi_status to an error It calls bio_put. Child bio 0 calls bio_put [end __bio_chain_endio] [end __bio_chain_endio] In bio_chain_endio(), bio_endio(parent) is called, calling bio_remaining_done() which decrements __bi_remaining to 1 and returns false, so no further endio stuff is done. In bio_chain_endio(), bio_endio(parent) is called, calling bio_remaining_done(), decrementing parent->__bi_remaining to 0, and continuing to finish parent. Either for block tracing or for parent's bi_end_io(), this thread tries to read parent->bi_status again. The compiler or the CPU may cache the read from [A], and since there are no intervening barriers, parent->bi_status is still believed on thread 0 to be success. Thus the bio may still be falsely believed to have completed successfully, even though child 1 set an error in it. Am I missing a subtlety here?