Having studied the code in question more thoroughly, my first email's scenario can't occur, I believe. bio_put() contains a atomic_dec_and_test(), which (according to Documentation/atomic_t.txt), having a return value, are fully ordered and therefore impose a general memory barrier. A general memory barrier means that no value set before bio_put() may be observed afterwards, if I accurately understand Documentation/memory_barriers.txt. Thus, the scenario I imagined, with a load from inside __bio_chain_endio() being visible in bi_end_io(), cannot occur. However, the second email's scenario, of a compiler making two stores out of a conditional store, is still accurate according to my understanding. I continue to believe setting parent->bi_status needs to be a WRITE_ONCE() and any reading of parent->bi_status before bio_put() needs to be at least a READ_ONCE(). The last thing in that email is wrong for the same reason that the first email couldn't happen: the bio_put() general barrier means later accesses to the field from a single thread will freshly read the field and thereby not get an incorrectly cached value. As a concrete proposal, I believe either of the following work and fix the race NeilB described, as well as the compiler (or CPU) race I described: - if (!parent->bi_status) - parent->bi_status = bio->bi_status; + if (bio->bi_status) + WRITE_ONCE(parent->bi_status, bio->bi_status); --or-- - if (!parent->bi_status) - parent->bi_status = bio->bi_status; + if (!READ_ONCE(parent->bi_status) && bio->bi_status) + WRITE_ONCE(parent->bi_status, bio->bi_status); I believe the second of these might, but is not guaranteed to, preserve the first error observed in a child; I believe if you want to definitely save the first error you need an atomic.