Hello, On Tue, Aug 28, 2012 at 10:37:36AM -0700, Kent Overstreet wrote: > @@ -324,13 +342,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs) > front_pad = 0; > inline_vecs = nr_iovecs; > } else { > + /* > + * generic_make_request() converts recursion to iteration; this > + * means if we're running beneath it, any bios we allocate and > + * submit will not be submitted (and thus freed) until after we > + * return. > + * > + * This exposes us to a potential deadlock if we allocate > + * multiple bios from the same bio_set() while running > + * underneath generic_make_request(). If we were to allocate > + * multiple bios (say a stacking block driver that was splitting > + * bios), we would deadlock if we exhausted the mempool's > + * reserve. > + * > + * We solve this, and guarantee forward progress, with a rescuer > + * workqueue per bio_set. If we go to allocate and there are > + * bios on current->bio_list, we first try the allocation > + * without __GFP_WAIT; if that fails, we punt those bios we > + * would be blocking to the rescuer workqueue before we retry > + * with the original gfp_flags. > + */ > + > + if (current->bio_list && !bio_list_empty(current->bio_list)) > + gfp_mask &= ~__GFP_WAIT; > +retry: > p = mempool_alloc(bs->bio_pool, gfp_mask); > front_pad = bs->front_pad; > inline_vecs = BIO_INLINE_VECS; > } > > if (unlikely(!p)) > - return NULL; > + goto err; > > bio = p + front_pad; > bio_init(bio); > @@ -351,6 +393,19 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs) > > err_free: > mempool_free(p, bs->bio_pool); > +err: > + if (gfp_mask != saved_gfp) { > + gfp_mask = saved_gfp; > + > + spin_lock(&bs->rescue_lock); > + bio_list_merge(&bs->rescue_list, current->bio_list); > + bio_list_init(current->bio_list); > + spin_unlock(&bs->rescue_lock); > + > + queue_work(bs->rescue_workqueue, &bs->rescue_work); > + goto retry; > + } I wonder whether it would be easier to follow if this part is in-line where retry: is. All that needs to be duplicated is single mempool_alloc() call, right? Overall, I *think* this is correct but need to think more about it to be sure. Thanks! -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html