On Fri, May 18, 2012 at 06:05:50PM +1000, NeilBrown wrote: > > Hi Kent, > there is lots of good stuff in this series and I certainly hope a lot of it > can eventually go upstream. > > However there is a part of this patch that I don't think is safe: > > > > +static void __bio_submit_split(struct closure *cl) > > +{ > > + struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl); > > + struct bio *bio = s->bio, *n; > > + > > + do { > > + n = bio_split(bio, bio_max_sectors(bio), > > + GFP_NOIO, s->q->bio_split); > > + if (!n) > > + continue_at(cl, __bio_submit_split, bio_split_wq); > > + > > + closure_get(cl); > > + generic_make_request(n); > > + } while (n != bio); > > + > > + continue_at(cl, bio_submit_split_done, NULL); > > +} > > Firstly a small point: Can bio_split ever return NULL here? I don't > think it can, so there is no need to test. > But if it can, then calling generic_make_request(NULL) doesn't seem like a > good idea. > > More significantly though:: > This is called from generic_make_request which can be called recursively and > enforces a tail-recursion semantic. > If that generic_make_request was a recursive call, then the > generic_make_request in __bio_submit_split will not start the request, but > will queue the bio for later handling. If we then call bio_split again, we > could try to allocation from a mempool while we are holding one entry > allocated from that pool captive. This can deadlock. > > i.e. if the original bio is so large that it needs to be split into 3 pieces, > then we will try to allocate the second piece before the first piece has a > chance to be released. If this happens in enough threads to exhaust the pool > (4 I think), it will deadlock. > > I realise this sounds like a very unlikely case, but of course they happen. > > One possible approach might be to count how many splits will be required, > then have an interface to mempools so you can allocate them all at once, > or block and wait. Yeah, I actually thought of that (I probably should've documented it, though). That's why the code is checking if bio_split() returned NULL; my verison of bio_split() checks if current->bio_list is non NULL, and if it is it doesn't pass __GFP_WAIT to bio_alloc_bioset(). (I was debating whether bio_split() should do that itself or leave it up to the caller. I tend to think it's too easy to accidentally screw up - and not notice it - so it should be enforced by generic code. Possibly the check should be moved to bio_alloc_bioset(), though.) If we do get a memory allocation failure, then we just punt to workqueue - continue_at() runs __bio_submit_split from the bio_split workqueue - where we can safely use the mempool. -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html