On Tue, Feb 23, 2016 at 10:54 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> On Mon, Feb 22 2016 at 9:55pm -0500,
> Ming Lei <ming.lei@xxxxxxxxxxxxx> wrote:
>
>> On Tue, Feb 23, 2016 at 6:58 AM, Kent Overstreet
>> <kent.overstreet@xxxxxxxxx> wrote:
>> > On Sun, Feb 21, 2016 at 05:40:59PM +0800, Ming Lei wrote:
>> >> On Sun, Feb 21, 2016 at 2:43 PM, Ming Lin-SSI <ming.l@xxxxxxxxxxxxxxx> wrote:
>> >> >> -----Original Message-----
>> >> >
>> >> > So it's almost already "per request_queue"
>> >>
>> >> Yes, that is because of the following line:
>> >>
>> >>     q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
>> >>
>> >> in blk_alloc_queue_node().
>> >>
>> >> Looks like this bio_set doesn't need to be per-request_queue, and
>> >> now it is only used for fast-cloning bio for splitting, and one global
>> >> split bio_set should be enough.
>> >
>> > It does have to be per request queue for stacking block devices (which includes
>> > loopback).
>>
>> In commit df2cb6daa4 (block: Avoid deadlocks with bio allocation by
>> stacking drivers), deadlock in this situation has been avoided already.
>> Or are there other issues with global bio_set? I appreciate if you may
>> explain it a bit if there are.
>
> Even with commit df2cb6daa4 there is still risk of deadlocks (even
> without low memory condition), see:
> https://patchwork.kernel.org/patch/7398411/

That is definitely another problem, one which isn't related to low memory,
and I guess Kent means there might be a deadlock risk in the case of a
shared bio_set.

> (you may recall you blocked this patch with concerns about performance,
> context switches, plug merging being compromised, etc.. to which I never
> circled back to verify your concerns)

I still remember that problem:

1) Process A
   - two bios (a, b) are split in dm's make_request function
   - bio(a) is submitted via generic_make_request(), so it is staged in
     current->bio_list
   - time t1
   - before bio(b) is submitted, down_write(&s->lock) is called and never
     returns

2) Process B
   - just during time t1, holds s->lock (acquired first via
     down_write(&s->lock)) and waits for completion of bio(a)

Then process A waits for the lock, which was acquired by B first, and the
two bios (a, b) can't reach the driver/device at all. (A rough sketch of
this ordering is appended at the end of this mail.)

It looks like current->bio_list is fragile with respect to locks taken in
make_request functions, and moving the lock into workqueue context should
help. And I am happy to continue to discuss this issue further.

> But it illustrates the type of problems that can occur when your rescue
> infrastructure is shared across devices (in the context of df2cb6daa4,
> current->bio_list contains bios from multiple devices).
>
> If a single splitting bio_set were shared across devices there would be
> no guarantee of forward progress with complex stacked devices (one or
> more devices could exhaust the reserve and starve out other devices in
> the stack). So keeping the bio_set per request_queue isn't prone to
> failure like a shared bio_set might be.

Setting aside the dm lock problem: from Kent's commit log (df2cb6daa4) and
the patch itself, it looks like forward progress can be guaranteed for
stacked devices sharing the same bio_set, but it would be better to get
Kent's clarification.

If forward progress can be guaranteed, a percpu mempool might avoid the
reserve being exhausted so easily, because it is reasonable to assume that
one CPU can only drive a certain amount of block I/O bandwidth. (A sketch
of that idea is appended below as well.)

Thanks,
Ming
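
For reference, here is a minimal sketch of the two layouts discussed in the
quoted part of the thread, built only from the block-layer calls already
mentioned above; the global name blk_bio_split_set is made up here for
illustration and is not an existing symbol:

	/*
	 * Illustrative only, not a patch.  Today blk_alloc_queue_node()
	 * gives every request_queue its own splitting bio_set:
	 *
	 *	q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
	 *
	 * The "one global split bio_set" alternative mentioned above would
	 * look roughly like this (blk_bio_split_set is a hypothetical name):
	 */
	static struct bio_set *blk_bio_split_set;

	static int __init blk_split_bioset_init(void)
	{
		blk_bio_split_set = bioset_create(BIO_POOL_SIZE, 0);
		return blk_bio_split_set ? 0 : -ENOMEM;
	}

	/*
	 * blk_queue_split() callers would then pass blk_bio_split_set
	 * instead of q->bio_split, so every stacked device would draw its
	 * split clones from the same reserve -- which is exactly the
	 * sharing that the starvation concern above is about.
	 */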
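
Below is a hypothetical sketch of the splitting/lock ordering described in
the mail above. The names (example_make_request, split_in_two,
struct example_snap) are invented for illustration; this is not the real
dm-snapshot code, just the shape of the problem against the ~4.5-era block
API:

	#include <linux/bio.h>
	#include <linux/blkdev.h>
	#include <linux/rwsem.h>
	#include <linux/completion.h>

	struct example_snap {
		struct rw_semaphore lock;
	};
	static struct example_snap *s;
	static DECLARE_COMPLETION(bio_a_done);

	/* hypothetical helper that splits @bio into two halves */
	static void split_in_two(struct bio *bio, struct bio **a, struct bio **b);

	/* Process A: a stacking driver's make_request function */
	static blk_qc_t example_make_request(struct request_queue *q,
					     struct bio *bio)
	{
		struct bio *a, *b;

		split_in_two(bio, &a, &b);

		/*
		 * We are already inside generic_make_request(), so bio(a)
		 * is only appended to current->bio_list here; it cannot
		 * reach the underlying device until this function returns.
		 */
		generic_make_request(a);

		/* time t1: blocks forever, process B already holds s->lock */
		down_write(&s->lock);

		generic_make_request(b);
		up_write(&s->lock);
		return BLK_QC_T_NONE;
	}

	/*
	 * Process B: took s->lock first and waits for bio(a), which is
	 * still parked in process A's current->bio_list, so neither side
	 * can ever make progress.
	 */
	static void example_wait_for_a(void)
	{
		down_write(&s->lock);
		wait_for_completion(&bio_a_done);	/* never signalled */
		up_write(&s->lock);
	}

Moving the down_write() into workqueue context, as suggested above, would
break the cycle because the make_request function could return (and flush
current->bio_list) before anything sleeps on s->lock.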
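
On the percpu mempool idea: as far as I know there is no percpu mempool
primitive in the kernel today, so the sketch below only wires existing
pieces together (alloc_percpu() plus ordinary mempools); split_reserve and
split_bio_cache are invented names, and sizing/cleanup are glossed over:

	#include <linux/init.h>
	#include <linux/mempool.h>
	#include <linux/percpu.h>
	#include <linux/slab.h>
	#include <linux/bio.h>

	static mempool_t * __percpu *split_reserve;
	static struct kmem_cache *split_bio_cache;

	static int __init split_reserve_init(void)
	{
		int cpu;

		split_bio_cache = kmem_cache_create("split_bio",
						    sizeof(struct bio), 0, 0, NULL);
		if (!split_bio_cache)
			return -ENOMEM;

		split_reserve = alloc_percpu(mempool_t *);
		if (!split_reserve)
			return -ENOMEM;

		for_each_possible_cpu(cpu) {
			mempool_t *pool;

			pool = mempool_create_slab_pool(BIO_POOL_SIZE,
							split_bio_cache);
			if (!pool)
				return -ENOMEM;	/* unwinding omitted for brevity */
			*per_cpu_ptr(split_reserve, cpu) = pool;
		}
		return 0;
	}

	/*
	 * A splitter would draw from this_cpu's pool, so any single CPU
	 * (and hence any single submission path) could only tie up a
	 * bounded slice of the total reserve -- matching the assumption
	 * above that one CPU can only drive so much block I/O bandwidth.
	 */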