On Thu, 22 Oct 2015, Ming Lei wrote:

> > Some drivers (dm-snapshot, dm-thin) do acquire a mutex in their
> > .make_request_fn() for every bio. It wouldn't be practical to convert
> > them to not acquire the mutex (and it would also degrade performance
> > of these drivers if they had to offload every bio to a worker thread
> > that can acquire the mutex).
>
> Lots of drivers handle I/O in that way, and this way basically makes
> AIO impossible for dm-snapshot.

It doesn't have anything to do with asynchronous I/O. Of course you can
do asynchronous I/O on dm-snapshot.

> >> Also sometimes it can hurt performance by converting I/O submission
> >> from one context into concurrent workqueue contexts, especially in
> >> the case of sequential I/O, since plugging and plug merging can't be
> >> used any more.
> >
> > You can add blk_start_plug/blk_finish_plug to the function
> > bio_alloc_rescue. That would be reasonable to make sure that the
> > requests are merged even when they are offloaded to the rescue thread.
>
> The I/Os submitted from each wq context are no longer contiguous, so
> plug merging isn't doable, not to mention the extra context switch
> cost.

If the requests are mergeable, blk_start_plug/blk_finish_plug will merge
them; if they are not, it won't. (An untested sketch of what I mean is at
the end of this mail.)

> This kind of cost would be introduced for all bio devices just for
> handling the unusual case. Is that fair?

Offloading bios to a worker thread when the make_request_fn function
blocks is required to avoid a deadlock. (BTW, the deadlock became more
common in kernel 4.3 due to the unrestricted size of bios.)

The bio list current->bio_list introduces a false locking dependency:
the completion of a bio depends on the completion of other bios on
current->bio_list that are directed at different devices, so it can
create a circular dependency resulting in a deadlock.

To avoid the circular dependency, each bio must be offloaded to a
specific workqueue, so that the completion of a bio for device A no
longer depends on the completion of another bio for device B. (The
second sketch at the end of this mail shows how the bios are moved off
current->bio_list.)

> >> > -	queue_work(bs->rescue_workqueue, &bs->rescue_work);
> >> > +	spin_lock(&bs->rescue_lock);
> >> > +	bio_list_add(&bs->rescue_list, bio);
> >> > +	queue_work(bs->rescue_workqueue, &bs->rescue_work);
> >> > +	spin_unlock(&bs->rescue_lock);
> >> > +	}
> >>
> >> Unlike the rescuing path, scheduling out can happen quite
> >> frequently, and the above change switches to submitting these I/Os
> >> from the wq concurrently, which might hurt performance for
> >> sequential I/O.
> >>
> >> Also I am wondering: why not submit these I/Os in the 'current'
> >> context, just like flushing a plug does?
> >
> > Processing requests doesn't block (they only take the queue spinlock).
> >
> > Processing bios can block (they can take other mutexes or semaphores),
> > so processing them from the schedule hook is impossible - the bio's
> > make_request function could attempt to take some lock that is already
> > held. So we must offload the bios to a separate workqueue.
>
> Yes, so it would be better to just handle dm-snapshot in this way.

All dm targets and almost all other bio-processing drivers can block in
the make_request_fn function (for example, they may block when
allocating from a mempool).
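For illustration, here is roughly what I mean by plugging the rescuer.
This is an untested sketch against the current bio_alloc_rescue() in
block/bio.c:

static void bio_alloc_rescue(struct work_struct *work)
{
	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
	struct bio *bio;
	struct blk_plug plug;

	/*
	 * Hold a plug across the whole drain of the rescue list, so that
	 * bios that were queued together are merged before dispatch.
	 */
	blk_start_plug(&plug);

	while (1) {
		spin_lock(&bs->rescue_lock);
		bio = bio_list_pop(&bs->rescue_list);
		spin_unlock(&bs->rescue_lock);

		if (!bio)
			break;

		generic_make_request(bio);
	}

	blk_finish_plug(&plug);
}

With the plug held across the loop, offloading to the rescue thread
shouldn't lose the merging that a plug in the submitter's context would
have provided.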
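And to illustrate the offloading described above: this is, roughly, how
punt_bios_to_rescuer() in block/bio.c moves the bios belonging to one
bio_set off current->bio_list and onto that bio_set's own workqueue
(quoted from memory, so treat it as a sketch):

static void punt_bios_to_rescuer(struct bio_set *bs)
{
	struct bio_list punt, nopunt;
	struct bio *bio;

	bio_list_init(&punt);
	bio_list_init(&nopunt);

	/*
	 * Take the bios allocated from this bio_set off current->bio_list;
	 * bios belonging to other devices stay where they are.
	 */
	while ((bio = bio_list_pop(current->bio_list)))
		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);

	*current->bio_list = nopunt;

	/*
	 * The punted bios are submitted from the bio_set's own rescue
	 * thread, so their completion no longer depends on anything that
	 * is still queued on current->bio_list.
	 */
	spin_lock(&bs->rescue_lock);
	bio_list_merge(&bs->rescue_list, &punt);
	spin_unlock(&bs->rescue_lock);

	queue_work(bs->rescue_workqueue, &bs->rescue_work);
}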
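Mikulas

> Thanks,
> Ming Lei

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel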