On Thu, Nov 14 2019 at 9:10am -0500,
Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:

> Single thread fio test (read, bs=4k, ioengine=libaio, iodepth=128,
> numjobs=1) over dm-thin device has poor performance versus bare nvme
> disk on v5.4.0-rc7.
>
> Further investigation with perf indicates that queue_work_on() consumes
> over 20% CPU time when doing IO over dm-thin device. The call stack is
> as follows.
>
> - 46.64% thin_map
>     + 22.28% queue_work_on
>     + 12.98% dm_thin_find_block
>     +  5.07% cell_defer_no_holder
>     +  2.42% bio_detain.isra.33
>     +  0.95% inc_all_io_entry.isra.31.part.32
>        0.80% remap.isra.41
>
> In cell_defer_no_holder(), wake_worker() is always called, no matter
> whether the cell->bios list is empty or not. In a single-thread IO
> model, the cell->bios list is most likely empty. So skip waking up the
> worker thread if the cell->bios list is empty.
>
> A significant single-thread IO performance improvement can be seen with
> this patch. The original IO performance is 445 MiB/s with the fio test
> described above, while it is 643 MiB/s after applying the patch, which
> is a 44% performance improvement.
>
> Signed-off-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx>

Nice find.  Implementation detail questions inlined below.
> ---
>  drivers/md/dm-bio-prison-v1.c |  8 +++++++-
>  drivers/md/dm-bio-prison-v1.h |  2 +-
>  drivers/md/dm-thin.c          | 10 ++++++----
>  3 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/dm-bio-prison-v1.c b/drivers/md/dm-bio-prison-v1.c
> index b538989..b2a9b8d 100644
> --- a/drivers/md/dm-bio-prison-v1.c
> +++ b/drivers/md/dm-bio-prison-v1.c
> @@ -219,11 +219,17 @@ static void __cell_release_no_holder(struct dm_bio_prison *prison,
>
>  void dm_cell_release_no_holder(struct dm_bio_prison *prison,
>  			       struct dm_bio_prison_cell *cell,
> -			       struct bio_list *inmates)
> +			       struct bio_list *inmates, int *empty)
>  {
>  	unsigned long flags;
>
>  	spin_lock_irqsave(&prison->lock, flags);
> +	/*
> +	 * The empty flag should represent the list state exactly
> +	 * when the list is merged into @inmates, thus get the
> +	 * list state while prison->lock is held.
> +	 */
> +	*empty = bio_list_empty(&cell->bios);
>  	__cell_release_no_holder(prison, cell, inmates);
>  	spin_unlock_irqrestore(&prison->lock, flags);
>  }
> diff --git a/drivers/md/dm-bio-prison-v1.h b/drivers/md/dm-bio-prison-v1.h
> index cec52ac..500edbc 100644
> --- a/drivers/md/dm-bio-prison-v1.h
> +++ b/drivers/md/dm-bio-prison-v1.h
> @@ -89,7 +89,7 @@ void dm_cell_release(struct dm_bio_prison *prison,
>  		     struct bio_list *bios);
>  void dm_cell_release_no_holder(struct dm_bio_prison *prison,
>  			       struct dm_bio_prison_cell *cell,
> -			       struct bio_list *inmates);
> +			       struct bio_list *inmates, int *empty);
>  void dm_cell_error(struct dm_bio_prison *prison,
>  		   struct dm_bio_prison_cell *cell, blk_status_t error);
>
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index fcd8877..51fd396 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -480,9 +480,9 @@ static void cell_visit_release(struct pool *pool,
>
>  static void cell_release_no_holder(struct pool *pool,
>  				   struct dm_bio_prison_cell *cell,
> -				   struct bio_list *bios)
> +				   struct bio_list *bios, int *empty)
>  {
> -	dm_cell_release_no_holder(pool->prison, cell, bios);
> +	dm_cell_release_no_holder(pool->prison, cell, bios, empty);
>  	dm_bio_prison_free_cell(pool->prison, cell);
>  }
>
> @@ -886,12 +886,14 @@ static void cell_defer_no_holder(struct thin_c *tc, struct dm_bio_prison_cell *c
>  {
>  	struct pool *pool = tc->pool;
>  	unsigned long flags;
> +	int empty;
>
>  	spin_lock_irqsave(&tc->lock, flags);
> -	cell_release_no_holder(pool, cell, &tc->deferred_bio_list);
> +	cell_release_no_holder(pool, cell, &tc->deferred_bio_list, &empty);
>  	spin_unlock_irqrestore(&tc->lock, flags);
>
> -	wake_worker(pool);
> +	if (!empty)
> +		wake_worker(pool);
>  }

I think your point is that tc->deferred_bio_list may already have bios
before the call to cell_release_no_holder(), so simply checking whether
it is empty after the cell_release_no_holder() call isn't enough?

But if tc->deferred_bio_list isn't empty then there is work that needs
to be done, regardless of whether this particular cell created work.  So
I'm left wondering: did you first try simply checking whether
tc->deferred_bio_list is empty after cell_release_no_holder() in
cell_defer_no_holder()?

Thanks,
Mike

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel