On Tue, Mar 08, 2022 at 06:02:50PM -0700, Jens Axboe wrote:
> On 3/7/22 11:53 AM, Mike Snitzer wrote:
> > From: Ming Lei <ming.lei@xxxxxxxxxx>
> >
> > Support bio(REQ_POLLED) polling in the following approach:
> >
> > 1) only support io polling on normal READ/WRITE, and other abnormal IOs
> > still fallback to IRQ mode, so the target io is exactly inside the dm
> > io.
> >
> > 2) hold one refcnt on io->io_count after submitting this dm bio with
> > REQ_POLLED
> >
> > 3) support dm native bio splitting, any dm io instance associated with
> > current bio will be added into one list which head is bio->bi_private
> > which will be recovered before ending this bio
> >
> > 4) implement .poll_bio() callback, call bio_poll() on the single target
> > bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call
> > dm_io_dec_pending() after the target io is done in .poll_bio()
> >
> > 5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
> > which is based on Jeffle's previous patch.
>
> It's not the prettiest thing in the world with the overlay on bi_private,
> but at least it's nicely documented now.
>
> I would encourage you to actually test this on fast storage, should make
> a nice difference. I can run this on a gen2 optane, it's 10x the IOPS
> of what it was tested on and should help better highlight where it
> makes a difference.
>
> If either of you would like that, then send me a fool proof recipe for
> what should be setup so I have a poll capable dm device.

The following steps set up a dm stripe over two NVMe devices, then run
io_uring on the dm stripe device.

1) dm_stripe.perl

#!/usr/bin/perl -w
# Create a striped device across any number of underlying devices. The device
# will be called "stripe_dev" and have a chunk-size of 128k.

my $chunk_size = 128 * 2;	# in 512-byte sectors, i.e. 128k
my $dev_name = "stripe_dev";
my $num_devs = @ARGV;
my @devs = @ARGV;
my ($min_dev_size, $stripe_dev_size, $i);

if (!$num_devs) {
	die("Specify at least one device\n");
}

# Use the smallest member device so every stripe fits on every device.
$min_dev_size = `blockdev --getsz $devs[0]`;
for ($i = 1; $i < $num_devs; $i++) {
	my $this_size = `blockdev --getsz $devs[$i]`;
	$min_dev_size = ($min_dev_size < $this_size) ?
			$min_dev_size : $this_size;
}

# Round the total size down to a whole number of full stripes.
$stripe_dev_size = $min_dev_size * $num_devs;
$stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs);

$table = "0 $stripe_dev_size striped $num_devs $chunk_size";
for ($i = 0; $i < $num_devs; $i++) {
	$table .= " $devs[$i] 0";
}

`echo $table | dmsetup create $dev_name`;

2) test_poll_on_dm_stripe.sh

#!/bin/bash

RT=40
JOBS=1
HI=1
BS=4K

set -x
dmsetup remove_all

# Reload nvme with dedicated poll queues so polled IO is possible.
rmmod nvme
modprobe nvme poll_queues=2

sleep 2

./dm_stripe.perl /dev/nvme0n1 /dev/nvme1n1
sleep 1

DEV=/dev/mapper/stripe_dev

echo "io_uring hipri test"

fio --bs=$BS --ioengine=io_uring --fixedbufs --registerfiles \
	--hipri=$HI --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 \
	--filename=$DEV --direct=1 --runtime=$RT --numjobs=$JOBS --rw=randread --name=test \
	--group_reporting

Thanks,
Ming
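Before running fio, it may be worth confirming that polling really got enabled on the whole stack, since step 5 only sets QUEUE_FLAG_POLL on the dm device when every underlying queue supports it. What follows is a minimal sketch, not part of the original recipe. It assumes that the flag is visible through the sysfs queue/io_poll attribute (1 = enabled) and that the underlying devices are whole disks listed under the dm device's slaves/ directory. The helper names check_one and check_poll_stack are hypothetical, and the SYSFS variable exists only so the script can be exercised against a fake tree.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): check that a dm device and every device
# underneath it report io_poll=1 in sysfs. SYSFS defaults to /sys and is
# only meant to be overridden for testing.

SYSFS=${SYSFS:-/sys}

# Print "<name> io_poll=<value>" for one kernel block device name.
# Returns non-zero if the attribute is missing or polling is off.
# Assumes a whole-disk device (partitions have no queue/ directory).
check_one() {
	local name=$1
	local attr="$SYSFS/block/$name/queue/io_poll"
	local val

	if [ ! -r "$attr" ]; then
		echo "$name: no queue/io_poll attribute"
		return 1
	fi
	val=$(cat "$attr")
	echo "$name io_poll=$val"
	[ "$val" = "1" ]
}

# Check the top-level device, then each entry in its slaves/ directory.
# Returns non-zero if any device in the stack does not report polling.
check_poll_stack() {
	local name=$1 rc=0 slave

	check_one "$name" || rc=1
	for slave in "$SYSFS/block/$name/slaves"/*; do
		# Skip the unexpanded glob when slaves/ is empty or missing.
		[ -e "$slave" ] || continue
		check_one "$(basename "$slave")" || rc=1
	done
	return $rc
}
```

For example, after dm_stripe.perl has run, `check_poll_stack "$(basename "$(readlink -f /dev/mapper/stripe_dev)")"` prints one line per device. A non-zero exit status means at least one queue reports io_poll=0, which typically indicates that the nvme driver was loaded without poll_queues.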