Re: IMX8MM PCIe performance evaluated with NVMe

On Wed, Dec 15, 2021 at 08:26:37AM -0800, Tim Harvey wrote:
> On Fri, Dec 3, 2021 at 3:31 PM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> > On Fri, Dec 03, 2021 at 01:52:17PM -0800, Tim Harvey wrote:
> > > What would a more appropriate way of testing PCIe performance be?
> >
> > Beyond the protocol overhead, 'dd' is probably not going to be the best
> > way to measure a device's performance. This sends just one command at a
> > time, so you are also measuring the full software stack latency, which
> > includes a system call and interrupt-driven context switches. The PCIe
> > traffic would be idle during this overhead when running at just qd1.
> >
> > I am guessing your x86 is simply faster at executing through this
> > software stack than your imx8mm, so the software latency is lower.
> >
> > A better approach may be to use higher queue depths with batched
> > submissions so that your software overhead can occur concurrently with
> > your PCIe traffic. Also, you can eliminate interrupt context switches if
> > you use polled IO queues.
> 
> Thanks for the response!
> 
> The roughly 266 MB/s I'm getting on the IMX8MM (gen2 x1) with NVMe and
> plain old 'dd' is on par with what someone else has measured using a
> custom PCIe device of theirs and a simple loopback test, so I don't
> think the 'software stack' is the bottleneck here (it's removed
> entirely in his setup). I'm leaning towards something like interrupt
> latency. I'll have to dig into the NVMe device driver and see if there
> is a way to hack it to poll and see what difference it makes.

You don't need to hack anything; the driver already supports polling.
You just need to enable the poll queues (they're off by default). For
example, you can turn on two polled queues with the kernel parameter:

  nvme.poll_queues=2
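
To double check that the queues actually got created, I believe you can
read the module parameter back after the reboot:

  # module parameter path; should report 2 if the setting took effect
  cat /sys/module/nvme/parameters/poll_queues

I think the driver also logs the default/read/poll queue split at probe
time, so grepping dmesg for "poll queues" is another quick sanity check.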

After booting with that parameter set, you just need to submit IO with
the HIPRI flag. The 'dd' command can't do that, so I think you'll need
to use 'fio'. Here is an example command that runs the same workload as
your 'dd' example, but with polling:

  fio --name=global --filename=/dev/nvme1n1 --rw=read --ioengine=pvsync2 --bs=1M --direct=1 --hipri --name=test

To verify that polling is actually happening, the fio output for "cpu"
stats should show something like "sys=99%".
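
If you also want to try the higher queue depth / batched submission
approach I mentioned earlier, fio's io_uring engine can do that as well.
Something along these lines should work (the queue depth and batch size
here are just starting points, not tuned values):

  # starting point: 32 outstanding IOs, submitted in batches of 8
  fio --name=qd32 --filename=/dev/nvme1n1 --rw=read --ioengine=io_uring \
      --bs=1M --iodepth=32 --iodepth_batch_submit=8 --direct=1 --hipri

I believe --hipri with io_uring also relies on the polled queues being
enabled, so keep nvme.poll_queues set for this test too.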


