Hi,

I’m trying to isolate a performance issue I’ve noticed on a Supermicro A2SAV [1] motherboard. I’ve observed a difference in behavior between the Marvell and Intel SATA ports and am wondering if anyone on this list has any suggestions.

I’m using a 4.1.2 kernel and an out-of-kernel build of IET (iSCSI Enterprise Target, for legacy reasons) to export a SATA-connected HDD (TOSHIBA MQ01ABD1). On first bring-up of this hardware I noticed that read performance on benchmark tests (Iometer, 1M sequential read workload) was about 70 MB/s, approximately 30-40% slower than what we see on a similar hardware platform. The slowdown appears only when the disk is connected to the Marvell 88SE9230 ports on the motherboard; when I use the Intel ports (Atom E3940 SoC), the problem disappears. If I disable NCQ on the HDD while it is connected to the Marvell ports (echo 1 > /sys/class/block/sdb/device/queue_depth), performance returns to what I would expect, around 108 MB/s.

I’ve captured SATA traces of the activity on the HDD using a SATA analyzer and see a difference in pending IO as well as IO throughput, latency, and response time on the 88SE9230 compared to the Intel port. Here are the relevant screenshots:

Intel port with iSCSI reads: https://photos.app.goo.gl/MIcZpraFvBw9DazG3
Marvell port with iSCSI reads: https://photos.app.goo.gl/IbPgLZyuNkXpROWQ2

As you can see, the Intel SATA configuration keeps the pending IO queue depth near 7 for the duration of the transfer, whereas on the Marvell port it bounces around between 1 and 7, and IO latency/response times are similarly variable and overall longer than with the Intel configuration. I can share the full traces if anyone is interested. I don’t see obvious differences in terms of sequential data access or transfer size (all transfers are 256 KB). I’ve tried to reproduce the problem without the iSCSI target and have not yet been able to do so.
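For reference, here is a minimal sketch of the NCQ check/toggle I’m using (assuming the disk enumerates as sdb; adjust the device name as needed):

```shell
#!/bin/sh
# Minimal sketch: inspect (and optionally lower) a SATA device's queue
# depth via sysfs. Assumes the disk enumerates as sdb -- adjust as needed.
DEV=sdb
QD=/sys/class/block/$DEV/device/queue_depth

# With NCQ enabled this is typically 31; prints "n/a" if the device
# (or the sysfs attribute) is absent on this machine.
CUR=$(cat "$QD" 2>/dev/null || echo n/a)
echo "queue_depth for $DEV: $CUR"

# Setting the depth to 1 effectively disables NCQ (requires root):
#   echo 1 > "$QD"
```

Restoring NCQ afterwards is just a matter of writing the original depth back (e.g. echo 31 > the same file); the setting does not persist across reboots.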
I was able to demonstrate similar queue behavior, however, using aio-stress [2] with the command:

sudo ./aio-stress -c 1 -t 1 -O -o 1 -r 16K /dev/sdb

See the pending IO depth, throughput, and latencies in this case at https://photos.app.goo.gl/HEcgVftVR7gjASc78 . Overall read performance in this case nevertheless matched the Intel port at ~108 MB/s.

I haven’t tried moving to the latest kernel, and I think it would take some effort to get the IET target running on a recent kernel, but I could compare aio-stress queue-depth behavior on the most recent kernels if that would be interesting.

My main questions are whether the difference in pending IO and queue-depth behavior between the Marvell and Intel ports is expected, and whether there is anything other than disabling NCQ I can try to improve performance in this scenario on the Marvell 88SE9230 ports.

Thanks for reading and for any suggestions!

Dan

[1] https://www.supermicro.com/products/motherboard/Atom/A2SAV.cfm
[2] https://www.vi4io.org/tools/benchmarks/aio-stress