Storage target performance

Bart Van Assche <bvanassche@xxxxxxx> · Sun, 13 Feb 2011 20:26:16 +0100

Hi,

Out of curiosity I have been running a few performance tests with the
SCST and TCM versions of the ib_srpt driver and a RAM disk (tmpfs) as
storage target. The results are as follows:

(1) fio --bs=4K --ioengine=psync --buffered=0 --rw=read
24,014 IOPS with the SCST version and 22,028 IOPS with the TCM version,
a difference of about 9%. This confirms that the TCM core needs
more time for processing a single command.

(2) fio --bs=4K --ioengine=libaio --iodepth=16 --buffered=0 --rw=read
--thread --numjobs=2 --gtod_reduce=1 --group_reporting --loops=10
184,000 IOPS with the SCST version and 167,000 IOPS with the TCM
version, a difference of about 10%.

(3) fio --bs=64M --ioengine=psync --buffered=0 --rw=read
1584 MB/s with the SCST version and 1262 MB/s with the TCM version, or
SCST reaching a 25% higher bandwidth (where SCST was using a single
I/O thread).

(4) fio --bs=64M --ioengine=psync --buffered=0 --rw=write
1409 MB/s with the SCST version and 992 MB/s with the TCM version, or
SCST reaching a 42% higher bandwidth (where SCST was using a single
I/O thread).
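
As a cross-check of the percentages quoted for tests (1) to (4), the
following Python sketch recomputes them from the reported figures. It
also converts the IOPS of test (1) into a mean round-trip time per
command, which is only meaningful because psync keeps a single I/O
outstanding; a single fio job (fio's default) is assumed here. The
function names are illustrative and do not come from any of the tools
mentioned in this message.

```python
# Figures copied from the four fio tests above: (SCST, TCM).
results = {
    "1: 4K psync read (IOPS)": (24014, 22028),
    "2: 4K libaio read, iodepth=16, 2 jobs (IOPS)": (184000, 167000),
    "3: 64M psync read (MB/s)": (1584, 1262),
    "4: 64M psync write (MB/s)": (1409, 992),
}


def scst_advantage_percent(scst, tcm):
    """How much higher the SCST figure is, relative to the TCM figure."""
    return (scst / tcm - 1.0) * 100.0


def mean_latency_us(iops):
    """Mean time per command in microseconds; valid only at queue depth 1."""
    return 1e6 / iops


for name, (scst, tcm) in results.items():
    print(f"{name}: SCST {scst_advantage_percent(scst, tcm):.1f}% higher")

# Test (1) is synchronous, so 1/IOPS approximates the full round trip
# (initiator + wire + target), not the target-side time alone.
scst_us = mean_latency_us(24014)
tcm_us = mean_latency_us(22028)
print(f"test 1 round trip: SCST {scst_us:.1f} us, TCM {tcm_us:.1f} us, "
      f"delta {tcm_us - scst_us:.1f} us")
```

For test (1) this works out to a round-trip difference of a few
microseconds per command. The same conversion does not apply to test
(2), where 32 commands may be in flight at once.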

Notes:
- At least for the SRP protocol the initiator system is the
bottleneck. This means that the ratio of TCM to SCST per-command
processing overhead in the target core is probably larger than the
IOPS ratios measured in the above tests. Tests have shown that about
four initiator systems are necessary to saturate one system running
ib_srpt.
- The above results should be representative of a setup with
low-latency storage (SSD). For a setup where data is stored on a RAID
array, random I/O performance depends a lot on the number of threads
used by the storage target for processing I/O. SCST, TGT and IET all
use multiple threads for processing I/O while TCM has been designed to
use a single I/O thread per target device.
- For the above tests the results did not depend on the number of
threads configured in SCST. This is consistent with what has been
observed for SCST setups with high-end SSDs: optimal IOPS numbers are
reached with a small number of I/O threads (one or two).
- One performance optimization has been applied in the TCM version
that has not yet been applied in the SCST version: embedding the
target command data structure in another structure such that it does
not have to be allocated and deallocated each time a SCSI command is
processed.
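
The first note argues that the target-side overhead ratio is larger
than the measured IOPS ratio because the initiator dominates the round
trip. A minimal sketch of that reasoning follows. It assumes that the
initiator and wire time are identical for both targets and that the
whole round-trip difference comes from the target. The target's share
of the round trip was not measured here, so the values tried below are
purely hypothetical, as is the function name.

```python
def target_overhead_ratio(round_trip_ratio, target_share):
    """Estimate the TCM/SCST ratio of target-side time per command.

    round_trip_ratio: TCM round-trip time divided by SCST round-trip time.
    target_share: assumed fraction of the SCST round trip spent in the
                  target (hypothetical; not measured in this message).

    With T_scst = I + S and T_tcm = I + T (same initiator time I),
    T / S = 1 + (round_trip_ratio - 1) / target_share.
    """
    if not 0.0 < target_share <= 1.0:
        raise ValueError("target_share must be in (0, 1]")
    return 1.0 + (round_trip_ratio - 1.0) / target_share


# Test (1): at queue depth 1 the IOPS ratio equals the round-trip ratio.
ratio = 24014 / 22028
for share in (1.0, 0.5, 0.25):
    print(f"target share {share:.2f}: TCM/SCST target time ~ "
          f"{target_overhead_ratio(ratio, share):.2f}x")
```

The smaller the share of the round trip spent in the target, the
larger the implied difference in target-side processing time for the
same 9% IOPS gap.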

Setup details:
- Kernel version: 2.6.38-rc4.
- Default kernel module parameter values were used for ib_srp and
ib_srpt. This means that the ib_srp parameter srp_sg_tablesize had the
value 12 in this test.
- IB hardware: QDR HCA in a PCIe slot.
- CPU model: Intel Core 2 Duo.

Bart.