Hi Suresh,

On Wed, 2015-05-13 at 18:18 -0700, Suresh Rajagopalan wrote:
> I should add this is over 10G Ethernet.
>

A few data-points wrt traditional iSCSI performance tuning:

You'll want to use Intel's set_irq_affinity.sh to set explicit CPU
affinity for the 10 Gb/sec NIC's MSI-X interrupt vectors, and disable
irqbalance too.

Using >= 4096 byte MTUs on your 10 Gb/sec network + NICs will help
reduce CPU utilization for heavy small block random I/O workloads.

> On Wed, May 13, 2015 at 6:03 PM, Suresh Rajagopalan <sraja97@xxxxxxxxx> wrote:
> > I have an Intel DC P3700 800G PCIe flash device which I'm trying to
> > benchmark with LIO. I see a discrepancy in fio benchmarks on this
> > device when running directly (/dev/nvme0n1) and when running via iSCSI.
> >
> > The following fio file produces about 400k IOPS on 100% read. An
> > identical fio file run over an iSCSI device to LIO, where the
> > backstore is the same flash /dev/nvme0n1, produces about 40k IOPS.
> >
> > Is there something I am missing? The queue depth is 32, but the
> > numbers do not change even when bumped up to 128.
> >
> > Thanks for the help
> > Suresh
> >
> > ----------------------------
> >
> > [global]
> > description=Emulation of Intel IOmeter File Server Access Pattern
> > filename=/dev/nvme0n1
> > [iometer]
> > bssplit=4k/100
> > rw=randread
> > direct=1
> > size=18000m
> > ioengine=libaio
> > iodepth=32
> > iodepth_batch=4
> > iodepth_batch_complete=32
> > numjobs=64

For pre scsi-mq Linux/iSCSI initiator hosts, set the I/O scheduler to
noop using:

   echo noop > /sys/block/$SCSI_BD/queue/scheduler

Having multiple iSCSI LUN exports per struct Scsi_Host (e.g. per
target endpoint) has an effect on small block performance, especially
for < v3.18 (pre scsi-mq) initiator Linux hosts that acquire+release
request_queue and Scsi_Host locks for every queued I/O.

Rough command sketches for each of the points above are appended
after my sig below.

--nab
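
For the NIC IRQ affinity point, a minimal sketch. The interface name
eth2 and the script path are assumptions -- set_irq_affinity.sh ships
in the scripts/ directory of Intel's out-of-tree ixgbe/i40e driver
tarballs, so adjust the path to wherever your driver source lives:

   # Stop irqbalance first so it does not rewrite the affinity masks
   # behind your back ("service irqbalance stop" on non-systemd hosts).
   systemctl stop irqbalance

   # Pin each MSI-X vector of the 10G NIC to a dedicated CPU.
   # Script path + interface name are assumptions for this sketch.
   /usr/src/ixgbe/scripts/set_irq_affinity.sh eth2

   # Verify which CPUs are now servicing the NIC's vectors.
   grep eth2 /proc/interrupts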
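
For the MTU point, again assuming the 10G interface is eth2, and that
every switch port on the path is configured for the larger frame size:

   # Anything >= 4096 bytes helps for small block random I/O; 9000 is
   # the common jumbo frame size. Initiator, target, and the switch
   # in between all need to agree on this.
   ip link set dev eth2 mtu 9000

   # Confirm the change took effect.
   ip link show dev eth2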
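
To apply the noop scheduler bit across all iSCSI-attached disks on the
initiator, a loop along these lines works. The sd* glob assumes the
iSCSI LUNs are the only SCSI disks on the host -- narrow it if there
are local SCSI/SATA disks that should keep their scheduler:

   for SCSI_BD in /sys/block/sd*; do
       echo noop > "$SCSI_BD/queue/scheduler"
   done

   # Confirm: the active scheduler is shown in [brackets].
   cat /sys/block/sd*/queue/scheduler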
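
And for the last point, one way to avoid stacking LUNs under a single
struct Scsi_Host on a pre-3.18 initiator is to export each backstore
under its own target endpoint, so the initiator gets a separate
session (and thus a separate Scsi_Host) per LUN. The IQN and backstore
name below are made-up examples, and targetcli is assumed for
configuration:

   # Hypothetical names -- one block backstore for the NVMe device.
   targetcli /backstores/block create name=nvme0 dev=/dev/nvme0n1

   # Export it under its own IQN instead of adding another LUN to an
   # existing endpoint's TPG.
   targetcli /iscsi create iqn.2015-05.com.example:nvme0
   targetcli /iscsi/iqn.2015-05.com.example:nvme0/tpg1/luns create /backstores/block/nvme0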