On Thu, Dec 13, 2012 at 12:40:27PM +0100, Bart Van Assche wrote:
> On 12/11/12 23:46, scameron@xxxxxxxxxxxxxxxxxx wrote:
> >I would be curious to see what kind of results you would get with
> >scsi_debug with fake_rw=1.  I am sort of suspecting that trying to put
> >an "upper limit" on scsi LLD IOPS performance by seeing what scsi_debug
> >will do with fake_rw=1 is not really valid (or, maybe I'm doing it
> >wrong) as I know of one case in which a real HW scsi driver beats
> >scsi_debug fake_rw=1 at IOPS on the very same system, which seems like
> >it shouldn't be possible.  Kind of mysterious.
>
> The test
>
> # disable-frequency-scaling
> # modprobe scsi_debug delay=0 fake_rw=1
> # echo 2 > /sys/block/sdc/queue/rq_affinity
> # echo noop > /sys/block/sdc/queue/scheduler
> # echo 0 > /sys/block/sdc/queue/add_random
>
> results in about 800K IOPS for random reads on the same setup (with a
> request size of 4 KB; CPU: quad core i5-2400).
>
> Repeating the same test with fake_rw=0 results in about 651K IOPS.

What are your system specs?

Here's what I'm seeing.

I have one 6-core processor.

[root@localhost scameron]# grep 'model name' /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

hyperthreading is disabled.

Here is the script I'm running.

[root@localhost scameron]# cat do-dds
#!/bin/sh

do_dd()
{
	device="$1"
	cpu="$2"

	taskset -c "$cpu" dd if="$device" of=/dev/null bs=4k iflag=direct
}

do_six()
{
	for x in `seq 0 5`
	do
		do_dd "$1" $x &
	done
}

do_120()
{
	for z in `seq 1 20`
	do
		do_six "$1"
	done
	wait
}

time do_120 "$1"

I don't have "disable-frequency-scaling" on rhel6, but I think if I send
SIGUSR1 to all the cpuspeed processes, this does the same thing.

ps aux | grep cpuspeed | grep -v grep | awk '{ printf("kill -USR1 %s\n", $2);}' | sh

[root@localhost scameron]# find /sys -name 'scaling_cur_freq' -print | xargs cat
2000000
2000000
2000000
2000000
2000000
2000000
[root@localhost scameron]#

Now, using scsi-debug (300mb size) with delay=0 and fake_rw=1, with
rq_affinity set to 2, and add_random set to 0 and noop i/o scheduler
I get ~216k iops.

With my scsi lld (actually doing the i/o), I now get ~190k iops.
rq_affinity set to 2, add_random 0, noop i/o scheduler, irqs
manually spread across cpus (irqbalance turned off).

With my block lld (actually doing the i/o), I get ~380k iops.
rq_affinity set to 2, add_random 0, i/o scheduler "none"
(there is no i/o scheduler with the make_request interface),
irqs manually spread across cpus (irqbalance turned off).

So the block driver seems to beat the snot out of the scsi lld
by a factor of 2x now, rather than 3x, so I guess that's some
improvement, but still.

-- steve
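
The message does not say how the iops figures were derived from the
"time" output of do-dds. One plausible way, offered only as a sketch:
each dd reads the whole device once in 4k blocks, so total I/Os are
(device bytes / block size) * number of dd processes, divided by the
elapsed wall-clock seconds. The helper name "iops" and the 42.7 second
elapsed time below are hypothetical, and 300mb is assumed to mean
300 MiB.

```shell
#!/bin/sh

# iops DEVICE_BYTES NPROCS BLOCK_SIZE ELAPSED_SECONDS
# Hypothetical helper: assumes every dd process reads the entire
# device exactly once with the given block size.
iops()
{
	awk -v bytes="$1" -v n="$2" -v bs="$3" -v secs="$4" \
		'BEGIN { printf "%d\n", (bytes / bs) * n / secs }'
}

# 300 MiB scsi_debug device, 120 dd processes, bs=4k.
# 42.7 is an illustrative elapsed time, not a measured value.
iops $((300 * 1024 * 1024)) 120 4096 42.7
```

With those assumed inputs the result lands close to the ~216k figure
quoted above for scsi_debug. For a real device the first argument would
have to be the actual device size in bytes.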