Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready

On 02/06/14 19:41, James Bottomley wrote:
> On Thu, 2014-02-06 at 18:10 +0100, Bart Van Assche wrote:
>> On 02/06/14 17:56, James Bottomley wrote:
>>> Could you benchmark this lot and show what the actual improvement is
>>> just for this series, if any?
>>
>> I see a performance improvement of 12% with the SRP protocol for the
>> SCSI core optimizations alone (I am still busy measuring the impact of
>> the blk-mq conversion, but I can already see that it is really
>> significant). Please note that the performance impact depends a lot on
>> the workload (e.g. the number of LUNs per SCSI host), so maybe the
>> workload I chose is not doing justice to Christoph's work. It is also
>> important to mention that with the workload I ran I was saturating the
>> target system CPU (a quad-core Intel i5). In other words, results might
>> be better with a more powerful target system.
> 
> On what?  Just the patches I indicated or the whole series?  My specific
> concern is that swapping a critical section for atomics may not buy us
> anything even on x86 and may slow down non-x86.  That's the bit I'd like
> benchmarks to explore.

The numbers I mentioned in my previous e-mail referred to the "SCSI data
path micro-optimizations" patch series and the "A different approach for
using blk-mq in the SCSI layer" series as a whole. I have run a new test
comparing a kernel with these two patch series applied against a kernel
in which the four patches that convert host_busy, target_busy and
device_busy into atomics have been reverted. For a workload with a
single SCSI host, a single LUN, a block size of 512 bytes, the SRP
protocol and a single CPU thread submitting I/O requests, I see a
performance improvement of 0.5% when using atomics. For a workload with
a single SCSI host, eight LUNs and eight CPU threads submitting I/O, the
improvement is 3.8%. Please note that these measurements were run on a
single-socket system; since cache line misses are more expensive on NUMA
systems, the performance impact of these patches on a NUMA system will
likely be more substantial.

Bart.
