Re: RAID5 XOR speed vs RAID6 Q speed (was Re: AVX RAID5 xor checksumming)

Dan Williams <dan.j.williams@xxxxxxxxx> · Fri, 6 Apr 2012 13:43:11 -0700

[adding Boaz since he also made an attempt at fixing this]

http://marc.info/?l=linux-crypto-vger&m=131829241111450&w=2

...I had meant to follow up on this, but was buried in 'isci' issues.

On Tue, Apr 3, 2012 at 4:56 PM, Jim Kukunas
<james.t.kukunas@xxxxxxxxxxxxxxx> wrote:
> On Tue, Apr 03, 2012 at 11:23:16AM +0100, John Robinson wrote:
>> On 02/04/2012 23:48, Jim Kukunas wrote:
>> > On Sat, Mar 31, 2012 at 12:38:56PM +0100, John Robinson wrote:
>> [...]
>> >> I just noticed in my logs the other day (recent el5 kernel on a Core 2):
>> >>
>> >> raid5: automatically using best checksumming function: generic_sse
>> >>      generic_sse:  7805.000 MB/sec
>> >> raid5: using function: generic_sse (7805.000 MB/sec)
>> [...]
>> >> raid6: using algorithm sse2x4 (8237 MB/s)
>> >>
>> >> I was just wondering how it's possible to do the RAID6 Q calculation
>> >> faster than the RAID5 XOR calculation - or am I reading this log excerpt
>> >> wrongly?
>> >
>> > Out of curiosity, are you running with CONFIG_PREEMPT=y?
>>
>> No. Here's an excerpt from my .config:
>>
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
>> CONFIG_PREEMPT_BKL=y
>> CONFIG_PREEMPT_NOTIFIERS=y
>>
>> But this is a Xen dom0 kernel, 2.6.18-308.1.1.el5.centos.plusxen. Now, a
>> non-Xen kernel (2.6.18-308.1.1.el5) says:
>> raid5: automatically using best checksumming function: generic_sse
>>     generic_sse: 11892.000 MB/sec
>> raid5: using function: generic_sse (11892.000 MB/sec)
>> raid6: int64x1   2644 MB/s
>> raid6: int64x2   3238 MB/s
>> raid6: int64x4   3011 MB/s
>> raid6: int64x8   2503 MB/s
>> raid6: sse2x1    5375 MB/s
>> raid6: sse2x2    5851 MB/s
>> raid6: sse2x4    9136 MB/s
>> raid6: using algorithm sse2x4 (9136 MB/s)
>>
>> Looks like it loses a chunk of performance running as a Xen dom0.
>>
>> Even still, 11892 MB/s for XOR vs 9136 MB/s for XOR+Q - it still seems
>> remarkable that the XOR can't be done several times faster than the Q.
>
> Taking a look at do_xor_speed, I see two issues which might be the cause
> of the disparity you reported.
>
> 0) In the RAID5 xor benchmark, we get the current jiffy, then run do_2() until
> the jiffy increments. This means we could potentially be testing for less
> than a full jiffy. The RAID6 benchmark handles this by obtaining the current
> jiffy, then calling cpu_relax() until the jiffy increments, and then running
> the test. This is addressed by my first patch.
>
> 1) The only way I could reproduce your findings of a higher throughput for
> RAID6 than for RAID5 xor checksumming was with CONFIG_PREEMPT=y. It seems
> that you encountered this while running as XEN dom0. Currently, we disable
> preemption during the RAID6 benchmark, but don't in the RAID5 benchmark.
> This is addressed by my second patch.
>
> I've added linux-crypto to the discussion as both of these patches affect
> code in crypto/
>
> Thanks.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html