Re: [PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6 December 2017 at 12:12, Dave P Martin <Dave.Martin@xxxxxxx> wrote:
> On Wed, Dec 06, 2017 at 11:57:12AM +0000, Ard Biesheuvel wrote:
>> On 6 December 2017 at 11:51, Dave Martin <Dave.Martin@xxxxxxx> wrote:
>> > On Tue, Dec 05, 2017 at 06:04:34PM +0000, Ard Biesheuvel wrote:
>> >> On 5 December 2017 at 12:45, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
>> >> >
>> >> >
>> >> >> On 5 Dec 2017, at 12:28, Dave Martin <Dave.Martin@xxxxxxx> wrote:
>> >> >>
>> >> >>> On Mon, Dec 04, 2017 at 12:26:37PM +0000, Ard Biesheuvel wrote:
>> >> >>> Add a support macro to conditionally yield the NEON (and thus the CPU)
>> >> >>> that may be called from the assembler code. Given that especially the
>> >> >>> instruction based accelerated crypto code may use very tight loops, add
>> >> >>> some parametrization so that the TIF_NEED_RESCHED flag test is only
>> >> >>> executed every so many loop iterations.
>> >> >>>
>> >> >>> In some cases, yielding the NEON involves saving and restoring a non
>> >> >>> trivial amount of context (especially in the CRC folding algorithms),
>> >> >>> and so the macro is split into two, and the code in between is only
>> >> >>> executed when the yield path is taken, allowing the contex to be preserved.
>> >> >>> The second macro takes a label argument that marks the resume-from-yield
>> >> >>> path, which should restore the preserved context again.
>> >> >>>
>> >> >>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>> >> >>> ---
>> >> >>> arch/arm64/include/asm/assembler.h | 50 ++++++++++++++++++++
>> >> >>> 1 file changed, 50 insertions(+)
>> >> >>>
>> >> >>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> >> >>> index aef72d886677..917b026d3e00 100644
>> >> >>> --- a/arch/arm64/include/asm/assembler.h
>> >> >>> +++ b/arch/arm64/include/asm/assembler.h
>> >> >>> @@ -512,4 +512,54 @@ alternative_else_nop_endif
>> >> >>> #endif
>> >> >>>    .endm
>> >> >>>
>> >> >>> +/*
>> >> >>> + * yield_neon - check whether to yield to another runnable task from
>> >> >>> + *        kernel mode NEON code (running with preemption disabled)
>> >> >>> + *
>> >> >>> + * - Check whether the preempt count is exactly 1, in which case disabling
>> >> >>> + *   preemption once will make the task preemptible. If this is not the case,
>> >> >>> + *   yielding is pointless.
>> >> >>> + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable
>> >> >>> + *   kernel mode NEON (which will trigger a reschedule), and branch to the
>> >> >>> + *   yield fixup code at @lbl.
>> >> >>> + */
>> >> >>> +    .macro        yield_neon, lbl:req, ctr, order, stride, loop
>> >> >>> +    yield_neon_pre    \ctr, \order, \stride, \loop
>> >> >>> +    yield_neon_post    \lbl
>> >> >>> +    .endm
>> >> >>> +
>> >> >>> +    .macro        yield_neon_pre, ctr, order=0, stride, loop=4444f
>> >> >>> +#ifdef CONFIG_PREEMPT
>> >> >>> +    /*
>> >> >>> +     * With some algorithms, it makes little sense to poll the
>> >> >>> +     * TIF_NEED_RESCHED flag after every iteration, so only perform
>> >> >>> +     * the check every 2^order strides.
>> >> >>> +     */
>> >> >>> +    .if        \order > 1
>> >> >>> +    .if        (\stride & (\stride - 1)) != 0
>> >> >>> +    .error        "stride should be a power of 2"
>> >> >>> +    .endif
>> >> >>> +    tst        \ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1)
>> >> >>> +    b.ne        \loop
>> >> >>> +    .endif
>> >> >>
>> >> >> I'm not sure what baking in this check gives us, and this seems to
>> >> >> conflate two rather different things: yielding and defining a
>> >> >> "heartbeat" frequency for the calling code.
>> >> >>
>> >> >> Can we separate out the crypto-loop-helper semantics from the yield-
>> >> >> neon stuff?
>> >> >>
>> >> >
>> >> > Fair enough. I incorporated the check here so it disappears from the code entirely when !CONFIG_PREEMPT, because otherwise, you end up with a sequence that is mispredicted every # iterations without any benefit.
>> >> > I guess i could macroise that separately though.
>> >> >
>> >> >> If we had
>> >> >>
>> >> >>    if_cond_yield_neon
>> >> >>    // patchup code
>> >> >>    endif_yield_neon
>> >> >>
>> >>
>> >> I like this, but ...
>> >>
>> >> >> then the caller is free to conditionally branch over that as appropriate
>> >> >> like
>> >> >>
>> >> >> loop:
>> >> >>    // crypto stuff
>> >> >>    tst x0, #0xf
>> >> >>    b.ne    loop
>> >> >>
>> >> >>    if_cond_yield_neon
>> >> >>    // patchup code
>> >> >>    endif_cond_yield_neon
>> >> >>
>> >>
>> >> I need to put the post patchup code somewhere too. Please refer to the
>> >> CRC patches for the best examples of this.
>> >
>> > I'm not sure I follow.  If you need to do something different after
>> > patchup, can't you either pull that code into the if...endif too, or
>> > otherwise put a branch before the endif?
>> >
>>
>> No, not really.
>>
>> What I currently have is
>>
>> >    if_cond_yield_neon
>> >    // patchup code
>> >    endif_cond_yield_neon
>>
>> which is being turned into
>>
>>     get_thread_info x0
>>     ldr w1, [x0, #TSK_TI_PREEMPT]
>>     ldr x0, [x0, #TSK_TI_FLAGS]
>>     cmp w1, #1 // == PREEMPT_OFFSET
>>     csel x0, x0, xzr, eq
>>     tbnz x0, #TIF_NEED_RESCHED, 5555f // needs rescheduling?
>> 4444:
>>
>>     .subsection 1
>> 5555:
>>     // patchup code
>>     bl kernel_neon_end
>>     bl kernel_neon_begin
>>     // post patchup code goes here
>>     b 4444b
>>     .subsection
>>
>> so what I basically need is a place to put the post patchup code,
>> unless I open code it and branch to a different label right after
>> kernel_neon_begin
>
> Ah, I get you.  I had convinced myself that there was only post-
> patchup code.
>
> So you conceptually want something like
>
> if_will_cond_yield_neon
>         // pre-yield patchup code
> do_cond_yield_neon
>         // post-yield patchup code
> endif_yield_neon
>
> .macro if_will_cond_yield_neon
>         // do preempt_count == 1 && TIF_NEED_RESCHED check
>         b.wont_yield 3432f
> .endm
>
> .macro do_cond_yield_neon
>         bl      kernel_neon_end
>         bl      kernel_neon_begin
> .endm
>
> .macro endif_yield_neon
> 3432:
> .endm
>
> The main difference I see is that this doesn't get the slow-case
> code out of line.
>
> Or have I still missed something?
>

No, that seems accurate. I think the idiom looks much better, but I'd
still like to get the code out of line, so that everything disappears
completely for !CONFIG_PREEMPT.

I'll try to rework my stuff using this as a starting point.



[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]

  Powered by Linux