On 6 December 2017 at 12:12, Dave P Martin <Dave.Martin@xxxxxxx> wrote: > On Wed, Dec 06, 2017 at 11:57:12AM +0000, Ard Biesheuvel wrote: >> On 6 December 2017 at 11:51, Dave Martin <Dave.Martin@xxxxxxx> wrote: >> > On Tue, Dec 05, 2017 at 06:04:34PM +0000, Ard Biesheuvel wrote: >> >> On 5 December 2017 at 12:45, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: >> >> > >> >> > >> >> >> On 5 Dec 2017, at 12:28, Dave Martin <Dave.Martin@xxxxxxx> wrote: >> >> >> >> >> >>> On Mon, Dec 04, 2017 at 12:26:37PM +0000, Ard Biesheuvel wrote: >> >> >>> Add a support macro to conditionally yield the NEON (and thus the CPU) >> >> >>> that may be called from the assembler code. Given that especially the >> >> >>> instruction based accelerated crypto code may use very tight loops, add >> >> >>> some parametrization so that the TIF_NEED_RESCHED flag test is only >> >> >>> executed every so many loop iterations. >> >> >>> >> >> >>> In some cases, yielding the NEON involves saving and restoring a non >> >> >>> trivial amount of context (especially in the CRC folding algorithms), >> >> >>> and so the macro is split into two, and the code in between is only >> >> >>> executed when the yield path is taken, allowing the contex to be preserved. >> >> >>> The second macro takes a label argument that marks the resume-from-yield >> >> >>> path, which should restore the preserved context again. >> >> >>> >> >> >>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> >> >> >>> --- >> >> >>> arch/arm64/include/asm/assembler.h | 50 ++++++++++++++++++++ >> >> >>> 1 file changed, 50 insertions(+) >> >> >>> >> >> >>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h >> >> >>> index aef72d886677..917b026d3e00 100644 >> >> >>> --- a/arch/arm64/include/asm/assembler.h >> >> >>> +++ b/arch/arm64/include/asm/assembler.h >> >> >>> @@ -512,4 +512,54 @@ alternative_else_nop_endif >> >> >>> #endif >> >> >>> .endm >> >> >>> >> >> >>> +/* >> >> >>> + * yield_neon - check whether to yield to another runnable task from >> >> >>> + * kernel mode NEON code (running with preemption disabled) >> >> >>> + * >> >> >>> + * - Check whether the preempt count is exactly 1, in which case disabling >> >> >>> + * preemption once will make the task preemptible. If this is not the case, >> >> >>> + * yielding is pointless. >> >> >>> + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable >> >> >>> + * kernel mode NEON (which will trigger a reschedule), and branch to the >> >> >>> + * yield fixup code at @lbl. >> >> >>> + */ >> >> >>> + .macro yield_neon, lbl:req, ctr, order, stride, loop >> >> >>> + yield_neon_pre \ctr, \order, \stride, \loop >> >> >>> + yield_neon_post \lbl >> >> >>> + .endm >> >> >>> + >> >> >>> + .macro yield_neon_pre, ctr, order=0, stride, loop=4444f >> >> >>> +#ifdef CONFIG_PREEMPT >> >> >>> + /* >> >> >>> + * With some algorithms, it makes little sense to poll the >> >> >>> + * TIF_NEED_RESCHED flag after every iteration, so only perform >> >> >>> + * the check every 2^order strides. >> >> >>> + */ >> >> >>> + .if \order > 1 >> >> >>> + .if (\stride & (\stride - 1)) != 0 >> >> >>> + .error "stride should be a power of 2" >> >> >>> + .endif >> >> >>> + tst \ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1) >> >> >>> + b.ne \loop >> >> >>> + .endif >> >> >> >> >> >> I'm not sure what baking in this check gives us, and this seems to >> >> >> conflate two rather different things: yielding and defining a >> >> >> "heartbeat" frequency for the calling code. >> >> >> >> >> >> Can we separate out the crypto-loop-helper semantics from the yield- >> >> >> neon stuff? >> >> >> >> >> > >> >> > Fair enough. I incorporated the check here so it disappears from the code entirely when !CONFIG_PREEMPT, because otherwise, you end up with a sequence that is mispredicted every # iterations without any benefit. >> >> > I guess i could macroise that separately though. >> >> > >> >> >> If we had >> >> >> >> >> >> if_cond_yield_neon >> >> >> // patchup code >> >> >> endif_yield_neon >> >> >> >> >> >> >> I like this, but ... >> >> >> >> >> then the caller is free to conditionally branch over that as appropriate >> >> >> like >> >> >> >> >> >> loop: >> >> >> // crypto stuff >> >> >> tst x0, #0xf >> >> >> b.ne loop >> >> >> >> >> >> if_cond_yield_neon >> >> >> // patchup code >> >> >> endif_cond_yield_neon >> >> >> >> >> >> >> I need to put the post patchup code somewhere too. Please refer to the >> >> CRC patches for the best examples of this. >> > >> > I'm not sure I follow. If you need to do something different after >> > patchup, can't you either pull that code into the if...endif too, or >> > otherwise put a branch before the endif? >> > >> >> No, not really. >> >> What I currently have is >> >> > if_cond_yield_neon >> > // patchup code >> > endif_cond_yield_neon >> >> which is being turned into >> >> get_thread_info x0 >> ldr w1, [x0, #TSK_TI_PREEMPT] >> ldr x0, [x0, #TSK_TI_FLAGS] >> cmp w1, #1 // == PREEMPT_OFFSET >> csel x0, x0, xzr, eq >> tbnz x0, #TIF_NEED_RESCHED, 5555f // needs rescheduling? >> 4444: >> >> .subsection 1 >> 5555: >> // patchup code >> bl kernel_neon_end >> bl kernel_neon_begin >> // post patchup code goes here >> b 4444b >> .subsection >> >> so what I basically need is a place to put the post patchup code, >> unless I open code it and branch to a different label right after >> kernel_neon_begin > > Ah, I get you. I had convinced myself that there was only post- > patchup code. > > So you conceptually want something like > > if_will_cond_yield_neon > // pre-yield patchup code > do_cond_yield_neon > // post-yield patchup code > endif_yield_neon > > .macro if_will_cond_yield_neon > // do preempt_count == 1 && TIF_NEED_RESCHED check > b.wont_yield 3432f > .endm > > .macro do_cond_yield_neon > bl kernel_neon_end > bl kernel_neon_begin > .endm > > .macro endif_yield_neon > 3432: > .endm > > The main difference I see is that this doesn't get the slow-case > code out of line. > > Or have I still missed something? > No, that seems accurate. I think the idiom looks much better, but I'd still like to get the code out of line, so that everything disappears completely for !CONFIG_PREEMPT. I'll try to rework my stuff using this as a starting point.