Re: [PATCH v4 4/5] powerpc/vdso: Wire up getrandom() vDSO implementation on PPC32

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 02, 2024 at 02:04:41PM +0200, Christophe Leroy wrote:
> This first patch adds support for PPC32. As selftests cannot easily
> be generated only for PPC32, and because the following patch brings
> support for PPC64 anyway, this patch opts out all code in
> __arch_chacha20_blocks_nostack() so that vdso_test_chacha will not
> fail to compile and will not crash on PPC64/PPC64LE, allthough the
> selftest itself will fail. This patch also adds a dummy
> __kernel_getrandom() function that returns ENOSYS on PPC64 so that
> vdso_test_getrandom returns KSFT_SKIP instead of KSFT_FAIL.

Why not just wire up the selftests in the next patch like you did for
v3? This seems like extra stuff for no huge reason?

>  arch/powerpc/Kconfig                         |   1 +
>  arch/powerpc/include/asm/vdso/getrandom.h    |  54 +++++
>  arch/powerpc/include/asm/vdso/vsyscall.h     |   6 +
>  arch/powerpc/include/asm/vdso_datapage.h     |   2 +
>  arch/powerpc/kernel/asm-offsets.c            |   1 +
>  arch/powerpc/kernel/vdso/Makefile            |  13 +-
>  arch/powerpc/kernel/vdso/getrandom.S         |  58 ++++++
>  arch/powerpc/kernel/vdso/vdso32.lds.S        |   1 +
>  arch/powerpc/kernel/vdso/vdso64.lds.S        |   1 +
>  arch/powerpc/kernel/vdso/vgetrandom-chacha.S | 207 +++++++++++++++++++
>  arch/powerpc/kernel/vdso/vgetrandom.c        |  16 ++
>  tools/testing/selftests/vDSO/Makefile        |   2 +-
>  12 files changed, 359 insertions(+), 3 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/vdso/getrandom.h
>  create mode 100644 arch/powerpc/kernel/vdso/getrandom.S
>  create mode 100644 arch/powerpc/kernel/vdso/vgetrandom-chacha.S
>  create mode 100644 arch/powerpc/kernel/vdso/vgetrandom.c

I think you might have forgotten to add the symlink in this commit (or
the next one, per my comment above, if you agree with it).

> +/*
> + * Very basic 32 bits implementation of ChaCha20. Produces a given positive number
> + * of blocks of output with a nonce of 0, taking an input key and 8-byte
> + * counter. Importantly does not spill to the stack. Its arguments are:
> + *
> + *	r3: output bytes
> + *	r4: 32-byte key input
> + *	r5: 8-byte counter input/output (saved on stack)
> + *	r6: number of 64-byte blocks to write to output
> + *
> + *	r0: counter of blocks (initialised with r6)
> + *	r4: Value '4' after key has been read.
> + *	r5-r12: key
> + *	r14-r15: counter
> + *	r16-r31: state
> + */
> +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
> +#ifdef __powerpc64__
> +	blr
> +#else
> +	stwu	r1, -96(r1)
> +	stw	r5, 20(r1)
> +	stmw	r14, 24(r1)
> +
> +	lwz	r14, 0(r5)
> +	lwz	r15, 4(r5)
> +	mr	r0, r6
> +	subi	r3, r3, 4
> +
> +	lwz	r5, 0(r4)
> +	lwz	r6, 4(r4)
> +	lwz	r7, 8(r4)
> +	lwz	r8, 12(r4)
> +	lwz	r9, 16(r4)
> +	lwz	r10, 20(r4)
> +	lwz	r11, 24(r4)
> +	lwz	r12, 28(r4)

If you don't want to do this, don't worry about it, but while I'm
commenting on things, I think it's worth noting that x86, loongarch, and
arm64 implementations all use the preprocessor or macros to give names
to these registers -- state1,2,3,...copy1,2,3 and so forth. Might be
worth doing the same if you think there's an easy and obvious way of
doing it. If not -- or if that kind of work abhors you -- don't worry
about it, as I'm confident enough that this code works fine. But it
might be "nice to have". Up to you.

> +
> +	li	r4, 4
> +.Lblock:
> +	li	r31, 10
> +

Maybe a comment here, "expand 32-byte k" or similar.

> +	lis	r16, 0x6170
> +	lis	r17, 0x3320
> +	lis	r18, 0x7962
> +	lis	r19, 0x6b20
> +	addi	r16, r16, 0x7865
> +	addi	r17, r17, 0x646e
> +	addi	r18, r18, 0x2d32
> +	addi	r19, r19, 0x6574
> +
> +	mtctr	r31
> +
> +	mr	r20, r5
> +	mr	r21, r6
> +	mr	r22, r7
> +	mr	r23, r8
> +	mr	r24, r9
> +	mr	r25, r10
> +	mr	r26, r11
> +	mr	r27, r12
> +
> +	mr	r28, r14
> +	mr	r29, r15
> +	li	r30, 0
> +	li	r31, 0
> +
> +.Lpermute:
> +	QUARTERROUND4( 0, 4, 8,12, 1, 5, 9,13, 2, 6,10,14, 3, 7,11,15)
> +	QUARTERROUND4( 0, 5,10,15, 1, 6,11,12, 2, 7, 8,13, 3, 4, 9,14)
> +
> +	bdnz	.Lpermute
> +
> +	addis	r16, r16, 0x6170
> +	addis	r17, r17, 0x3320
> +	addis	r18, r18, 0x7962
> +	addis	r19, r19, 0x6b20
> +	addi	r16, r16, 0x7865
> +	addi	r17, r17, 0x646e
> +	addi	r18, r18, 0x2d32
> +	addi	r19, r19, 0x6574
> +
> +	add	r20, r20, r5
> +	add	r21, r21, r6
> +	add	r22, r22, r7
> +	add	r23, r23, r8
> +	add	r24, r24, r9
> +	add	r25, r25, r10
> +	add	r26, r26, r11
> +	add	r27, r27, r12
> +
> +	add	r28, r28, r14
> +	add	r29, r29, r15
> +
> +	stwbrx	r16, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r17, 0, r3
> +	stwbrx	r18, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r19, 0, r3
> +	stwbrx	r20, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r21, 0, r3
> +	stwbrx	r22, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r23, 0, r3
> +	stwbrx	r24, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r25, 0, r3
> +	stwbrx	r26, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r27, 0, r3
> +	stwbrx	r28, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r29, 0, r3
> +	stwbrx	r30, r4, r3
> +	addi	r3, r3, 8
> +	stwbrx	r31, 0, r3
> +
> +	subic.	r0, r0, 1	/* subi. can't use r0 as source */

Never seen the period suffix. Just looked this up. Neat.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux