Re: [PATCH v16 5/5] x86: vdso: Wire up getrandom() vDSO implementation

On Thu, May 30, 2024 at 08:38:16PM -0700, Eric Biggers wrote:
> On Tue, May 28, 2024 at 02:19:54PM +0200, Jason A. Donenfeld wrote:
> > diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S
> > new file mode 100644
> > index 000000000000..d79e2bd97598
> > --- /dev/null
> > +++ b/arch/x86/entry/vdso/vgetrandom-chacha.S
> > @@ -0,0 +1,178 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2022 Jason A. Donenfeld <Jason@xxxxxxxxx>. All Rights Reserved.
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/frame.h>
> > +
> > +.section	.rodata, "a"
> > +.align 16
> > +CONSTANTS:	.octa 0x6b20657479622d323320646e61707865
> > +.text
> > +
> > +/*
> > + * Very basic SSE2 implementation of ChaCha20. Produces a given positive number
> > + * of blocks of output with a nonce of 0, taking an input key and 8-byte
> > + * counter. Importantly does not spill to the stack. Its arguments are:
> > + *
> > + *	rdi: output bytes
> > + *	rsi: 32-byte key input
> > + *	rdx: 8-byte counter input/output
> > + *	rcx: number of 64-byte blocks to write to output
> > + */
> > +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
> > +
> > +.set	output,		%rdi
> > +.set	key,		%rsi
> > +.set	counter,	%rdx
> > +.set	nblocks,	%rcx
> > +.set	i,		%al
> > +/* xmm registers are *not* callee-save. */
> > +.set	state0,		%xmm0
> > +.set	state1,		%xmm1
> > +.set	state2,		%xmm2
> > +.set	state3,		%xmm3
> > +.set	copy0,		%xmm4
> > +.set	copy1,		%xmm5
> > +.set	copy2,		%xmm6
> > +.set	copy3,		%xmm7
> > +.set	temp,		%xmm8
> > +.set	one,		%xmm9
> 
> An "interesting" x86_64 quirk: in SSE instructions, registers xmm0-xmm7 take
> fewer bytes to encode than xmm8-xmm15.
> 
> Since 'temp' is used frequently, moving it into the lower range (and moving one
> of the 'copy' registers, which isn't used as frequently, into the higher range)
> decreases the code size of __arch_chacha20_blocks_nostack() by 5%.

That's a nice trick. Thank you very much for it.

Jason
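
For readers following along, the encoding quirk Eric mentions can be sketched
in a few lines of Python. This is only an illustration, not part of the patch:
the helper name encode_paddd is hypothetical, and the sketch covers just the
register-to-register form of one SSE2 instruction (paddd, 66 0F FE /r). The
point it shows is that any operand in xmm8-xmm15 forces a one-byte REX prefix,
so the same instruction grows from 4 to 5 bytes.

```python
# Hypothetical illustration, not kernel code: encode the register-to-register
# form of SSE2 "paddd dst, src" (Intel operand order), opcode 66 0F FE /r.
def encode_paddd(dst: int, src: int) -> bytes:
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("xmm register index must be in 0..15")

    # ModRM only has 3 bits per register field; the 4th bit lives in REX.
    rex_r = dst >> 3  # extends ModRM.reg (destination)
    rex_b = src >> 3  # extends ModRM.rm (source)

    out = bytearray([0x66])  # mandatory operand-size prefix for SSE2 integer ops
    if rex_r or rex_b:
        # REX is 0100WRXB; it is only emitted when a high register is used.
        out.append(0x40 | (rex_r << 2) | rex_b)
    out += bytes([0x0F, 0xFE])  # paddd opcode
    out.append(0xC0 | ((dst & 7) << 3) | (src & 7))  # ModRM, mod=11 (reg-reg)
    return bytes(out)


if __name__ == "__main__":
    for dst, src in [(0, 1), (8, 1), (0, 8)]:
        code = encode_paddd(dst, src)
        print(f"paddd xmm{dst}, xmm{src}: {code.hex(' ')} ({len(code)} bytes)")
```

Under this model, a frequently used register such as 'temp' costs one extra
byte per instruction whenever it sits in xmm8-xmm15, which is consistent with
the size reduction Eric reports after swapping it with a less frequently used
'copy' register.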



