On Fri, 4 Oct 2019 at 16:53, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: > > > > > On Oct 4, 2019, at 6:52 AM, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: > > > > On Fri, 4 Oct 2019 at 15:42, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote: > >> > >>> On Thu, Oct 03, 2019 at 10:43:29AM +0200, Ard Biesheuvel wrote: > >>> On Wed, 2 Oct 2019 at 16:17, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: > >>>> > >>> ... > >>>> > >>>> In the future, I would like to extend these interfaces to use static calls, > >>>> so that the accelerated implementations can be [un]plugged at runtime. For > >>>> the time being, we rely on weak aliases and conditional exports so that the > >>>> users of the library interfaces link directly to the accelerated versions, > >>>> but without the ability to unplug them. > >>>> > >>> > >>> As it turns out, we don't actually need static calls for this. > >>> Instead, we can simply permit weak symbol references to go unresolved > >>> between modules (as we already do in the kernel itself, due to the > >>> fact that ELF permits it), and have the accelerated code live in > >>> separate modules that may not be loadable on certain systems, or be > >>> blacklisted by the user. > >> > >> You're saying that at module insertion time, the kernel will override > >> weak symbols with those provided by the module itself? At runtime? > >> > > > > Yes. > > > >> Do you know offhand how this patching works? Is there a PLT that gets > >> patched, and so the calls all go through a layer of function pointer > >> indirection? Or are all call sites fixed up at insertion time and the > >> call instructions rewritten with some runtime patching magic? > >> > > > > No magic. Take curve25519 for example, when built for ARM: > > > > 00000000 <curve25519>: > > 0: f240 0300 movw r3, #0 > > 0: R_ARM_THM_MOVW_ABS_NC curve25519_arch > > 4: f2c0 0300 movt r3, #0 > > 4: R_ARM_THM_MOVT_ABS curve25519_arch > > 8: b570 push {r4, r5, r6, lr} > > a: 4604 mov r4, r0 > > c: 460d mov r5, r1 > > e: 4616 mov r6, r2 > > 10: b173 cbz r3, 30 <curve25519+0x30> > > 12: f7ff fffe bl 0 <curve25519_arch> > > 12: R_ARM_THM_CALL curve25519_arch > > 16: b158 cbz r0, 30 <curve25519+0x30> > > 18: 4620 mov r0, r4 > > 1a: 2220 movs r2, #32 > > 1c: f240 0100 movw r1, #0 > > 1c: R_ARM_THM_MOVW_ABS_NC .LANCHOR0 > > 20: f2c0 0100 movt r1, #0 > > 20: R_ARM_THM_MOVT_ABS .LANCHOR0 > > 24: f7ff fffe bl 0 <__crypto_memneq> > > 24: R_ARM_THM_CALL __crypto_memneq > > 28: 3000 adds r0, #0 > > 2a: bf18 it ne > > 2c: 2001 movne r0, #1 > > 2e: bd70 pop {r4, r5, r6, pc} > > 30: 4632 mov r2, r6 > > 32: 4629 mov r1, r5 > > 34: 4620 mov r0, r4 > > 36: f7ff fffe bl 0 <curve25519_generic> > > 36: R_ARM_THM_CALL curve25519_generic > > 3a: e7ed b.n 18 <curve25519+0x18> > > > > curve25519_arch is a weak reference. It either gets satisfied at > > module load time, or it doesn't. > > > > If it does get satisfied, the relocations covering the movw/movt pair > > and the one covering the bl instruction get updated so that they point > > to the arch routine. > > > > If it does not get satisfied, the relocations are disregarded, in > > which case the cbz instruction at offset 0x10 jumps over the bl call. > > > > Note that this does not involve any memory accesses. It does involve > > some code patching, but only of the kind the module loader already > > does. > > Won’t this have the counterintuitive property that, if you load the modules in the opposite order, the reference won’t be re-resolved and performance will silently regress? > Indeed, the arch module needs to be loaded first > I think it might be better to allow two different modules to export the same symbol but only allow one of them to be loaded. That is what I am doing for chacha and poly > Or use static calls.