> On Oct 4, 2019, at 6:52 AM, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: > > On Fri, 4 Oct 2019 at 15:42, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote: >> >>> On Thu, Oct 03, 2019 at 10:43:29AM +0200, Ard Biesheuvel wrote: >>> On Wed, 2 Oct 2019 at 16:17, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: >>>> >>> ... >>>> >>>> In the future, I would like to extend these interfaces to use static calls, >>>> so that the accelerated implementations can be [un]plugged at runtime. For >>>> the time being, we rely on weak aliases and conditional exports so that the >>>> users of the library interfaces link directly to the accelerated versions, >>>> but without the ability to unplug them. >>>> >>> >>> As it turns out, we don't actually need static calls for this. >>> Instead, we can simply permit weak symbol references to go unresolved >>> between modules (as we already do in the kernel itself, due to the >>> fact that ELF permits it), and have the accelerated code live in >>> separate modules that may not be loadable on certain systems, or be >>> blacklisted by the user. >> >> You're saying that at module insertion time, the kernel will override >> weak symbols with those provided by the module itself? At runtime? >> > > Yes. > >> Do you know offhand how this patching works? Is there a PLT that gets >> patched, and so the calls all go through a layer of function pointer >> indirection? Or are all call sites fixed up at insertion time and the >> call instructions rewritten with some runtime patching magic? >> > > No magic. Take curve25519 for example, when built for ARM: > > 00000000 <curve25519>: > 0: f240 0300 movw r3, #0 > 0: R_ARM_THM_MOVW_ABS_NC curve25519_arch > 4: f2c0 0300 movt r3, #0 > 4: R_ARM_THM_MOVT_ABS curve25519_arch > 8: b570 push {r4, r5, r6, lr} > a: 4604 mov r4, r0 > c: 460d mov r5, r1 > e: 4616 mov r6, r2 > 10: b173 cbz r3, 30 <curve25519+0x30> > 12: f7ff fffe bl 0 <curve25519_arch> > 12: R_ARM_THM_CALL curve25519_arch > 16: b158 cbz r0, 30 <curve25519+0x30> > 18: 4620 mov r0, r4 > 1a: 2220 movs r2, #32 > 1c: f240 0100 movw r1, #0 > 1c: R_ARM_THM_MOVW_ABS_NC .LANCHOR0 > 20: f2c0 0100 movt r1, #0 > 20: R_ARM_THM_MOVT_ABS .LANCHOR0 > 24: f7ff fffe bl 0 <__crypto_memneq> > 24: R_ARM_THM_CALL __crypto_memneq > 28: 3000 adds r0, #0 > 2a: bf18 it ne > 2c: 2001 movne r0, #1 > 2e: bd70 pop {r4, r5, r6, pc} > 30: 4632 mov r2, r6 > 32: 4629 mov r1, r5 > 34: 4620 mov r0, r4 > 36: f7ff fffe bl 0 <curve25519_generic> > 36: R_ARM_THM_CALL curve25519_generic > 3a: e7ed b.n 18 <curve25519+0x18> > > curve25519_arch is a weak reference. It either gets satisfied at > module load time, or it doesn't. > > If it does get satisfied, the relocations covering the movw/movt pair > and the one covering the bl instruction get updated so that they point > to the arch routine. > > If it does not get satisfied, the relocations are disregarded, in > which case the cbz instruction at offset 0x10 jumps over the bl call. > > Note that this does not involve any memory accesses. It does involve > some code patching, but only of the kind the module loader already > does. Won’t this have the counterintuitive property that, if you load the modules in the opposite order, the reference won’t be re-resolved and performance will silently regress? I think it might be better to allow two different modules to export the same symbol but only allow one of them to be loaded. Or use static calls.