Re: [PATCH v2 00/20] crypto: crypto API library interfaces for WireGuard

Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> · Fri, 4 Oct 2019 16:56:10 +0200



On Fri, 4 Oct 2019 at 16:53, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
>
>
> > On Oct 4, 2019, at 6:52 AM, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> >
> > On Fri, 4 Oct 2019 at 15:42, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> >>
> >>> On Thu, Oct 03, 2019 at 10:43:29AM +0200, Ard Biesheuvel wrote:
> >>> On Wed, 2 Oct 2019 at 16:17, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> >>>>
> >>> ...
> >>>>
> >>>> In the future, I would like to extend these interfaces to use static calls,
> >>>> so that the accelerated implementations can be [un]plugged at runtime. For
> >>>> the time being, we rely on weak aliases and conditional exports so that the
> >>>> users of the library interfaces link directly to the accelerated versions,
> >>>> but without the ability to unplug them.
> >>>>
> >>>
> >>> As it turns out, we don't actually need static calls for this.
> >>> Instead, we can simply permit weak symbol references to go unresolved
> >>> between modules (as we already do in the kernel itself, due to the
> >>> fact that ELF permits it), and have the accelerated code live in
> >>> separate modules that may not be loadable on certain systems, or be
> >>> blacklisted by the user.
> >>
> >> You're saying that at module insertion time, the kernel will override
> >> weak symbols with those provided by the module itself? At runtime?
> >>
> >
> > Yes.
> >
> >> Do you know offhand how this patching works? Is there a PLT that gets
> >> patched, and so the calls all go through a layer of function pointer
> >> indirection? Or are all call sites fixed up at insertion time and the
> >> call instructions rewritten with some runtime patching magic?
> >>
> >
> > No magic. Take curve25519 for example, when built for ARM:
> >
> > 00000000 <curve25519>:
> >   0:   f240 0300       movw    r3, #0
> >                        0: R_ARM_THM_MOVW_ABS_NC        curve25519_arch
> >   4:   f2c0 0300       movt    r3, #0
> >                        4: R_ARM_THM_MOVT_ABS   curve25519_arch
> >   8:   b570            push    {r4, r5, r6, lr}
> >   a:   4604            mov     r4, r0
> >   c:   460d            mov     r5, r1
> >   e:   4616            mov     r6, r2
> >  10:   b173            cbz     r3, 30 <curve25519+0x30>
> >  12:   f7ff fffe       bl      0 <curve25519_arch>
> >                        12: R_ARM_THM_CALL      curve25519_arch
> >  16:   b158            cbz     r0, 30 <curve25519+0x30>
> >  18:   4620            mov     r0, r4
> >  1a:   2220            movs    r2, #32
> >  1c:   f240 0100       movw    r1, #0
> >                        1c: R_ARM_THM_MOVW_ABS_NC       .LANCHOR0
> >  20:   f2c0 0100       movt    r1, #0
> >                        20: R_ARM_THM_MOVT_ABS  .LANCHOR0
> >  24:   f7ff fffe       bl      0 <__crypto_memneq>
> >                        24: R_ARM_THM_CALL      __crypto_memneq
> >  28:   3000            adds    r0, #0
> >  2a:   bf18            it      ne
> >  2c:   2001            movne   r0, #1
> >  2e:   bd70            pop     {r4, r5, r6, pc}
> >  30:   4632            mov     r2, r6
> >  32:   4629            mov     r1, r5
> >  34:   4620            mov     r0, r4
> >  36:   f7ff fffe       bl      0 <curve25519_generic>
> >                        36: R_ARM_THM_CALL      curve25519_generic
> >  3a:   e7ed            b.n     18 <curve25519+0x18>
> >
> > curve25519_arch is a weak reference. It either gets satisfied at
> > module load time, or it doesn't.
> >
> > If it does get satisfied, the relocations covering the movw/movt pair
> > and the one covering the bl instruction get updated so that they point
> > to the arch routine.
> >
> > If it does not get satisfied, the relocations are disregarded, in
> > which case the cbz instruction at offset 0x10 jumps over the bl call.
> >
> > Note that this does not involve any memory accesses. It does involve
> > some code patching, but only of the kind the module loader already
> > does.
>
> Won’t this have the counterintuitive property that, if you load the modules in the opposite order, the reference won’t be re-resolved and performance will silently regress?
>

Indeed, the arch module needs to be loaded first

> I think it might be better to allow two different modules to export the same symbol but only allow one of them to be loaded.

That is what I am doing for chacha and poly

> Or use static calls.