Re: [PATCH 07/32] mm: Bring back vmalloc_exec

"Andy Lutomirski" <luto@xxxxxxxxxx> · Tue, 20 Jun 2023 13:42:44 -0700

Hi all-

On Tue, Jun 20, 2023, at 11:48 AM, Dave Hansen wrote:
>>> No, I'm saying your concerns are baseless and too vague to
>>> address.
>> If you don't address them, the NAK will stand forever, or at least
>> until a different group of people take over x86 maintainership.
>> That's fine with me.
>
> I've got a specific concern: I don't see vmalloc_exec() used in this
> series anywhere.  I also don't see any of the actual assembly that's
> being generated, or the glue code that's calling into the generated
> assembly.
>
> I grepped around a bit in your git trees, but I also couldn't find it in
> there.  Any chance you could help a guy out and point us to some of the
> specifics of this new, tiny JIT?
>

So I had a nice discussion with Kent on IRC, and, for the benefit of everyone else reading along, I *think* the JITted code can be replaced by a table-driven approach like this:

#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct uncompressed
{
    u32 a;
    u32 b;
    u64 c;
    u64 d;
    u64 e;
    u64 f;
};

struct bitblock
{
    u64 source;
    u64 target;
    u64 mask;
    int shift;
};

// out needs to be zeroed first
void unpack(struct uncompressed *out, const u64 *in, const struct bitblock *blocks, int nblocks)
{
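    // view the struct as five u64 words (a and b share word 0)
    u64 *out_as_words = (u64*)out;
    for (int i = 0; i < nblocks; i++) {
        const struct bitblock *b = &blocks[i];
        // OR the selected bits of input word b->source into output word b->target
        out_as_words[b->target] |= (in[b->source] & b->mask) << b->shift;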
    }
}

void apply_offsets(struct uncompressed *out, const struct uncompressed *offsets)
{
    out->a += offsets->a;
    out->b += offsets->b;
    out->c += offsets->c;
    out->d += offsets->d;
    out->e += offsets->e;
    out->f += offsets->f;
}

This compiles to nice code: https://godbolt.org/z/3fEq37hf5
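To make the table format concrete, here is a small self-contained sketch that drives the unpacker with a made-up packed layout. The layout, `union unpacked`, `example_blocks` and `unpack_example()` are hypothetical illustrations, not anything from Kent's series. It assumes a little-endian machine (as on x86), so `a` is the low half of output word 0 and `b` the high half. It uses a union instead of the `(u64*)` cast to stay within C's aliasing rules; the kernel builds with -fno-strict-aliasing, so the cast is fine there. Because `shift` only moves bits left, each field has to sit at or below its final bit position in the packed words. A field split across two input words (`c` here) still works, but a real format would presumably also need right shifts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct uncompressed {
    u32 a;
    u32 b;
    u64 c;
    u64 d;
    u64 e;
    u64 f;
};

/* Hypothetical: a union lets us view the struct as five u64 words
 * without the strict-aliasing problem of casting the pointer. */
union unpacked {
    struct uncompressed fields;
    u64 words[5];
};

static_assert(sizeof(struct uncompressed) == 5 * sizeof(u64),
              "struct uncompressed must be exactly five u64 words");

struct bitblock {
    u64 source;
    u64 target;
    u64 mask;
    int shift;
};

/* Same loop as above; out must be zeroed first because we only OR into it. */
static void unpack(union unpacked *out, const u64 *in,
                   const struct bitblock *blocks, int nblocks)
{
    for (int i = 0; i < nblocks; i++) {
        const struct bitblock *b = &blocks[i];
        out->words[b->target] |= (in[b->source] & b->mask) << b->shift;
    }
}

static void apply_offsets(struct uncompressed *out, const struct uncompressed *offsets)
{
    out->a += offsets->a;
    out->b += offsets->b;
    out->c += offsets->c;
    out->d += offsets->d;
    out->e += offsets->e;
    out->f += offsets->f;
}

/* Hypothetical packed layout (little-endian):
 *   in[0] bits 0..15 -> a, bits 16..31 -> b
 *   in[1] bits 0..31 -> low half of c
 *   in[2] bits 0..31 -> high half of c
 *   in[3]            -> d
 */
static const struct bitblock example_blocks[] = {
    { .source = 0, .target = 0, .mask = 0xffff,     .shift = 0 },  /* a */
    { .source = 0, .target = 0, .mask = 0xffff0000, .shift = 16 }, /* b: lands in bits 32..47 */
    { .source = 1, .target = 1, .mask = 0xffffffff, .shift = 0 },  /* c, low half */
    { .source = 2, .target = 1, .mask = 0xffffffff, .shift = 32 }, /* c, high half */
    { .source = 3, .target = 2, .mask = ~(u64)0,    .shift = 0 },  /* d */
};

static struct uncompressed unpack_example(void)
{
    const u64 in[4] = { 0x1234abcd, 0x89abcdef, 0x01234567, 42 };
    const struct uncompressed offsets = { .a = 1, .d = 100 };
    union unpacked u;

    memset(&u, 0, sizeof(u)); /* unpack() ORs into the output */
    unpack(&u, in, example_blocks,
           (int)(sizeof(example_blocks) / sizeof(example_blocks[0])));
    apply_offsets(&u.fields, &offsets);
    return u.fields;
}
```

With these inputs the result is a = 0xabce, b = 0x1234, c = 0x0123456789abcdef, d = 142, and e and f stay zero.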

It would need Spectre protection in two places, I think, because it's almost certainly a great gadget if the attacker can speculatively control the 'blocks' table.  This could be mitigated (I think) by hardcoding nblocks as 12 and by masking b->target.
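One way those two mitigations might look, as a portable sketch: the loop count is the compile-time constant 12, unused table entries simply carry a zero mask, and the target index is clamped with a branchless mask before the store. `index_mask_nospec()`, `unpack_hardened()` and the `UNPACK_*` constants are hypothetical names. The mask is modeled on the generic `array_index_mask_nospec()` in the kernel's <linux/nospec.h>, and real kernel code would just call `array_index_nospec()`. The kernel's x86 version uses inline asm so the compiler cannot turn the mask back into a branch; this plain-C version carries no such guarantee. The `in[b->source]` load is also indexed by table data and could presumably be clamped the same way if the input length is known.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define UNPACK_NWORDS  5  /* u64 words in struct uncompressed */
#define UNPACK_NBLOCKS 12 /* hardcoded block count instead of a caller-supplied nblocks */

struct bitblock {
    u64 source;
    u64 target;
    u64 mask;
    int shift;
};

/* All ones if index < size, zero otherwise, computed without a branch.
 * Relies on GCC/Clang semantics for converting to int64_t and for
 * arithmetic right shift of a negative value. */
static inline u64 index_mask_nospec(u64 index, u64 size)
{
    return (u64)(~(int64_t)(index | (size - 1 - index)) >> 63);
}

/* Fixed trip count; an out-of-range target is forced to word 0,
 * so even a mispredicted path cannot store outside out_words. */
static void unpack_hardened(u64 out_words[UNPACK_NWORDS], const u64 *in,
                            const struct bitblock blocks[UNPACK_NBLOCKS])
{
    for (int i = 0; i < UNPACK_NBLOCKS; i++) {
        const struct bitblock *b = &blocks[i];
        u64 target = b->target & index_mask_nospec(b->target, UNPACK_NWORDS);
        out_words[target] |= (in[b->source] & b->mask) << b->shift;
    }
}
```

Clamping to index 0 rather than rejecting the entry mirrors what `array_index_nospec()` does: it only needs to keep the access in bounds, while validating the table architecturally is a separate job.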

In contrast, the JIT approach needs a retpoline on each call, which could be more expensive than my entire function :)  I haven't benchmarked them lately.