nft/bpf interpreters and spectre2. Was: [PATCH RFC 0/4] net: add bpfilter

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 21 Feb 2018 18:20:37 -0800

On Wed, Feb 21, 2018 at 01:13:03PM +0100, Florian Westphal wrote:
> 
> Obvious candidates are: meta, numgen, limit, objref, quota, reject.
> 
> We should probably also consider removing
> CONFIG_NFT_SET_RBTREE and CONFIG_NFT_SET_HASH and just always
> build both too (at least rbtree since that offers interval).
> 
> For the indirect call issue we can use direct calls from eval loop for
> some of the more frequently used ones, similar to what we do already
> for nft_cmp_fast_expr. 

nft_cmp_fast_expr and the other expressions mentioned above made me think...

Do we have the same issue with the nft interpreter as we had with the bpf one?
The bpf interpreter was used as part of a spectre2 attack to leak
information via a cache side channel and let a VM read hypervisor memory.
Because of that issue we made it possible to build the kernel without
the bpf interpreter. That's what CONFIG_BPF_JIT_ALWAYS_ON is for...
but we still have the nft interpreter in the kernel, which can also
execute arbitrary nft expressions.

Jann's exploit used the following bpf instructions:
struct bpf_insn evil_bytecode_instrs[] = {
// rax = target_byte_addr
{ .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 0, .imm = target_byte_addr }, { .imm = target_byte_addr>>32 },
// rdi = timing_leak_array
{ .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 1, .imm = host_timing_leak_addr }, { .imm = host_timing_leak_addr>>32 },
// rax = *(u8*)rax
{ .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0 },
// rax = rax << ...
{ .code = BPF_ALU64 | BPF_LSH | BPF_K, .dst_reg = 0, .imm = 10 - bit_idx },
// rax = rax & 0x400
{ .code = BPF_ALU64 | BPF_AND | BPF_K, .dst_reg = 0, .imm = 0x400 },
// rax = rdi + rax
{ .code = BPF_ALU64 | BPF_ADD | BPF_X, .dst_reg = 0, .src_reg = 1 },
// *(u8*) (rax + 0x800)
{ .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0x800 },
};

and a gadget to jump into __bpf_prog_run with insn pointing
to memory controlled by the guest that is also accessible
(at a different virtual address) to the hypervisor.
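
To make the arithmetic of the sequence above concrete, here is a rough
sketch in plain C. It only models the ALU steps on integers and is not
taken from the exploit; leak_offset() is a hypothetical helper name, and
the returned value is the offset into timing_leak_array that the final
load would touch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the exploit or the kernel.
 * Models the shift/mask/add steps of evil_bytecode_instrs[] for one
 * byte value and one bit index, and returns the offset (relative to
 * timing_leak_array) that the final load would access.
 */
static uint64_t leak_offset(uint8_t target_byte, unsigned int bit_idx)
{
	uint64_t rax;

	assert(bit_idx < 8);

	rax = target_byte;	/* rax = *(u8 *)rax */
	rax <<= 10 - bit_idx;	/* move the chosen bit up to bit 10 */
	rax &= 0x400;		/* keep only that bit: 0 or 0x400 */
	return rax + 0x800;	/* final load uses .off = 0x800 */
}
```

So the final load lands at offset 0x800 when the chosen bit is clear and
at 0xc00 when it is set, i.e. in two different cache lines.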

It seems possible to construct a similar sequence of instructions
out of nft expressions and use a gadget that jumps into nft_do_chain().
Compared to the bpf interpreter attack, the attacker would need to
discover more kernel addresses:
nft_do_chain, nft_cmp_fast_ops, nft_payload_fast_ops, nft_bitwise_eval,
nft_lookup_eval, and nft_bitmap_lookup
to populate nft chains, rules and expressions in guest memory.

Then in nft_do_chain(struct nft_pktinfo *pkt, void *priv)
pkt->skb needs to point to a fake struct sk_buff in guest memory with
skb->head == target_byte_addr.
The first nft expression can be nft_payload_fast_eval().
If it's properly constructed with
(nft_payload->base == NFT_PAYLOAD_NETWORK_HEADER, offset == 0, len == 0, dreg == 1)
it will do an arbitrary load of
*(u8 *)dest = *(u8 *)ptr;
from target_byte_addr into register 1 of the nft state machine
(dest points into the u32 array of registers on the stack of nft_do_chain).
The second nft expression can be nft_bitwise_eval() to mask a particular
bit in register 1.
Then nft_cmp_eval() to check whether the bit is one or zero and
conditionally NFT_BREAK out of the first nft rule into the second nft rule.
The last, conditionally executed nft_immediate_eval() in the first rule will
set register 1 to 0x400 * 8, while the first nft_bitwise_eval() in
the second rule will do r1 &= 0x400 * 8.
So at this point r1 will hold either 0x400 * 8 or 0, depending
on the value of the speculatively loaded bit.
The last expression can be nft_lookup_eval() with
nft_lookup->set->ops->lookup == nft_bitmap_lookup,
which will do nft_bitmap->bitmap[idx] where idx = r1 / 8.
The memory used for this last nft_lookup/bitmap expression is
both an instruction and the timing_leak_array itself.
If I'm not mistaken, this sequence of nft expressions will
speculatively execute logic very similar to evil_bytecode_instrs[].
The number of native cpu loads/stores/branches executed speculatively is
probably higher than what the bpf interpreter executes for these evil
bytecodes, but likely still well within the cpu speculation window of
100+ insns.

Obviously such an exploit is harder to do than the bpf-based one.
Do we need to do anything about it?
Maybe it's easier to find gadgets in .text of vmlinux
instead of messing with interpreters?

Jann,
can you comment on removing interpreters in general?
Do we need to worry about having bpf and/or nft interpreter
in the kernel?
