Re: nft/bpf interpreters and spectre2. Was: [PATCH RFC 0/4] net: add bpfilter

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Thu, 22 Feb 2018 12:39:15 +0100

Hi Alexei,

On Wed, Feb 21, 2018 at 06:20:37PM -0800, Alexei Starovoitov wrote:
> On Wed, Feb 21, 2018 at 01:13:03PM +0100, Florian Westphal wrote:
> > 
> > Obvious candidates are: meta, numgen, limit, objref, quota, reject.
> > 
> > We should probably also consider removing
> > CONFIG_NFT_SET_RBTREE and CONFIG_NFT_SET_HASH and just always
> > build both too (at least rbtree since that offers interval).
> > 
> > For the indirect call issue we can use direct calls from eval loop for
> > some of the more frequently used ones, similar to what we do already
> > for nft_cmp_fast_expr. 
> 
> nft_cmp_fast_expr and other expressions mentioned above made me thinking...
> 
> do we have the same issue with nft interpreter as we had with bpf one?
> bpf interpreter was used as part of spectre2 attack to leak
> information via cache side channel and let VM read hypervisor memory.
> Due to that issue we removed bpf interpreter from the kernel code.
> That's what CONFIG_BPF_JIT_ALWAYS_ON for...
> but we still have nft interpreter in the kernel that can also
> execute arbitrary nft expressions.
> 
> Jann's exploit used the following bpf instructions:
> struct bpf_insn evil_bytecode_instrs[] = {
> // rax = target_byte_addr
> { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 0, .imm = target_byte_addr }, { .imm = target_byte_addr>>32 },

We don't place pointers in the nft VM registers, it's basically
illegal to do so, otherwise we would need more sophisticated verifier.
I'm telling this because we don't have a way to point to any arbitrary
address as in 'target_byte_addr' above.

> // rdi = timing_leak_array
> { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 1, .imm = host_timing_leak_addr }, { .imm = host_timing_leak_addr>>32 },
> // rax = *(u8*)rax
> { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0 },
> // rax = rax << ...
> { .code = BPF_ALU64 | BPF_LSH | BPF_K, .dst_reg = 0, .imm = 10 - bit_idx },
> // rax = rax & 0x400
> { .code = BPF_ALU64 | BPF_AND | BPF_K, .dst_reg = 0, .imm = 0x400 },
> // rax = rdi + rax
> { .code = BPF_ALU64 | BPF_ADD | BPF_X, .dst_reg = 0, .src_reg = 1 },
> // *(u8*) (rax + 0x800)
> { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0x800 },
> 
> and a gadget to jump into __bpf_prog_run with insn pointing
> to memory controlled by the guest while accessible
> (at different virt address) by the hypervisor.
> 
> It seems possible to construct similar sequence of instructions
> out of nft expressions and use gadget that jumps into nft_do_chain().
> The attacker would need to discover more kernel addresses:
> nft_do_chain, nft_cmp_fast_ops, nft_payload_fast_ops, nft_bitwise_eval,
> nft_lookup_eval, and nft_bitmap_lookup
> to populate nft chains, rules and expressions in guest memory
> comparing to bpf interpreter attack.
> 
> Then in nft_do_chain(struct nft_pktinfo *pkt, void *priv)
> pkt needs to point to fake struct sk_buff in guest memory with
> skb->head == target_byte_addr

We don't have a way to make this point to fake struct sk_buff.

> The first nft expression can be nft_payload_fast_eval().
> If it's properly constructed with
> (nft_payload->based == NFT_PAYLOAD_NETWORK_HEADER, offset == 0, len == 0, dreg == 1)

We can reject len == 0. To be honest, this is not done right now, but
we can place a patch to validate this. Given this is a specialized
networking virtual machine, it retain semantics, so fetching zero
length data from a skbuff makes no sense, hence, we can return EINVAL
via netlink when adding a rule that tries to do this.

> it will do arbitrary load of
> *(u8 *)dest = *(u8 *)ptr;
> from target_byte_addr into register 1 of nft state machine
> (dest is u32 array of registers in the stack of nft_do_chain)
> Second nft expression can be nft_bitwise_eval() to mask particular
> bit in register 1.
> Then nft_cmp_eval() to check whether bit is one or zero and
> conditional NFT_BREAK out of first nft expression into second nft rule.
> The last conditional nft_immediate_eval() in the first rule will set
> register 1 to 0x400 * 8 while the first nft_bitwise_eval() in
> the second rule with do r1 &= 0x400 * 8.
> So at this point r1 will have either 0x400 * 8 or 0 depending
> on value of speculatively loaded bit.
> The last expression can be nft_lookup_eval() with 
> nft_lookup->set->ops->lookup == nft_bitmap_lookup
> which will do nft_bitmap->bitmap[idx] where idx = r1 / 8
> The memory used for this last nft_lookup/bitmap expression is
> both an instruction and timing_leak_array itself.
> If I'm not mistaken, this sequence of nft expression will
> speculatively execute very similar logic as in evil_bytecode_instrs[]

My impression is that several assumptions above are not correct.

> The amount of actual speculative native cpu load/stores/branches is
> probably more than executed by bpf interpreter for these evil bytecodes,
> but likely well within cpu speculation window of 100+ insns.
> 
> Obviously such exploit is harder to do than bpf based one.
> Do we need to do anything about it ?
> May be it's easier to find gadgets in .text of vmlinux
> instead of messing with interpreters?
> 
> Jann,
> can you comment on removing interpreters in general?
> Do we need to worry about having bpf and/or nft interpreter
> in the kernel?

Jann, any review on that front is very much welcome, so please let me
know if you find any real problem.

Thanks for reviewing!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html