On Thu, Feb 22, 2018 at 12:39:15PM +0100, Pablo Neira Ayuso wrote: > Hi Alexei, > > On Wed, Feb 21, 2018 at 06:20:37PM -0800, Alexei Starovoitov wrote: > > On Wed, Feb 21, 2018 at 01:13:03PM +0100, Florian Westphal wrote: > > > > > > Obvious candidates are: meta, numgen, limit, objref, quota, reject. > > > > > > We should probably also consider removing > > > CONFIG_NFT_SET_RBTREE and CONFIG_NFT_SET_HASH and just always > > > build both too (at least rbtree since that offers interval). > > > > > > For the indirect call issue we can use direct calls from eval loop for > > > some of the more frequently used ones, similar to what we do already > > > for nft_cmp_fast_expr. > > > > nft_cmp_fast_expr and other expressions mentioned above made me thinking... > > > > do we have the same issue with nft interpreter as we had with bpf one? > > bpf interpreter was used as part of spectre2 attack to leak > > information via cache side channel and let VM read hypervisor memory. > > Due to that issue we removed bpf interpreter from the kernel code. > > That's what CONFIG_BPF_JIT_ALWAYS_ON for... > > but we still have nft interpreter in the kernel that can also > > execute arbitrary nft expressions. > > > > Jann's exploit used the following bpf instructions: > > struct bpf_insn evil_bytecode_instrs[] = { > > // rax = target_byte_addr > > { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 0, .imm = target_byte_addr }, { .imm = target_byte_addr>>32 }, > > We don't place pointers in the nft VM registers, it's basically > illegal to do so, otherwise we would need more sophisticated verifier. > I'm telling this because we don't have a way to point to any arbitrary > address as in 'target_byte_addr' above. these evil_bytecode_instrs never saw bpf verifier either. That's the scary part of that poc. The only requirement for poc to work is to have interpreter in executable part of hypervisor code and speculatively jump into it with arguments pointing to memory controlled by vm. All static checks (done by bpf verifier and by nft validation) are bypassed. The only way to defend from such exploit is either remove the interpreter from the kernel or add _run-time_ checks and masks for every memory access (similar to what is done for spectre1 mitigations). In case of bpf it's impractical. In case of nft I suspect so too. I don't yet see how nft can check that skb pointer passed as part of nft_pktinfo is not an actual skb. > > // rdi = timing_leak_array > > { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 1, .imm = host_timing_leak_addr }, { .imm = host_timing_leak_addr>>32 }, > > // rax = *(u8*)rax > > { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0 }, > > // rax = rax << ... > > { .code = BPF_ALU64 | BPF_LSH | BPF_K, .dst_reg = 0, .imm = 10 - bit_idx }, > > // rax = rax & 0x400 > > { .code = BPF_ALU64 | BPF_AND | BPF_K, .dst_reg = 0, .imm = 0x400 }, > > // rax = rdi + rax > > { .code = BPF_ALU64 | BPF_ADD | BPF_X, .dst_reg = 0, .src_reg = 1 }, > > // *(u8*) (rax + 0x800) > > { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0x800 }, > > > > and a gadget to jump into __bpf_prog_run with insn pointing > > to memory controlled by the guest while accessible > > (at different virt address) by the hypervisor. > > > > It seems possible to construct similar sequence of instructions > > out of nft expressions and use gadget that jumps into nft_do_chain(). > > The attacker would need to discover more kernel addresses: > > nft_do_chain, nft_cmp_fast_ops, nft_payload_fast_ops, nft_bitwise_eval, > > nft_lookup_eval, and nft_bitmap_lookup > > to populate nft chains, rules and expressions in guest memory > > comparing to bpf interpreter attack. > > > > Then in nft_do_chain(struct nft_pktinfo *pkt, void *priv) > > pkt needs to point to fake struct sk_buff in guest memory with > > skb->head == target_byte_addr > > We don't have a way to make this point to fake struct sk_buff. yet. it's possible, since cpu is speculating and all such pointers controlled by vm can be arbitrary. > > The first nft expression can be nft_payload_fast_eval(). > > If it's properly constructed with > > (nft_payload->based == NFT_PAYLOAD_NETWORK_HEADER, offset == 0, len == 0, dreg == 1) > > We can reject len == 0. To be honest, this is not done right now, but > we can place a patch to validate this. Given this is a specialized > networking virtual machine, it retain semantics, so fetching zero > length data from a skbuff makes no sense, hence, we can return EINVAL > via netlink when adding a rule that tries to do this. Adding static check won't help. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html