On Wed, Feb 21, 2018 at 01:13:03PM +0100, Florian Westphal wrote: > > Obvious candidates are: meta, numgen, limit, objref, quota, reject. > > We should probably also consider removing > CONFIG_NFT_SET_RBTREE and CONFIG_NFT_SET_HASH and just always > build both too (at least rbtree since that offers interval). > > For the indirect call issue we can use direct calls from eval loop for > some of the more frequently used ones, similar to what we do already > for nft_cmp_fast_expr. nft_cmp_fast_expr and other expressions mentioned above made me thinking... do we have the same issue with nft interpreter as we had with bpf one? bpf interpreter was used as part of spectre2 attack to leak information via cache side channel and let VM read hypervisor memory. Due to that issue we removed bpf interpreter from the kernel code. That's what CONFIG_BPF_JIT_ALWAYS_ON for... but we still have nft interpreter in the kernel that can also execute arbitrary nft expressions. Jann's exploit used the following bpf instructions: struct bpf_insn evil_bytecode_instrs[] = { // rax = target_byte_addr { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 0, .imm = target_byte_addr }, { .imm = target_byte_addr>>32 }, // rdi = timing_leak_array { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 1, .imm = host_timing_leak_addr }, { .imm = host_timing_leak_addr>>32 }, // rax = *(u8*)rax { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0 }, // rax = rax << ... { .code = BPF_ALU64 | BPF_LSH | BPF_K, .dst_reg = 0, .imm = 10 - bit_idx }, // rax = rax & 0x400 { .code = BPF_ALU64 | BPF_AND | BPF_K, .dst_reg = 0, .imm = 0x400 }, // rax = rdi + rax { .code = BPF_ALU64 | BPF_ADD | BPF_X, .dst_reg = 0, .src_reg = 1 }, // *(u8*) (rax + 0x800) { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0x800 }, and a gadget to jump into __bpf_prog_run with insn pointing to memory controlled by the guest while accessible (at different virt address) by the hypervisor. It seems possible to construct similar sequence of instructions out of nft expressions and use gadget that jumps into nft_do_chain(). The attacker would need to discover more kernel addresses: nft_do_chain, nft_cmp_fast_ops, nft_payload_fast_ops, nft_bitwise_eval, nft_lookup_eval, and nft_bitmap_lookup to populate nft chains, rules and expressions in guest memory comparing to bpf interpreter attack. Then in nft_do_chain(struct nft_pktinfo *pkt, void *priv) pkt needs to point to fake struct sk_buff in guest memory with skb->head == target_byte_addr The first nft expression can be nft_payload_fast_eval(). If it's properly constructed with (nft_payload->based == NFT_PAYLOAD_NETWORK_HEADER, offset == 0, len == 0, dreg == 1) it will do arbitrary load of *(u8 *)dest = *(u8 *)ptr; from target_byte_addr into register 1 of nft state machine (dest is u32 array of registers in the stack of nft_do_chain) Second nft expression can be nft_bitwise_eval() to mask particular bit in register 1. Then nft_cmp_eval() to check whether bit is one or zero and conditional NFT_BREAK out of first nft expression into second nft rule. The last conditional nft_immediate_eval() in the first rule will set register 1 to 0x400 * 8 while the first nft_bitwise_eval() in the second rule with do r1 &= 0x400 * 8. So at this point r1 will have either 0x400 * 8 or 0 depending on value of speculatively loaded bit. The last expression can be nft_lookup_eval() with nft_lookup->set->ops->lookup == nft_bitmap_lookup which will do nft_bitmap->bitmap[idx] where idx = r1 / 8 The memory used for this last nft_lookup/bitmap expression is both an instruction and timing_leak_array itself. If I'm not mistaken, this sequence of nft expression will speculatively execute very similar logic as in evil_bytecode_instrs[] The amount of actual speculative native cpu load/stores/branches is probably more than executed by bpf interpreter for these evil bytecodes, but likely well within cpu speculation window of 100+ insns. Obviously such exploit is harder to do than bpf based one. Do we need to do anything about it ? May be it's easier to find gadgets in .text of vmlinux instead of messing with interpreters? Jann, can you comment on removing interpreters in general? Do we need to worry about having bpf and/or nft interpreter in the kernel? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html