Re: [PATCH RFC v4 net-next 01/26] net: filter: add "load 64-bit immediate" eBPF instruction

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Wed, 13 Aug 2014 14:17:24 -0700

On Wed, Aug 13, 2014 at 2:02 PM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
> On Wed, Aug 13, 2014 at 11:35 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> The compiler can still think of it as a single insn, though, but some
>> future compiler might not.
>
> I think that would be very dangerous.
> compiler (user space) and kernel interpreter must have the same
> understanding of ISA.
>
>> In any case, I think that, if you use the
>> same code for high and for low, you need logic in the JIT that's at
>> least as complicated.
>
> why do you think so? Handling of pseudo BPF_LD_IMM64 is done
> in single patch #11 which is one of the smallest...
>
>> For example, what happens if you have two
>> consecutive 64-bit immediate loads to the same register?  Now you have
>> four consecutive 8-byte insn words that differ only in their immediate
>> values, and you need to split them correctly.
>
> I don't need to do anything special in this case.
> Two 16-byte instructions back to back is not a problem.
> Interpreter or JIT don't care whether they move the same or different
> immediates into the same or different register. Interpreter and JITs
> are dumb on purpose.
> when verifier sees two back to back ld_imm64, the 2nd will simply
> override the value loaded by first one. It's not any different than
> two back to back 'mov dst_reg, imm32' instructions.

But this patch makes the JIT code (and any interpreter) weirdly
stateful.  You have:

+               case BPF_LD | BPF_IMM | BPF_DW:
+                       /* movabsq %rax, imm64 */
+                       EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
+                       EMIT(insn->imm, 4);
+                       insn++;
+                       i++;
+                       EMIT(insn->imm, 4);
+                       break;

If you have more than two BPF_LD | BPF_IMM | BPF_DW instructions in a
row, then the way in which they pair up depends on where you start.

I think it would be a lot clearer if you made these be "load low" and
"load high", with JIT code like:

+               case BPF_LOAD_LOW:
+                       /* movabsq %rax, imm64 */
+                       if (next insn is BPF_LOAD_HIGH) {
+                           EMIT2(add_1mod(0x48, dst_reg),
add_1reg(0xB8, dst_reg));
+                           EMIT(insn->imm, 4);
+                           insn++;
+                           i++;
+                           EMIT(insn->imm, 4);
+                       } else {
+                           emit a real load low;
+                       }
+                       break;

(and you'd have to deal with whether load low by itself is illegal,
zero extends, sign extends, or preserves high bits).

Alternatively, and possibly better, you could have a real encoding for
multiword instructions.  Reserve a bit in the opcode to mark a
continuation of the previous instruction, and do:

+               case BPF_LD | BPF_IMM | BPF_DW:
+                       assert(insn[1] in bounds && insn[1].code == BPF_CONT);
+                       /* movabsq %rax, imm64 */
+                       EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
+                       EMIT(insn->imm, 4);
+                       insn++;
+                       i++;
+                       EMIT(insn->imm, 4);
+                       break;

This has a nice benefit for future-proofing: it gives you 119 bits of
payload for 16-byte instructions.

On the other hand, a u8 for the opcode is kind of small, and killing
half of that space like this is probably bad.  Maybe reserve two high
bits, with:

0: normal opcode or start of a multiword sequence
1: continuation of a multiword sequence
2, 3: reserved for future longer opcode numbers (e.g. 2 could indicate
that "code" is actually 16 bits)

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html