Re: [PATCH RFC v4 net-next 01/26] net: filter: add "load 64-bit immediate" eBPF instruction

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 13, 2014 at 2:17 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Aug 13, 2014 at 2:02 PM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>> On Wed, Aug 13, 2014 at 11:35 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>
>>> The compiler can still think of it as a single insn, though, but some
>>> future compiler might not.
>>
>> I think that would be very dangerous.
>> compiler (user space) and kernel interpreter must have the same
>> understanding of ISA.
>>
>>> In any case, I think that, if you use the
>>> same code for high and for low, you need logic in the JIT that's at
>>> least as complicated.
>>
>> why do you think so? Handling of pseudo BPF_LD_IMM64 is done
>> in single patch #11 which is one of the smallest...
>>
>>> For example, what happens if you have two
>>> consecutive 64-bit immediate loads to the same register?  Now you have
>>> four consecutive 8-byte insn words that differ only in their immediate
>>> values, and you need to split them correctly.
>>
>> I don't need to do anything special in this case.
>> Two 16-byte instructions back to back is not a problem.
>> Interpreter or JIT don't care whether they move the same or different
>> immediates into the same or different register. Interpreter and JITs
>> are dumb on purpose.
>> when verifier sees two back to back ld_imm64, the 2nd will simply
>> override the value loaded by first one. It's not any different than
>> two back to back 'mov dst_reg, imm32' instructions.
>
> But this patch makes the JIT code (and any interpreter) weirdly
> stateful.  You have:
>
> +               case BPF_LD | BPF_IMM | BPF_DW:
> +                       /* movabsq %rax, imm64 */
> +                       EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
> +                       EMIT(insn->imm, 4);
> +                       insn++;
> +                       i++;
> +                       EMIT(insn->imm, 4);
> +                       break;
>
> If you have more than two BPF_LD | BPF_IMM | BPF_DW instructions in a
> row, then the way in which they pair up depends on where you start.

For JIT it's not a problem, since it's doing a linear scan. So it always
starts at instruction boundary.
But thinking about it further you're right that it's a bug in verifier.
I've tried it and indeed depending on type of branch verifier doesn't
catch a case of two 16-byte instructions back to back and jump
goes into 2nd half of 1st insn. I need to fix that.

> I think it would be a lot clearer if you made these be "load low" and
> "load high", with JIT code like:
>
> +               case BPF_LOAD_LOW:
> +                       /* movabsq %rax, imm64 */
> +                       if (next insn is BPF_LOAD_HIGH) {

such 'if' will be costly in interpreter. I want to avoid it.

> (and you'd have to deal with whether load low by itself is illegal,
> zero extends, sign extends, or preserves high bits).

I don't need an instruction that loads low 32-bit. It already exists.
It's called 'mov'.
I'm going to try encoding:
insn[0].code = LD | IMM | DW
insn[1].code = 0
zero is invalid opcode, so it's your 'continuation'.
and it is still single 16-byte instructions without any interpreter overhead.

> This has a nice benefit for future-proofing: it gives you 119 bits of
> payload for 16-byte instructions.

It's already future proofed. We can add 24 byte instructions and so on
just as well. There is no point to reserve 119 bits when no one is using
them.

> On the other hand, a u8 for the opcode is kind of small, and killing
> half of that space like this is probably bad.  Maybe reserve two high
> bits, with:

That's an overkill. We use ~80 opcodes out of 256.
There is plenty of room. I see no reason to switch to 16-bit opcodes
until we get even close to half of u8 space.
It feels that we're starting to bikeshed.
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux