Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
On 7/10/20 10:31 AM, Jakub Sitnicki wrote:
When size of load from context is the same as target field size, but less
than context field size, the verifier does not emit the shift and mask
instructions for loads at non-zero offset.

This has the unexpected effect of loading the same data no matter what the
offset was. While the expected behavior would be to load zeros for offsets
that are greater than target field size.

For instance, u16 load from u32 context field backed by u16 target field at
an offset of 2 bytes results in:

    SEC("sk_reuseport/narrow_half")
    int reuseport_narrow_half(struct sk_reuseport_md *ctx)
    {
    	__u16 *half;

    	half = (__u16 *)&ctx->ip_protocol;
    	if (half[0] == 0xaaaa)
    		return SK_DROP;
    	if (half[1] == 0xbbbb)
    		return SK_DROP;
    	return SK_PASS;
    }

It would be good if you can include llvm asm output like below so people
can correlate source => asm => xlated codes:

        0:       w0 = 0
        1:       r2 = *(u16 *)(r1 + 24)
        2:       if w2 == 43690 goto +4 <LBB0_3>
        3:       r1 = *(u16 *)(r1 + 26)
        4:       w0 = 1
        5:       if w1 != 48059 goto +1 <LBB0_3>
        6:       w0 = 0

0000000000000038 <LBB0_3>:
        7:       exit

Sure, not a problem, if it makes reasoning about the problem easier.
I'm assuming that it is the narrow load at an offset that you wanted to
confirm with asm output.



    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
       0: (b4) w0 = 0
    ; if (half[0] == 0xaaaa)
       1: (79) r2 = *(u64 *)(r1 +8)
       2: (69) r2 = *(u16 *)(r2 +924)
    ; if (half[0] == 0xaaaa)
       3: (16) if w2 == 0xaaaa goto pc+5
    ; if (half[1] == 0xbbbb)
       4: (79) r1 = *(u64 *)(r1 +8)
       5: (69) r1 = *(u16 *)(r1 +924)
       6: (b4) w0 = 1
    ; if (half[1] == 0xbbbb)
       7: (56) if w1 != 0xbbbb goto pc+1
       8: (b4) w0 = 0
    ; }
       9: (95) exit

Indeed we have an issue here. The insn 5 is not correct.
The original assembly is correct.

Internally ip_protocol is backed by 2 bytes in sk_reuseport_kern.
The current verifier implementation makes an important assumption:
    all user load requests are within the size of kernel internal range
In this case, the verifier actually only correctly supports
    . one byte from offset 0
    . one byte from offset 1
    . two bytes from offset 0

I don't think that's true. For a field that has target size of 2 bytes,
like ip_protocol, 1-byte load at any offset is correctly supported
because right shifting and masking takes place. That is because we hit
the "size < target_size" condition in this case. Only loads of size >=
target size at an offset != give surprising results.

okay, yes, you are right. We relied on the fact even the inst is to load 2 bytes but effectively we load 8 bytes always for 64bit load and 4 bytes always for 32bit load. For any other one byte access, shifting and masking will kick in to make it work although the value is 0.



The original assembly code tries to access 2 bytes from offset 2
and the verifier did incorrect transformation.

This actually makes sense since any other read is
misleading. For example, for ip_protocol, if people wants to
load 2 bytes from offset 2, what should we return? 0? In this case,
actually verifier can convert it to 0 with doing a load.

Yes, IMHO, if you are loading 2 bytes at offset of 2 from a 4 byte
context field that holds an unsigned value, then it should return 0 for
for a field that is backed by a 2 byte kernel field.

I agree that it could be optimized to load an immediate value instead of
performing a load from memory. It didn't occur to me, so thanks for the
suggestion.

In this case half[0] == half[1] == sk->sk_protocol that backs the
ctx->ip_protocol field.

Fix it by shifting and masking any load from context that is narrower than
context field size (is_narrower_load = size < ctx_field_size), in addition
to loads that are narrower than target field size.

The fix can workaround the issue, but I think we should generate better codes
for such cases.

Not sure I'd go as far as calling it a workaround. After all I
understand that in BPF loading a half word into a register is well
defined, and you can rely on upper word being zero. But please correct
me if not.

This is true. I should have said that "the fix can fix the issue".


You're right, though, that approach can be smarter here, that is we can
emit just a single instruction that doesn't access memory.

The "size < target_size" check is left in place to cover the case when a
context field is narrower than its target field, even if we might not have
such case now. (It would have to be a u32 context field backed by a u64
target field, with context fields all being 4-bytes or wider.)

Going back to the example, with the fix in place, the upper half load from
ctx->ip_protocol yields zero:

    int reuseport_narrow_half(struct sk_reuseport_md * ctx):
    ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
       0: (b4) w0 = 0
    ; if (half[0] == 0xaaaa)
       1: (79) r2 = *(u64 *)(r1 +8)
       2: (69) r2 = *(u16 *)(r2 +924)
       3: (54) w2 &= 65535
    ; if (half[0] == 0xaaaa)
       4: (16) if w2 == 0xaaaa goto pc+7
    ; if (half[1] == 0xbbbb)
       5: (79) r1 = *(u64 *)(r1 +8)
       6: (69) r1 = *(u16 *)(r1 +924)

The load is still from offset 0, 2 bytes with upper 48 bits as 0.

Yes, this is how narrow loads currently work, right? It is not specific
to the case I'm fixing.

To give an example - if you do a 1-byte load at offset 1, it will load
the value from offset 0, and shift it right by 1 byte. So it is expected
that the load is always from offset 0 with current implementation.

Yes, the load is always from offset 0. The confusion part is
it load offset 0 with 2 bytes and then right shifting 2 bytes
to get 0...


SEC("sk_reuseport/narrow_byte")
int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
{
	__u8 *byte;

	byte = (__u8 *)&ctx->ip_protocol;
	if (byte[0] == 0xaa)
		return SK_DROP;
	if (byte[1] == 0xbb)
		return SK_DROP;
	if (byte[2] == 0xcc)
		return SK_DROP;
	if (byte[3] == 0xdd)
		return SK_DROP;
	return SK_PASS;
}

int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
    0: (b4) w0 = 0
; if (byte[0] == 0xaa)
    1: (79) r2 = *(u64 *)(r1 +8)
    2: (69) r2 = *(u16 *)(r2 +924)
    3: (54) w2 &= 255
; if (byte[0] == 0xaa)
    4: (16) if w2 == 0xaa goto pc+17
; if (byte[1] == 0xbb)
    5: (79) r2 = *(u64 *)(r1 +8)
    6: (69) r2 = *(u16 *)(r2 +924)
    7: (74) w2 >>= 8
    8: (54) w2 &= 255
; if (byte[1] == 0xbb)
    9: (16) if w2 == 0xbb goto pc+12
; if (byte[2] == 0xcc)
   10: (79) r2 = *(u64 *)(r1 +8)
   11: (69) r2 = *(u16 *)(r2 +924)
   12: (74) w2 >>= 16
   13: (54) w2 &= 255
; if (byte[2] == 0xcc)
   14: (16) if w2 == 0xcc goto pc+7
; if (byte[3] == 0xdd)
   15: (79) r1 = *(u64 *)(r1 +8)
   16: (69) r1 = *(u16 *)(r1 +924)
   17: (74) w1 >>= 24
   18: (54) w1 &= 255
   19: (b4) w0 = 1
; if (byte[3] == 0xdd)
   20: (56) if w1 != 0xdd goto pc+1
   21: (b4) w0 = 0
; }
   22: (95) exit


       7: (74) w1 >>= 16

w1 will be 0 now. so this will work.

       8: (54) w1 &= 65535

For the above insns 5-8, verifier, based on target information can
directly generate w1 = 0 since:
   . target kernel field size is 2, ctx field size is 4.
   . user tries to access offset 2 size 2.

Here, we need to decide whether we permits user to do partial read beyond of
kernel narrow field or not (e.g., this example)? I would
say yes, but Daniel or Alexei can provide additional comments.

If we allow such accesses, I would like verifier to generate better
code as I illustrated in the above. This can be implemented in
verifier itself with target passing additional kernel field size
to the verifier. The target already passed the ctx field size back
to the verifier.

Keep in mind that the BPF user is writing their code under the
assumption that the context field has 4 bytes. IMHO it's reasonable to
expect that I can load 2 bytes at offset of 2 from a 4 byte field.

Restricting it now to loads below the target field size, which is
unknown to the user, would mean rejecting programs that are working
today. Even if they are getting funny values.

I think implementing what you suggest is doable without major
changes. We have load size, target field size, and context field size at
hand in convert_ctx_accesses(), so it seems like a matter of adding an
'if' branch to handle better the case when we know the end result must
be 0. I'll give it a try.

Sounds good. The target_size is returned in convert_ctx_access(), which
is too late as the verifier already generated load instructions. You need to get it earlier in is_valid_access().


But I do want to empahsize that I still think the fix in current form is
correct, or at least not worse than what we have already in place narrow
loads.

I did agree that the fix in this patch is correct. It is just that we
could do better to fix this problem.



       9: (b4) w0 = 1
    ; if (half[1] == 0xbbbb)
      10: (56) if w1 != 0xbbbb goto pc+1
      11: (b4) w0 = 0
    ; }
      12: (95) exit

Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
Signed-off-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
---
   kernel/bpf/verifier.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 94cead5a43e5..1c4d0e24a5a2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
   			return -EINVAL;
   		}
   -		if (is_narrower_load && size < target_size) {
+		if (is_narrower_load || size < target_size) {
   			u8 shift = bpf_ctx_narrow_access_offset(
   				off, size, size_default) * 8;
   			if (ctx_field_size <= 4) {




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux