Re: [PATCH bpf] bpf: Shift and mask loads narrower than context field size

Yonghong Song <yhs@xxxxxx> · Thu, 16 Jul 2020 10:38:13 -0700

On 7/16/20 4:48 AM, Jakub Sitnicki wrote:
On Wed, Jul 15, 2020 at 10:59 PM CEST, Yonghong Song wrote:
On 7/15/20 12:26 PM, Jakub Sitnicki wrote:
On Wed, Jul 15, 2020 at 08:44 AM CEST, Yonghong Song wrote:
On 7/10/20 10:31 AM, Jakub Sitnicki wrote:

[...]

The "size < target_size" check is left in place to cover the case when a
context field is narrower than its target field, even if we might not have
such case now. (It would have to be a u32 context field backed by a u64
target field, with context fields all being 4-bytes or wider.)

Going back to the example, with the fix in place, the upper half load from
ctx->ip_protocol yields zero:

     int reuseport_narrow_half(struct sk_reuseport_md * ctx):
     ; int reuseport_narrow_half(struct sk_reuseport_md *ctx)
        0: (b4) w0 = 0
     ; if (half[0] == 0xaaaa)
        1: (79) r2 = *(u64 *)(r1 +8)
        2: (69) r2 = *(u16 *)(r2 +924)
        3: (54) w2 &= 65535
     ; if (half[0] == 0xaaaa)
        4: (16) if w2 == 0xaaaa goto pc+7
     ; if (half[1] == 0xbbbb)
        5: (79) r1 = *(u64 *)(r1 +8)
        6: (69) r1 = *(u16 *)(r1 +924)

The load is still from offset 0, 2 bytes with upper 48 bits as 0.

Yes, this is how narrow loads currently work, right? It is not specific
to the case I'm fixing.

To give an example - if you do a 1-byte load at offset 1, it will load
the value from offset 0, and shift it right by 1 byte. So it is expected
that the load is always from offset 0 with current implementation.

Yes, the load is always from offset 0. The confusion part is
it load offset 0 with 2 bytes and then right shifting 2 bytes
to get 0...

Right, I see how silly is the generated instruction sequence. I guess
I've accepted how <prog_type>_convert_ctx_access functions emit loads
and didn't stop and question this part before.

SEC("sk_reuseport/narrow_byte")
int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
{
	__u8 *byte;

	byte = (__u8 *)&ctx->ip_protocol;
	if (byte[0] == 0xaa)
		return SK_DROP;
	if (byte[1] == 0xbb)
		return SK_DROP;
	if (byte[2] == 0xcc)
		return SK_DROP;
	if (byte[3] == 0xdd)
		return SK_DROP;
	return SK_PASS;
}

int reuseport_narrow_byte(struct sk_reuseport_md * ctx):
; int reuseport_narrow_byte(struct sk_reuseport_md *ctx)
     0: (b4) w0 = 0
; if (byte[0] == 0xaa)
     1: (79) r2 = *(u64 *)(r1 +8)
     2: (69) r2 = *(u16 *)(r2 +924)
     3: (54) w2 &= 255
; if (byte[0] == 0xaa)
     4: (16) if w2 == 0xaa goto pc+17
; if (byte[1] == 0xbb)
     5: (79) r2 = *(u64 *)(r1 +8)
     6: (69) r2 = *(u16 *)(r2 +924)
     7: (74) w2 >>= 8
     8: (54) w2 &= 255
; if (byte[1] == 0xbb)
     9: (16) if w2 == 0xbb goto pc+12
; if (byte[2] == 0xcc)
    10: (79) r2 = *(u64 *)(r1 +8)
    11: (69) r2 = *(u16 *)(r2 +924)
    12: (74) w2 >>= 16
    13: (54) w2 &= 255
; if (byte[2] == 0xcc)
    14: (16) if w2 == 0xcc goto pc+7
; if (byte[3] == 0xdd)
    15: (79) r1 = *(u64 *)(r1 +8)
    16: (69) r1 = *(u16 *)(r1 +924)
    17: (74) w1 >>= 24
    18: (54) w1 &= 255
    19: (b4) w0 = 1
; if (byte[3] == 0xdd)
    20: (56) if w1 != 0xdd goto pc+1
    21: (b4) w0 = 0
; }
    22: (95) exit


        7: (74) w1 >>= 16

w1 will be 0 now. so this will work.

        8: (54) w1 &= 65535

For the above insns 5-8, verifier, based on target information can
directly generate w1 = 0 since:
    . target kernel field size is 2, ctx field size is 4.
    . user tries to access offset 2 size 2.

Here, we need to decide whether we permits user to do partial read beyond of
kernel narrow field or not (e.g., this example)? I would
say yes, but Daniel or Alexei can provide additional comments.

If we allow such accesses, I would like verifier to generate better
code as I illustrated in the above. This can be implemented in
verifier itself with target passing additional kernel field size
to the verifier. The target already passed the ctx field size back
to the verifier.

Keep in mind that the BPF user is writing their code under the
assumption that the context field has 4 bytes. IMHO it's reasonable to
expect that I can load 2 bytes at offset of 2 from a 4 byte field.

Restricting it now to loads below the target field size, which is
unknown to the user, would mean rejecting programs that are working
today. Even if they are getting funny values.

I think implementing what you suggest is doable without major
changes. We have load size, target field size, and context field size at
hand in convert_ctx_accesses(), so it seems like a matter of adding an
'if' branch to handle better the case when we know the end result must
be 0. I'll give it a try.

Sounds good. The target_size is returned in convert_ctx_access(), which
is too late as the verifier already generated load instructions. You need to get
it earlier in is_valid_access().

I have a feeling that I'm not following what you have in mind.

True, target_size is only known after convert_ctx_access generated
instructions. At this point, if we want to optimize the narrow loads
that must return 0, we can pop however many instructions
convert_ctx_access appended to insn_buf and emit BPF_MOV32/64_IMM.

However, it sounds a bit more complex than what I hoped for initially,
so I'm starting to doubt the value. Considering that narrow loads at an
offset that matches or exceeds target field size must be a corner case,
if the current "broken" behavior went unnoticed so far.

I'll need to play with the code and see how it turns out. But for the
moment please consider acking/nacking this one, as a simple way to fix
the issue targeted at 'bpf' branch and stable kernels.

Ack the current patch as it does fix the problem. See below comments
with a slight change to avoid penalize existing common case like
   __u16 proto = ctx->ip_protocol;




But I do want to empahsize that I still think the fix in current form is
correct, or at least not worse than what we have already in place narrow
loads.

I did agree that the fix in this patch is correct. It is just that we
could do better to fix this problem.

I agree with your sentiment. Sorry if I got too defensive there.




        9: (b4) w0 = 1
     ; if (half[1] == 0xbbbb)
       10: (56) if w1 != 0xbbbb goto pc+1
       11: (b4) w0 = 0
     ; }
       12: (95) exit

Fixes: f96da09473b5 ("bpf: simplify narrower ctx access")
Signed-off-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
---
    kernel/bpf/verifier.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 94cead5a43e5..1c4d0e24a5a2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9760,7 +9760,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
    			return -EINVAL;
    		}
    -		if (is_narrower_load && size < target_size) {
+		if (is_narrower_load || size < target_size) {

Maybe
                if (is_narrower_load &&
                    (size < target_size || (off & (size_default - 1)) 
!= 0)) {

The original patch is any narrow load will do shift/mask, which is not 
good. For narrow load, we only need to shift/mask if
  - size < target_size or
  - the off is not in the field boundary.

I still prefer better xlated codes. But if it is too complex, the above
change is also acceptable. I just do not like generated xlated byte 
codes, it is very easy for people to get confused.


    			u8 shift = bpf_ctx_narrow_access_offset(
    				off, size, size_default) * 8;
    			if (ctx_field_size <= 4) {