Re: [PATCH bpf-next v2 1/4] bpf: verify scalar ids mapping in regsafe() using check_ids()

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 31 May 2023 11:29:52 -0700

On Wed, May 31, 2023 at 10:21 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Tue, 2023-05-30 at 14:37 -0700, Andrii Nakryiko wrote:
> [...]
> > Also, it might make sense to drop SCALAR register IDs as soon as we
> > have only one instance of it left (e.g., if "paired" register was
> > overwritten already). I.e., aggressively drop IDs when they become
> > useless. WDYT?
>
> I added modification which resets sole scalar IDs to zero before
> states comparison, it shows some speedup but is still slow:
>
>   Filter        | Number of programs | Number of programs
>                 | patch #1           | patch #1 + sole scalar ID pruning
>   ----------------------------------------------------------------------
>   states_pct>10 | 40                 | 40
>   states_pct>20 | 20                 | 19
>   states_pct>30 | 15                 | 13
>   states_pct>40 | 11                 | 8
>
> (Out of 177 programs).
>
> I'll modify mark_chain_precision() to propagate precision marks for
> find_equal_scalars(), so that it could be compared to current patch #3
> in terms of code complexity and verification performance.
>
> If you have any thoughts regarding my previous email, please share.
>

Yep, I do. Given SCALAR registers with the same ID are meant to "share
the destiny", shouldn't it be required that when we mark r6 as precise
we should automatically mark linked r7 as precise at the same point.
So in your example:

7: r9 += r6

should be where we request both r6 and r7 (and whatever other register
has the same ID) to be marked as precise. It should be very easy to
implement, especially given my recent refactoring with
mark_chain_precision_batch.

The question I have (and again, haven't spent any time thinking about
any other corner cases, sorry) is whether that alone would be a proper
fix?

As for this u32_hashset, it just feels like a big overkill, tbh. If we
have to do something like that, wouldn't it be better to, say, set
highest bit in reg->id (for all linked registers, of course) to mark
it as "used for range checks" instead of maintaining a separate check?

But just the whole idea of keeping track of some special circumstances
under which IDs are meaningful seems wrong... All this logic is
complicated, now we are adding another layer of complexity on top. And
the complexity is not in the code, it's in thinking about all possible
scenarios and their interactions.

> [...]