Re: [PATCH bpf-next] bpf: tnums: Provably sound, faster, and more precise algorithm for tnum_mul

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Sat, 29 May 2021 22:59:14 -0700

On Thu, May 27, 2021 at 11:14 PM <hv90@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> From: Harishankar Vishwanathan <harishankar.vishwanathan@xxxxxxxxxxx>
>
> This patch introduces a new algorithm for multiplication of tristate
> numbers (tnums) that is provably sound. It is faster and more precise when
> compared to the existing method.
>
> Like the existing method, this new algorithm follows the long
> multiplication algorithm. The idea is to generate partial products by
> multiplying each bit in the multiplier (tnum a) with the multiplicand
> (tnum b), and adding the partial products after appropriately bit-shifting
> them. The new algorithm, however, uses just a single loop over the bits of
> the multiplier (tnum a) and accumulates only the uncertain components of
> the multiplicand (tnum b) into a mask-only tnum. The following paper
> explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.

This is a nice paper, I appreciated tables with algorithms pseudo-code
and specific examples with uncertain bits, thanks!

I think your algorithm makes sense, but I've also CC'ed original
author of tnum logic. Edward, please take a look as well.

See below mostly styling nits.

>
> A natural way to construct the tnum product is by performing a tnum
> addition on all the partial products. This algorithm presents another
> method of doing this: decompose each partial product into two tnums,
> consisting of the values and the masks separately. The mask-sum is
> accumulated within the loop in acc_m. The value-sum tnum is generated
> using a.value * b.value. The tnum constructed by tnum addition of the
> value-sum and the mask-sum contains all possible summations of concrete
> values drawn from the partial product tnums pairwise. We prove this result
> in the paper.
>
> Our evaluations show that the new algorithm is overall more precise
> (producing tnums with less uncertain components) than the existing method.
> As an illustrative example, consider the input tnums A and B. The numbers
> in the paranthesis correspond to (value;mask).
>
> A                = 000000x1 (1;2)
> B                = 0010011x (38;1)
> A * B (existing) = xxxxxxxx (0;255)
> A * B (new)      = 0x1xxxxx (32;95)
>
> Importantly, we present a proof of soundness of the new algorithm in the
> aforementioned paper. Additionally, we show that this new algorithm is
> empirically faster than the existing method.
>
> Co-developed-by: Matan Shachnai <m.shachnai@xxxxxxxxxxx>
> Signed-off-by: Matan Shachnai <m.shachnai@xxxxxxxxxxx>
> Co-developed-by: Srinivas Narayana <srinivas.narayana@xxxxxxxxxxx>
> Signed-off-by: Srinivas Narayana <srinivas.narayana@xxxxxxxxxxx>
> Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@xxxxxxxxxxx>
> Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@xxxxxxxxxxx>
> Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@xxxxxxxxxxx>
> ---
>  kernel/bpf/tnum.c | 38 +++++++++++++++++++-------------------
>  1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
> index ceac5281bd31..bb1fa1cc181d 100644
> --- a/kernel/bpf/tnum.c
> +++ b/kernel/bpf/tnum.c
> @@ -111,30 +111,30 @@ struct tnum tnum_xor(struct tnum a, struct tnum b)
>         return TNUM(v & ~mu, mu);
>  }
>
> -/* half-multiply add: acc += (unknown * mask * value).
> - * An intermediate step in the multiply algorithm.
> - */
> -static struct tnum hma(struct tnum acc, u64 value, u64 mask)
> +struct tnum tnum_mul(struct tnum a, struct tnum b)

It's probably a good idea to have a short description (from your
commit description above) of the algorithm in the comment above this
function, with a link to your paper.

>  {
> -       while (mask) {
> -               if (mask & 1)
> -                       acc = tnum_add(acc, TNUM(0, value));
> -               mask >>= 1;
> -               value <<= 1;
> -       }
> -       return acc;
> -}
> +       u64 acc_v = a.value * b.value;
> +       struct tnum acc_m = TNUM(0, 0);
>
> -struct tnum tnum_mul(struct tnum a, struct tnum b)
> -{
> -       struct tnum acc;
> -       u64 pi;
> +       while (a.value > 0 || a.mask > 0) {

`while (a.value || a.mask)` is shorter and doesn't imply that a.value
or a.mask can be < 0 (otherwise you'd write != 0, right? ;)

> +

unnecessary empty line

> +               // LSB of tnum a is a certain 1

please use C-style comments /* */

> +               if (((a.value & 1) == 1) && ((a.mask & 1) == 0))

just if (a.value & 1) is enough. if a.value == 1, a.mask has to be 0,
right? and (x & 1) == 1 is just a more verbose way of saying (x & 1)
is non-zero, which in C is just if (x & 1).

> +                       acc_m = tnum_add(acc_m, TNUM(0, b.mask));
>
> -       pi = a.value * b.value;
> -       acc = hma(TNUM(pi, 0), a.mask, b.mask | b.value);
> -       return hma(acc, b.mask, a.value);
> +               // LSB of tnum a is uncertain
> +               else if ((a.mask & 1) == 1)

same about comment style and simpler if statement; also another comment below

> +                       acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
> +
> +               // Note: no case for LSB is certain 0
> +               a = tnum_rshift(a, 1);
> +               b = tnum_lshift(b, 1);
> +       }
> +
> +       return tnum_add(TNUM(acc_v, 0), acc_m);
>  }
>
> +

another unnecessary empty line

>  /* Note that if a and b disagree - i.e. one has a 'known 1' where the other has
>   * a 'known 0' - this will return a 'known 1' for that bit.
>   */
> --
> 2.25.1
>