Re: [oss-drivers] Re: [PATCH/RFC v2 bpf-next 05/19] bpf: split read liveness into REG_LIVE_READ64 and REG_LIVE_READ32

Jiong Wang <jiong.wang@xxxxxxxxxxxxx> · Fri, 12 Apr 2019 17:14:47 +0100

On Thu, Apr 11, 2019 at 5:53 PM Jiong Wang <jiong.wang@xxxxxxxxxxxxx> wrote:
>
>
> > On 11 Apr 2019, at 17:44, Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, 11 Apr 2019 07:13:03 +0100, Jiong Wang wrote:
> >>>> @@ -1150,17 +1150,17 @@ static int mark_reg_read(struct bpf_verifier_env *env,
> >>>>                            parent->var_off.value, parent->off);
> >>>>                    return -EFAULT;
> >>>>            }
> >>>> -          if (parent->live & REG_LIVE_READ)
> >>>> +          if ((parent->live & REG_LIVE_READ) == flags)
> >>>>                    /* The parentage chain never changes and
> >>>> -                   * this parent was already marked as LIVE_READ.
> >>>> +                   * this parent was already marked with all read bits.
> >>>
> >>> Do we have to propagate all read bits?  Read64 is strictly stronger
> >>> than read32, as long as read64 is set on the parent we should be good?
> >>
> >> We should be good, but I doubt there is value to differentiate on this in this
> >> kind of HOT function.
> >
> > The entire if clause is an optimization.  I'm saying you can maintain it
> > as more aggressive.
>
> What I mean is I suspect the read width could be quite consistent in real program,
> the percentage for doing extra check on read64 could actually be mishit for
> most of the time, if the propagation iterations is big the extra check done each
> time may overcome the propagation pruned.
>
> I will do some benchmarking on this to see the real gain.

Take bpf_lxc.o for example, it has ~3 million mark_reg_read
propagation iteration.

Adding extra (parent->live & REG_LIVE_READ64) check cuts off 18% ~ 25%
propagation iterations in exchange of 75% iterations are doing one more check
(parent->live & REG_LIVE_READ64).

Given the propagation contains several statement, I think we are better off
adding such check (also done some ktime_get_real_ts64 run timing measure, but
the results is not very consistent).

Will add it in v3.

Regards,
Jiong