Re: [PATCH bpf-next v4 1/6] bpf: Factorize bpf_trace_printk and bpf_seq_printf

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 15 Apr 2021 15:37:55 -0700

On Thu, Apr 15, 2021 at 2:33 AM Florent Revest <revest@xxxxxxxxxxxx> wrote:
>
> On Thu, Apr 15, 2021 at 2:38 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> > On Wed, Apr 14, 2021 at 11:54 AM Florent Revest <revest@xxxxxxxxxxxx> wrote:
> > > +static int try_get_fmt_tmp_buf(char **tmp_buf)
> > > +{
> > > +       struct bpf_printf_buf *bufs;
> > > +       int used;
> > > +
> > > +       if (*tmp_buf)
> > > +               return 0;
> > > +
> > > +       preempt_disable();
> > > +       used = this_cpu_inc_return(bpf_printf_buf_used);
> > > +       if (WARN_ON_ONCE(used > 1)) {
> > > +               this_cpu_dec(bpf_printf_buf_used);
> >
> > this makes me uncomfortable. If used > 1, you won't preempt_enable()
> > here, but you'll decrease count. Then later bpf_printf_cleanup() will
> > be called (inside bpf_printf_prepare()) and will further decrease
> > count (which it didn't increase, so it's a mess now).
>
> Awkward, yes. :( This code is untested because it only covers a niche
> preempt_rt usecase that is hard to reproduce but I should have thought
> harder about these corner cases.
>
> > > +                       i += 2;
> > > +                       if (!final_args)
> > > +                               goto fmt_next;
> > > +
> > > +                       if (try_get_fmt_tmp_buf(&tmp_buf)) {
> > > +                               err = -EBUSY;
> > > +                               goto out;
> >
> > this probably should bypass doing bpf_printf_cleanup() and
> > try_get_fmt_tmp_buf() should enable preemption internally on error.
>
> Yes. I'll fix this and spend some more brain cycles thinking about
> what I'm doing. ;)
>
> > > -static __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
> > > +BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
> > > +          u64, arg2, u64, arg3)
> > >  {
> > > +       u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
> > > +       enum bpf_printf_mod_type mod[MAX_TRACE_PRINTK_VARARGS];
> > >         static char buf[BPF_TRACE_PRINTK_SIZE];
> > >         unsigned long flags;
> > > -       va_list ap;
> > >         int ret;
> > >
> > > -       raw_spin_lock_irqsave(&trace_printk_lock, flags);
> > > -       va_start(ap, fmt);
> > > -       ret = vsnprintf(buf, sizeof(buf), fmt, ap);
> > > -       va_end(ap);
> > > -       /* vsnprintf() will not append null for zero-length strings */
> > > +       ret = bpf_printf_prepare(fmt, fmt_size, args, args, mod,
> > > +                                MAX_TRACE_PRINTK_VARARGS);
> > > +       if (ret < 0)
> > > +               return ret;
> > > +
> > > +       ret = snprintf(buf, sizeof(buf), fmt, BPF_CAST_FMT_ARG(0, args, mod),
> > > +               BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod));
> > > +       /* snprintf() will not append null for zero-length strings */
> > >         if (ret == 0)
> > >                 buf[0] = '\0';
> > > +
> > > +       raw_spin_lock_irqsave(&trace_printk_lock, flags);
> > >         trace_bpf_trace_printk(buf);
> > >         raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
> > >
> > > -       return ret;
> >
> > see here, no + 1 :(
>
> I wonder if it's a bug or a feature though. The helper documentation
> says the helper returns "the number of bytes written to the buffer". I
> am not familiar with the internals of trace_printk but if the
> terminating \0 is not outputted in the trace_printk buffer, then it
> kind of makes sense.
>
> Also, if anyone uses this return value, I can imagine that the usecase
> would be if (ret == 0) assume_nothing_was_written(). And if we
> suddenly output 1 here, we might break something.
>
> Because the helper is quite old, maybe we should improve the helper
> documentation instead? Your call :)

Yeah, let's make helper's doc a bit more precise, otherwise let's not
touch it. I doubt many users ever check return result of
bpf_trace_printk() at all, tbh.