Re: [PATCH bpf-next v4 1/6] bpf: Factorize bpf_trace_printk and bpf_seq_printf

Florent Revest <revest@xxxxxxxxxxxx> · Thu, 15 Apr 2021 11:32:51 +0200

On Thu, Apr 15, 2021 at 2:38 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
> On Wed, Apr 14, 2021 at 11:54 AM Florent Revest <revest@xxxxxxxxxxxx> wrote:
> > +static int try_get_fmt_tmp_buf(char **tmp_buf)
> > +{
> > +       struct bpf_printf_buf *bufs;
> > +       int used;
> > +
> > +       if (*tmp_buf)
> > +               return 0;
> > +
> > +       preempt_disable();
> > +       used = this_cpu_inc_return(bpf_printf_buf_used);
> > +       if (WARN_ON_ONCE(used > 1)) {
> > +               this_cpu_dec(bpf_printf_buf_used);
>
> this makes me uncomfortable. If used > 1, you won't preempt_enable()
> here, but you'll decrease count. Then later bpf_printf_cleanup() will
> be called (inside bpf_printf_prepare()) and will further decrease
> count (which it didn't increase, so it's a mess now).

Awkward, yes. :( This code is untested because it only covers a niche
preempt_rt usecase that is hard to reproduce but I should have thought
harder about these corner cases.

> > +                       i += 2;
> > +                       if (!final_args)
> > +                               goto fmt_next;
> > +
> > +                       if (try_get_fmt_tmp_buf(&tmp_buf)) {
> > +                               err = -EBUSY;
> > +                               goto out;
>
> this probably should bypass doing bpf_printf_cleanup() and
> try_get_fmt_tmp_buf() should enable preemption internally on error.

Yes. I'll fix this and spend some more brain cycles thinking about
what I'm doing. ;)

> > -static __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
> > +BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
> > +          u64, arg2, u64, arg3)
> >  {
> > +       u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 };
> > +       enum bpf_printf_mod_type mod[MAX_TRACE_PRINTK_VARARGS];
> >         static char buf[BPF_TRACE_PRINTK_SIZE];
> >         unsigned long flags;
> > -       va_list ap;
> >         int ret;
> >
> > -       raw_spin_lock_irqsave(&trace_printk_lock, flags);
> > -       va_start(ap, fmt);
> > -       ret = vsnprintf(buf, sizeof(buf), fmt, ap);
> > -       va_end(ap);
> > -       /* vsnprintf() will not append null for zero-length strings */
> > +       ret = bpf_printf_prepare(fmt, fmt_size, args, args, mod,
> > +                                MAX_TRACE_PRINTK_VARARGS);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       ret = snprintf(buf, sizeof(buf), fmt, BPF_CAST_FMT_ARG(0, args, mod),
> > +               BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod));
> > +       /* snprintf() will not append null for zero-length strings */
> >         if (ret == 0)
> >                 buf[0] = '\0';
> > +
> > +       raw_spin_lock_irqsave(&trace_printk_lock, flags);
> >         trace_bpf_trace_printk(buf);
> >         raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
> >
> > -       return ret;
>
> see here, no + 1 :(

I wonder if it's a bug or a feature though. The helper documentation
says the helper returns "the number of bytes written to the buffer". I
am not familiar with the internals of trace_printk but if the
terminating \0 is not outputted in the trace_printk buffer, then it
kind of makes sense.

Also, if anyone uses this return value, I can imagine that the usecase
would be if (ret == 0) assume_nothing_was_written(). And if we
suddenly output 1 here, we might break something.

Because the helper is quite old, maybe we should improve the helper
documentation instead? Your call :)