Re: [PATCH bpf-next] tools: bpftool: add "prog run" subcommand to test-run programs

Quentin Monnet <quentin.monnet@xxxxxxxxxxxxx> · Fri, 5 Jul 2019 17:03:25 +0100

2019-07-05 08:42 UTC-0700 ~ Y Song <ys114321@xxxxxxxxx>
> On Fri, Jul 5, 2019 at 1:21 AM Quentin Monnet
> <quentin.monnet@xxxxxxxxxxxxx> wrote:
>>
>> 2019-07-04 22:49 UTC-0700 ~ Y Song <ys114321@xxxxxxxxx>
>>> On Thu, Jul 4, 2019 at 1:58 AM Quentin Monnet
>>> <quentin.monnet@xxxxxxxxxxxxx> wrote:
>>>>
>>>> Add a new "bpftool prog run" subcommand to run a loaded program on input
>>>> data (and possibly with input context) passed by the user.
>>>>
>>>> Print output data (and output context if relevant) into a file or into
>>>> the console. Print return value and duration for the test run into the
>>>> console.
>>>>
>>>> A "repeat" argument can be passed to run the program several times in a
>>>> row.
>>>>
>>>> The command does not perform any kind of verification based on program
>>>> type (Is this program type allowed to use an input context?) or on data
>>>> consistency (Can I work with empty input data?), this is left to the
>>>> kernel.
>>>>
>>>> Example invocation:
>>>>
>>>>     # perl -e 'print "\x0" x 14' | ./bpftool prog run \
>>>>             pinned /sys/fs/bpf/sample_ret0 \
>>>>             data_in - data_out - repeat 5
>>>>     0000000 0000 0000 0000 0000 0000 0000 0000      | ........ ......
>>>>     Return value: 0, duration (average): 260ns
>>>>
>>>> When one of data_in or ctx_in is "-", bpftool reads from standard input,
>>>> in binary format. Other formats (JSON, hexdump) might be supported (via
>>>> an optional command line keyword like "data_fmt_in") in the future if
>>>> relevant, but this would require doing more parsing in bpftool.
>>>>
>>>> Signed-off-by: Quentin Monnet <quentin.monnet@xxxxxxxxxxxxx>
>>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
>>>> ---
>>
>> [...]
>>
>>>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>>>> index 9b0db5d14e31..8dcbaa0a8ab1 100644
>>>> --- a/tools/bpf/bpftool/prog.c
>>>> +++ b/tools/bpf/bpftool/prog.c
>>>> @@ -15,6 +15,7 @@
>>>>  #include <sys/stat.h>
>>>>
>>>>  #include <linux/err.h>
>>>> +#include <linux/sizes.h>
>>>>
>>>>  #include <bpf.h>
>>>>  #include <btf.h>
>>>> @@ -748,6 +749,344 @@ static int do_detach(int argc, char **argv)
>>>>         return 0;
>>>>  }
>>>>
>>>> +static int check_single_stdin(char *file_in, char *other_file_in)
>>>> +{
>>>> +       if (file_in && other_file_in &&
>>>> +           !strcmp(file_in, "-") && !strcmp(other_file_in, "-")) {
>>>> +               p_err("cannot use standard input for both data_in and ctx_in");
>>>
>>> The error message says data_in and ctx_in.
>>> Maybe the input parameter should be file_data_in and file_ctx_in?
>>
>>
>> Hi Yonghong,
>>
>> It's true those parameters should be file names. But having
>> "file_data_in", "file_data_out", "file_ctx_in" and "file_ctx_out" on a
>> command line seems a bit heavy to me? (And relying on keyword prefixing
>> for typing the command won't help much.)
>>
>> My opinion is that it should be clear from the man page or the "help"
>> command that the parameters are file names. What do you think? I can
>> prefix all four arguments with "file_" if you believe this is better.
> 
> I think you misunderstood my question above.

Totally did, sorry :/.

> The command line parameters are fine.
> I am talking about the function parameter names. Since in the error message,
> the input parameters are referred for data_in and ctx_in
>    p_err("cannot use standard input for both data_in and ctx_in")
> maybe the function signature should be
>   static int check_single_stdin(char *file_data_in, char *file_ctx_in)
> 
> If you are worried that later on the same function can be used in different
> contexts, then alternatively, you can have signature like
>   static int check_single_stdin(char *file_in, char *other_file_in,
> const char *file_in_arg, const char *other_file_in_arg)
> where file_in_arg will be passed in as "data_in" and other_file_in_arg
> as "ctx_in".
> I think we could delay this until it is really needed.

As a matter of fact, the opposite thing happened. I first used the
function for data_in/ctx_in, and also for data_out/ctx_out. But I
changed my mind eventually because there is no real reason not to print
both data_out and ctx_out to stdout if we want to do so. So I updated
the name of the parameters in the error messages, but forgot to change
the arguments for the function. Silly me.

So I totally agree, I'll respin and change the argument names for the
function. And yes, we could also pass the names to print in the error
message, but I agree that this is not needed, and not helpful at the moment.

Thanks for catching this!

>>
>> [...]
>>
>>>> +static int do_run(int argc, char **argv)
>>>> +{
>>>> +       char *data_fname_in = NULL, *data_fname_out = NULL;
>>>> +       char *ctx_fname_in = NULL, *ctx_fname_out = NULL;
>>>> +       struct bpf_prog_test_run_attr test_attr = {0};
>>>> +       const unsigned int default_size = SZ_32K;
>>>> +       void *data_in = NULL, *data_out = NULL;
>>>> +       void *ctx_in = NULL, *ctx_out = NULL;
>>>> +       unsigned int repeat = 1;
>>>> +       int fd, err;
>>>> +
>>>> +       if (!REQ_ARGS(4))
>>>> +               return -1;
>>>> +
>>>> +       fd = prog_parse_fd(&argc, &argv);
>>>> +       if (fd < 0)
>>>> +               return -1;
>>>> +
>>>> +       while (argc) {
>>>> +               if (detect_common_prefix(*argv, "data_in", "data_out",
>>>> +                                        "data_size_out", NULL))
>>>> +                       return -1;
>>>> +               if (detect_common_prefix(*argv, "ctx_in", "ctx_out",
>>>> +                                        "ctx_size_out", NULL))
>>>> +                       return -1;
>>>> +
>>>> +               if (is_prefix(*argv, "data_in")) {
>>>> +                       NEXT_ARG();
>>>> +                       if (!REQ_ARGS(1))
>>>> +                               return -1;
>>>> +
>>>> +                       data_fname_in = GET_ARG();
>>>> +                       if (check_single_stdin(data_fname_in, ctx_fname_in))
>>>> +                               return -1;
>>>> +               } else if (is_prefix(*argv, "data_out")) {
>>>
>>> Here, we all use is_prefix() to match "data_in", "data_out",
>>> "data_size_out" etc.
>>> That means users can use "data_i" instead of "data_in" as below
>>>    ... | ./bpftool prog run id 283 data_i - data_out - repeat 5
>>> is this expected?
>> Yes, this is expected. We use prefix matching as we do pretty much
>> everywhere else in bpftool. It's not as useful here because most of the
>> strings for the names are similar. I agree that typing "data_i" instead
>> of "data_in" brings little advantage, but I see no reason why we should
>> reject prefixing for those keywords. And we accept "data_s" instead of
>> "data_size_out", which is still shorter to type than the complete keyword.
> 
> This makes sense. Thanks for explanation.
> 
> Another question. Currently, you are proposing "./bpftool prog run ...",
> but actually it is just a test_run. Do you think we should rename it
> to "./bpftool prog test_run ..." to make it clear for its intention?

Good question. Hmm. It would make it more explicit that we use the
BPF_PROG_TEST_RUN command, but at the same time, from the point of view
of the user, there is nothing in particular that makes it a test run, is
it? I mean, you provide input data, you get output data and return
value, that makes it a real BPF run somehow, except that it's not on a
packet or anything. Do you think it is ambiguous and people may confuse
it with something like "attach"?

Thanks,
Quentin