Re: [PATCH bpf-next] tools: bpftool: add "prog run" subcommand to test-run programs

Quentin Monnet <quentin.monnet@xxxxxxxxxxxxx> · Fri, 5 Jul 2019 18:50:19 +0100

2019-07-05 10:08 UTC-0700 ~ Y Song <ys114321@xxxxxxxxx>
> On Fri, Jul 5, 2019 at 9:03 AM Quentin Monnet
> <quentin.monnet@xxxxxxxxxxxxx> wrote:
>>
>> 2019-07-05 08:42 UTC-0700 ~ Y Song <ys114321@xxxxxxxxx>
>>> On Fri, Jul 5, 2019 at 1:21 AM Quentin Monnet
>>> <quentin.monnet@xxxxxxxxxxxxx> wrote:
>>>>
>>>> 2019-07-04 22:49 UTC-0700 ~ Y Song <ys114321@xxxxxxxxx>
>>>>> On Thu, Jul 4, 2019 at 1:58 AM Quentin Monnet
>>>>> <quentin.monnet@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Add a new "bpftool prog run" subcommand to run a loaded program on input
>>>>>> data (and possibly with input context) passed by the user.
>>>>>>
>>>>>> Print output data (and output context if relevant) into a file or into
>>>>>> the console. Print return value and duration for the test run into the
>>>>>> console.
>>>>>>
>>>>>> A "repeat" argument can be passed to run the program several times in a
>>>>>> row.
>>>>>>
>>>>>> The command does not perform any kind of verification based on program
>>>>>> type (Is this program type allowed to use an input context?) or on data
>>>>>> consistency (Can I work with empty input data?), this is left to the
>>>>>> kernel.
>>>>>>
>>>>>> Example invocation:
>>>>>>
>>>>>>     # perl -e 'print "\x0" x 14' | ./bpftool prog run \
>>>>>>             pinned /sys/fs/bpf/sample_ret0 \
>>>>>>             data_in - data_out - repeat 5
>>>>>>     0000000 0000 0000 0000 0000 0000 0000 0000      | ........ ......
>>>>>>     Return value: 0, duration (average): 260ns
>>>>>>
>>>>>> When one of data_in or ctx_in is "-", bpftool reads from standard input,
>>>>>> in binary format. Other formats (JSON, hexdump) might be supported (via
>>>>>> an optional command line keyword like "data_fmt_in") in the future if
>>>>>> relevant, but this would require doing more parsing in bpftool.
>>>>>>
>>>>>> Signed-off-by: Quentin Monnet <quentin.monnet@xxxxxxxxxxxxx>
>>>>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
>>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>>>>>> index 9b0db5d14e31..8dcbaa0a8ab1 100644
>>>>>> --- a/tools/bpf/bpftool/prog.c
>>>>>> +++ b/tools/bpf/bpftool/prog.c
>>>>>> @@ -15,6 +15,7 @@
>>>>>>  #include <sys/stat.h>
>>>>>>
>>>>>>  #include <linux/err.h>
>>>>>> +#include <linux/sizes.h>
>>>>>>
>>>>>>  #include <bpf.h>
>>>>>>  #include <btf.h>
>>>>>> @@ -748,6 +749,344 @@ static int do_detach(int argc, char **argv)
>>>>>>         return 0;
>>>>>>  }
>>>>>>
>>>>>> +static int check_single_stdin(char *file_in, char *other_file_in)
>>>>>> +{
>>>>>> +       if (file_in && other_file_in &&
>>>>>> +           !strcmp(file_in, "-") && !strcmp(other_file_in, "-")) {
>>>>>> +               p_err("cannot use standard input for both data_in and ctx_in");
>>>>>
>>>>> The error message says data_in and ctx_in.
>>>>> Maybe the input parameter should be file_data_in and file_ctx_in?
>>>>
>>>>
>>>> Hi Yonghong,
>>>>
>>>> It's true those parameters should be file names. But having
>>>> "file_data_in", "file_data_out", "file_ctx_in" and "file_ctx_out" on a
>>>> command line seems a bit heavy to me? (And relying on keyword prefixing
>>>> for typing the command won't help much.)
>>>>
>>>> My opinion is that it should be clear from the man page or the "help"
>>>> command that the parameters are file names. What do you think? I can
>>>> prefix all four arguments with "file_" if you believe this is better.
>>>
>>> I think you misunderstood my question above.
>>
>> Totally did, sorry :/.
>>
>>> The command line parameters are fine.
>>> I am talking about the function parameter names. Since in the error message,
>>> the input parameters are referred for data_in and ctx_in
>>>    p_err("cannot use standard input for both data_in and ctx_in")
>>> maybe the function signature should be
>>>   static int check_single_stdin(char *file_data_in, char *file_ctx_in)
>>>
>>> If you are worried that later on the same function can be used in different
>>> contexts, then alternatively, you can have signature like
>>>   static int check_single_stdin(char *file_in, char *other_file_in,
>>> const char *file_in_arg, const char *other_file_in_arg)
>>> where file_in_arg will be passed in as "data_in" and other_file_in_arg
>>> as "ctx_in".
>>> I think we could delay this until it is really needed.
>>
>> As a matter of fact, the opposite thing happened. I first used the
>> function for data_in/ctx_in, and also for data_out/ctx_out. But I
>> changed my mind eventually because there is no real reason not to print
>> both data_out and ctx_out to stdout if we want to do so. So I updated
>> the name of the parameters in the error messages, but forgot to change
>> the arguments for the function. Silly me.
>>
>> So I totally agree, I'll respin and change the argument names for the
>> function. And yes, we could also pass the names to print in the error
>> message, but I agree that this is not needed, and not helpful at the moment.
>>
>> Thanks for catching this!
>>
>>>>
>>>> [...]
>>>>
>>>>>> +static int do_run(int argc, char **argv)
>>>>>> +{
>>>>>> +       char *data_fname_in = NULL, *data_fname_out = NULL;
>>>>>> +       char *ctx_fname_in = NULL, *ctx_fname_out = NULL;
>>>>>> +       struct bpf_prog_test_run_attr test_attr = {0};
>>>>>> +       const unsigned int default_size = SZ_32K;
>>>>>> +       void *data_in = NULL, *data_out = NULL;
>>>>>> +       void *ctx_in = NULL, *ctx_out = NULL;
>>>>>> +       unsigned int repeat = 1;
>>>>>> +       int fd, err;
>>>>>> +
>>>>>> +       if (!REQ_ARGS(4))
>>>>>> +               return -1;
>>>>>> +
>>>>>> +       fd = prog_parse_fd(&argc, &argv);
>>>>>> +       if (fd < 0)
>>>>>> +               return -1;
>>>>>> +
>>>>>> +       while (argc) {
>>>>>> +               if (detect_common_prefix(*argv, "data_in", "data_out",
>>>>>> +                                        "data_size_out", NULL))
>>>>>> +                       return -1;
>>>>>> +               if (detect_common_prefix(*argv, "ctx_in", "ctx_out",
>>>>>> +                                        "ctx_size_out", NULL))
>>>>>> +                       return -1;
>>>>>> +
>>>>>> +               if (is_prefix(*argv, "data_in")) {
>>>>>> +                       NEXT_ARG();
>>>>>> +                       if (!REQ_ARGS(1))
>>>>>> +                               return -1;
>>>>>> +
>>>>>> +                       data_fname_in = GET_ARG();
>>>>>> +                       if (check_single_stdin(data_fname_in, ctx_fname_in))
>>>>>> +                               return -1;
>>>>>> +               } else if (is_prefix(*argv, "data_out")) {
>>>>>
>>>>> Here, we all use is_prefix() to match "data_in", "data_out",
>>>>> "data_size_out" etc.
>>>>> That means users can use "data_i" instead of "data_in" as below
>>>>>    ... | ./bpftool prog run id 283 data_i - data_out - repeat 5
>>>>> is this expected?
>>>> Yes, this is expected. We use prefix matching as we do pretty much
>>>> everywhere else in bpftool. It's not as useful here because most of the
>>>> strings for the names are similar. I agree that typing "data_i" instead
>>>> of "data_in" brings little advantage, but I see no reason why we should
>>>> reject prefixing for those keywords. And we accept "data_s" instead of
>>>> "data_size_out", which is still shorter to type than the complete keyword.
>>>
>>> This makes sense. Thanks for explanation.
>>>
>>> Another question. Currently, you are proposing "./bpftool prog run ...",
>>> but actually it is just a test_run. Do you think we should rename it
>>> to "./bpftool prog test_run ..." to make it clear for its intention?
>>
>> Good question. Hmm. It would make it more explicit that we use the
>> BPF_PROG_TEST_RUN command, but at the same time, from the point of view
>> of the user, there is nothing in particular that makes it a test run, is
>> it? I mean, you provide input data, you get output data and return
>> value, that makes it a real BPF run somehow, except that it's not on a
>> packet or anything. Do you think it is ambiguous and people may confuse
>> it with something like "attach"?
> 
> I am more thinking about whether we could have a real "bpftool prog run ..."
> in the future which could really run the program in some kind of production
> environment...
> 
> But I could be wrong since after "bpf prog attach" it may already just start
> to run in production, so "bpf prog run ..." not really needed for it.
> 
> So "bpf prog run ..." is probably fine.

Ok, I'll stick to "prog run" unless someone else comments, then. I
suppose we can find something else, like "bpftool prog start", if we
need something like the feature you describe someday.

I'll send a v2 with the fix for the arguments in check_single_stdin().

Thanks,
Quentin