Re: [PATCH bpf-next v4 2/2] selftests/bpf: Add mptcp subflow subtest

Matthieu Baerts <matttbe@xxxxxxxxxx> · Thu, 15 Aug 2024 22:57:05 +0200

Hi Martin,

Thank you for your reply!

On 15/08/2024 00:37, Martin KaFai Lau wrote:
> On 8/14/24 3:04 AM, Matthieu Baerts wrote:
>> Hi Martin,
>>
>> Thank you for your reply!
>>
>> On 14/08/2024 03:12, Martin KaFai Lau wrote:
>>> On 8/5/24 2:52 AM, Matthieu Baerts (NGI0) wrote:
>>>> +static int endpoint_init(char *flags)
>>>> +{
>>>> +    SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST);
>>>> +    SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
>>>> +    SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
>>>> +    SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
>>>> +    SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
>>>> +    if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) {
>>>> +        printf("'ip mptcp' not supported, skip this test.\n");
>>>> +        test__skip();
>>>
>>> It is always a skip now in bpf CI:
>>>
>>> #171/3   mptcp/subflow:SKIP
>>>
>>> This test is a useful addition for the bpf CI selftest.
>>>
>>> It can't catch regression if it is always a skip in bpf CI though.
>>
>> Indeed, for the moment, this test is skipped in bpf CI.
>>
>> The MPTCP CI checks the MPTCP BPF selftests that are on top of net and
>> net-next at least once a day. It is always running with the last stable
>> version of iproute2, so this test is not skipped:
>>
>>     #169/3   mptcp/subflow:OK
>>
>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10384566794/job/28751869426#step:7:11080
>>
>>> iproute2 needs to be updated (cc: Daniel Xu and Manu, the outdated
>>> iproute2 is something that came up multiple times).
>>>
>>> Not sure when the iproute2 can be updated. In the meantime, your v3 is
>>> pretty close to getting pm_nl_ctl compiled. Is there any other blocker
>>> on this?
>>
>> I will try to find some time to check the modifications I suggested in
>> the v3, but I don't know how long it will take to have them ready, as
>> they might require some adaptations of the CI side as well, I need to
>> check. On the other hand, I understood adding a duplicated version of
>> the mptcp.h UAPI header is not an option either.
>>
>> So not to block this (already old) series, I thought it would help to
>> first focus on this version using 'ip mptcp', while I'm looking at the
>> selftests modifications. If these modifications are successful, I can
>> always resend the patch 2/3 from the v3 later, and using 'pm_nl_ctl'
>> instead of 'ip mptcp', to be able to work with IPRoute2 5.5.
>>
>> Do you think that could work like that?
> 
> If a CI has started covering it, staying with 'ip mptcp' is fine.
> 
> The bpf CI has to start testing it asap also. The iproute2 package will
> need to be updated on the bpf CI side. I think this has to be done
> regardless.
> 
> It will be useful to avoid the uapi header dup on its own. The last one
> you have seems pretty close.

Thank you. Yes, I will try to find time to look at that.

>>>> +        goto fail;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +fail:
>>>> +    return -1;
>>>> +}
>>>> +
>>>> +static int _ss_search(char *src, char *dst, char *port, char *keyword)
>>>> +{
>>>> +    return SYS_NOFAIL("ip netns exec %s ss -enita src %s dst %s %s %d | grep -q '%s'",
>>>> +              NS_TEST, src, dst, port, PORT_1, keyword);
>>>> +}
>>>> +
>>>> +static int ss_search(char *src, char *keyword)
>>>> +{
>>>> +    return _ss_search(src, ADDR_1, "dport", keyword);
>>>> +}
>>>> +
>>>> +static void run_subflow(char *new)
>>>> +{
>>>> +    int server_fd, client_fd, err;
>>>> +    char cc[TCP_CA_NAME_MAX];
>>>> +    socklen_t len = sizeof(cc);
>>>> +
>>>> +    server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
>>>> +    if (!ASSERT_GE(server_fd, 0, "start_mptcp_server"))
>>>> +        return;
>>>> +
>>>> +    client_fd = connect_to_fd(server_fd, 0);
>>>> +    if (!ASSERT_GE(client_fd, 0, "connect to fd"))
>>>> +        goto fail;
>>>> +
>>>> +    err = getsockopt(server_fd, SOL_TCP, TCP_CONGESTION, cc, &len);
>>>> +    if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)"))
>>>> +        goto fail;
>>>> +
>>>> +    send_byte(client_fd);
>>>> +
>>>> +    ASSERT_OK(ss_search(ADDR_1, "fwmark:0x1"), "ss_search fwmark:0x1");
>>>> +    ASSERT_OK(ss_search(ADDR_2, "fwmark:0x2"), "ss_search fwmark:0x2");
>>>> +    ASSERT_OK(ss_search(ADDR_1, new), "ss_search new cc");
>>>> +    ASSERT_OK(ss_search(ADDR_2, cc), "ss_search default cc");
>>>
>>> Is there a getsockopt way instead of ss + grep?
>>
>> No, there isn't: from userspace, the app communicates with the MPTCP
>> socket, which can have multiple paths (subflows, each a TCP socket). To keep
>> the compatibility with TCP, [gs]etsockopt() will look at/modify the
>> whole MPTCP connection. For example, in some cases, a setsockopt() will
>> propagate the option to all the subflows. Depending on the option, the
>> modification might only apply to the first subflow, or to the
>> user-facing socket.
>>
>> For advanced users who want to have different options set to the
>> different subflows of an MPTCP connection, they can use BPF: that's what
>> is being validated here. In other words, doing a 'getsockopt()' from the
>> userspace program here will not show all the different marks and TCP CC
>> that can be set per subflow with BPF. We can see that in the test: a
>> getsockopt() is done on the MPTCP socket to retrieve the default TCP CC
>> ('cc' which is certainly 'cubic'), but we expect to find another one
>> ('new' which is 'reno'), set by the BPF program from patch 1/2. I guess
>> we could use bpf to do a getsockopt() per subflow, but that seems a
>> bit like cheating: the BPF test program would set something and then
>> check that it is set. Here, it is an external way. Because it is done from a
> 
> I think the result is valid by having a bpf prog to inspect the value of
> a sock. Inspecting socket is an existing use case. There are many
> existing bpf tests covering this inspection use case to ensure the
> result is legit. A separate cgroup/getsockopt program should help here
> (more on this below).

I didn't consider a separate program. Indeed, that should work.
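
To illustrate the userspace view described in the quoted paragraph, here
is a minimal sketch of what an application can read today. It is not
taken from the patch: the helper name mptcp_default_cc() and the fallback
defines are assumptions for illustration. It only shows that a
getsockopt() on the MPTCP socket returns a single TCP CC name for the
connection, not one per subflow.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed values from the kernel UAPI). */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif
#ifndef TCP_CA_NAME_MAX
#define TCP_CA_NAME_MAX 16
#endif

/* Hypothetical helper: read the TCP CC name as seen through an MPTCP
 * socket. Returns 0 on success, -1 if MPTCP is not available or if
 * getsockopt() fails.
 */
static int mptcp_default_cc(char *cc, socklen_t len)
{
	int fd, err;

	memset(cc, 0, len);

	fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
	if (fd < 0)
		return -1;

	/* IPPROTO_TCP has the same value as SOL_TCP used in the selftest. */
	err = getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, &len);
	close(fd);

	return err ? -1 : 0;
}
```

Whatever the BPF program sets on the individual subflows, this call has
only one value to report, which is why the test looks at the subflows
from another angle.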

>> dedicated netns, it sounds OK to do that, no?
> 
> Thanks for the explanation. I was hoping there is a way to get to the
> underlying subflow fd. It seems impossible.
> 
> Being in the netns does help here. It is not only about whether ss iterates
> over a lot of connections or not. My preference is not to depend on external
> tools/shell-ing if possible, e.g. to avoid a package update discussion like
> the iproute2 one here. The uapi from the testing kernel is always up-to-date.
> ss is another binary but arguably in the same iproute2 package.
> There is now another extra "grep" and pipe here. We had been bitten by
> different shell behaviors and some archs have different shells, etc.

OK, I thought it was fine to use 'ss | grep' because it is used in other
BPF selftests: test_tc_tunnel.sh & test_xdp_features.sh.
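
If the shell and the 'grep' pipe are the main concern, one possible
intermediate step would be to exec the tool directly and search its
output in C. Below is a rough sketch of that idea only, not something
from this series: exec_search() is a hypothetical helper, and the 64 KiB
buffer is an arbitrary assumption. It would still depend on the 'ss'
binary, so it does not replace the "cgroup/getsockopt" approach.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[] without a shell and report whether
 * its standard output contains keyword.
 * Returns 0 if found, 1 if not found, -1 on error.
 */
static int exec_search(char *const argv[], const char *keyword)
{
	static char buf[65536];
	size_t used = 0;
	int pipefd[2], status;
	ssize_t n;
	pid_t pid;

	if (pipe(pipefd))
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: redirect stdout to the pipe, then exec directly. */
		close(pipefd[0]);
		if (dup2(pipefd[1], STDOUT_FILENO) < 0)
			_exit(127);
		close(pipefd[1]);
		execvp(argv[0], argv);
		_exit(127);
	}

	/* Parent: collect the output, capped at the buffer size. */
	close(pipefd[1]);
	while ((n = read(pipefd[0], buf + used, sizeof(buf) - 1 - used)) > 0)
		used += n;
	buf[used] = '\0';
	close(pipefd[0]);

	/* A command that fails or gets killed is reported as an error. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	return strstr(buf, keyword) ? 0 : 1;
}
```

In the test, argv would be something like { "ip", "netns", "exec",
NS_TEST, "ss", "-enita", "src", src, "dst", dst, "dport", <port as a
string>, NULL }, with the keyword passed separately instead of going
through 'grep -q'.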

> I think it is ok to take this set as is if you (and Geliang?) are ok to
> follow up with a "cgroup/getsockopt" way to inspect the subflow as the very
> next patch to the mptcp selftest. It seems inspecting subflow will be a
> common test going forward for mptcp, so it will be beneficial to have a
> "cgroup/getsockopt" way to inspect the subflow directly.
> 
> Take a look at a recent example [0]. The mptcp test is under a cgroup
> already and has the cgroup setup. An extra "cgroup/getsockopt" prog
> should be enough. That prog can walk the msk->conn_list and use
> bpf_rdonly_cast (or the bpf_core_cast macro in libbpf) to cast a pointer
> to a read-only tcp_sock. That allows inspecting all the fields in a
> tcp_sock.

It looks interesting to be able to inspect all the fields in a tcp_sock!
I will check with Geliang what can be done.

> Something needs a fix in patch 2 (replied separately), so a re-spin is
> needed.

Thank you. Will do in the next version!

> pw-bot: cr
> 
> [0]: https://lore.kernel.org/all/20240808150558.1035626-3-alan.maguire@xxxxxxxxxx/

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.