Hi Martin,

Thank you for your reply!

On 15/08/2024 00:37, Martin KaFai Lau wrote:
> On 8/14/24 3:04 AM, Matthieu Baerts wrote:
>> Hi Martin,
>>
>> Thank you for your reply!
>>
>> On 14/08/2024 03:12, Martin KaFai Lau wrote:
>>> On 8/5/24 2:52 AM, Matthieu Baerts (NGI0) wrote:
>>>> +static int endpoint_init(char *flags)
>>>> +{
>>>> +	SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST);
>>>> +	SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
>>>> +	SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
>>>> +	SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
>>>> +	SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
>>>> +	if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) {
>>>> +		printf("'ip mptcp' not supported, skip this test.\n");
>>>> +		test__skip();
>>>
>>> It is always a skip now in bpf CI:
>>>
>>> #171/3 mptcp/subflow:SKIP
>>>
>>> This test is a useful addition for the bpf CI selftest.
>>>
>>> It can't catch regression if it is always a skip in bpf CI though.
>>
>> Indeed, for the moment, this test is skipped in bpf CI.
>>
>> The MPTCP CI checks the MPTCP BPF selftests that are on top of net and
>> net-next at least once a day. It is always running with the last stable
>> version of iproute2, so this test is not skipped:
>>
>> #169/3 mptcp/subflow:OK
>>
>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10384566794/job/28751869426#step:7:11080
>>
>>> iproute2 needs to be updated (cc: Daniel Xu and Manu, the outdated
>>> iproute2 is something that came up multiple times).
>>>
>>> Not sure when the iproute2 can be updated. In the mean time, your v3 is
>>> pretty close to getting pm_nl_ctl compiled. Is there other blocker on
>>> this?
>>
>> I will try to find some time to check the modifications I suggested in
>> the v3, but I don't know how long it will take to have them ready, as
>> they might require some adaptations of the CI side as well, I need to
>> check. On the other hand, I understood adding a duplicated version of
>> the mptcp.h UAPI header is not an option either.
>>
>> So not to block this (already old) series, I thought it would help to
>> first focus on this version using 'ip mptcp', while I'm looking at the
>> selftests modifications. If these modifications are successful, I can
>> always resend the patch 2/3 from the v3 later, and using 'pm_nl_ctl'
>> instead of 'ip mptcp', to be able to work with IPRoute2 5.5.
>>
>> Do you think that could work like that?
>
> If there is CI started covering it, staying with the 'ip mptcp' is fine.
>
> The bpf CI has to start testing it asap also. The iproute2 package will
> need to be updated on the bpf CI side. I think this has to be done
> regardless.
>
> It will be useful to avoid the uapi header dup on its own. The last one
> you have seems pretty close.

Thank you. Yes, I will try to find time to look at that.

>>>> +		goto fail;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +fail:
>>>> +	return -1;
>>>> +}
>>>> +
>>>> +static int _ss_search(char *src, char *dst, char *port, char *keyword)
>>>> +{
>>>> +	return SYS_NOFAIL("ip netns exec %s ss -enita src %s dst %s %s %d | grep -q '%s'",
>>>> +			  NS_TEST, src, dst, port, PORT_1, keyword);
>>>> +}
>>>> +
>>>> +static int ss_search(char *src, char *keyword)
>>>> +{
>>>> +	return _ss_search(src, ADDR_1, "dport", keyword);
>>>> +}
>>>> +
>>>> +static void run_subflow(char *new)
>>>> +{
>>>> +	int server_fd, client_fd, err;
>>>> +	char cc[TCP_CA_NAME_MAX];
>>>> +	socklen_t len = sizeof(cc);
>>>> +
>>>> +	server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
>>>> +	if (!ASSERT_GE(server_fd, 0, "start_mptcp_server"))
>>>> +		return;
>>>> +
>>>> +	client_fd = connect_to_fd(server_fd, 0);
>>>> +	if (!ASSERT_GE(client_fd, 0, "connect to fd"))
>>>> +		goto fail;
>>>> +
>>>> +	err = getsockopt(server_fd, SOL_TCP, TCP_CONGESTION, cc, &len);
>>>> +	if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)"))
>>>> +		goto fail;
>>>> +
>>>> +	send_byte(client_fd);
>>>> +
>>>> +	ASSERT_OK(ss_search(ADDR_1, "fwmark:0x1"), "ss_search fwmark:0x1");
>>>> +	ASSERT_OK(ss_search(ADDR_2, "fwmark:0x2"), "ss_search fwmark:0x2");
>>>> +	ASSERT_OK(ss_search(ADDR_1, new), "ss_search new cc");
>>>> +	ASSERT_OK(ss_search(ADDR_2, cc), "ss_search default cc");
>>>
>>> Is there a getsockopt way instead of ss + grep?
>>
>> No there isn't: from the userspace, the app communicates with the MPTCP
>> socket, which can have multiple paths (subflows, a TCP socket). To keep
>> the compatibility with TCP, [gs]etsockopt() will look at/modify the
>> whole MPTCP connection. For example, in some cases, a setsockopt() will
>> propagate the option to all the subflows. Depending on the option, the
>> modification might only apply to the first subflow, or to the
>> user-facing socket.
>>
>> For advanced users who want to have different options set to the
>> different subflows of an MPTCP connection, they can use BPF: that's what
>> is being validated here. In other words, doing a 'getsockopt()' from the
>> userspace program here will not show all the different marks and TCP CC
>> that can be set per subflow with BPF. We can see that in the test: a
>> getsockopt() is done on the MPTCP socket to retrieve the default TCP CC
>> ('cc' which is certainly 'cubic'), but we expect to find another one
>> ('new' which is 'reno'), set by the BPF program from patch 1/2. I guess
>> we could use bpf to do a getsockopt() per subflow, but that's seems a
>> bit cheated to have the BPF test program setting something and checking
>> if it is set. Here, it is an external way. Because it is done from a
>
> I think the result is valid by having a bpf prog to inspect the value of
> a sock. Inspecting socket is an existing use case. There are many
> existing bpf tests covering this inspection use case to ensure the
> result is legit. A separate cgroup/getsockopt program should help here
> (more on this below).

I didn't consider a separate program. Indeed, that should work.

>> dedicated netns, it sounds OK to do that, no?
>
> Thanks for the explanation. I was hoping there is a way to get to the
> underlying subflow fd. It seems impossible.
>
> In the netns does help here. It is not only about the ss iterating a lot
> of connections or not. My preference is not depending on external tool/
> shell-ing if possible, e.g. to avoid the package update discussion like
> the iproute2 here. The uapi from the testing kernel is always
> up-to-date. ss is another binary but arguably in the same iproute2 package.
> There is now another extra "grep" and pipe here. We had been bitten by
> different shell behaviors and some arch has different shells ...etc.

OK, I thought it was fine to use 'ss | grep' because it is used in other
BPF selftests: test_tc_tunnel.sh and test_xdp_features.sh.

> I think it is ok to take this set as is if you (and Gelang?) are ok to
> followup a "cgroup/getsockopt" way to inspect the subflow as the very
> next patch to the mptcp selftest. It seems inspecting subflow will be a
> common test going forward for mptcp, so it will be beneficial to have a
> "cgroup/getsockopt" way to inspect the subflow directly.
>
> Take a look at a recent example [0]. The mptcp test is under a cgroup
> already and has the cgroup setup. An extra "cgroup/getsockopt" prog
> should be enough. That prog can walk the msk->conn_list and use
> bpf_rdonly_cast (or the bpf_core_cast macro in libbpf) to cast a pointer
> to tcp_sock for readonly. It will allow to inspect all the fields in a
> tcp_sock.

It looks interesting to be able to inspect all the fields in a tcp_sock!
I will check with Geliang what can be done.

> Something needs to a fix in patch 2(replied separately), so a re-spin is
> needed.

Thank you. Will do in the next version!

> pw-bot: cr
>
> [0]: https://lore.kernel.org/all/20240808150558.1035626-3-alan.maguire@xxxxxxxxxx/

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.