On Thu, Oct 29, 2020 at 09:58:15AM -0700, Alexander Duyck wrote:
[ ... ]

> > > @@ -43,7 +94,9 @@ int verify_result(const struct tcpbpf_globals *result)
> > >          EXPECT_EQ(0x80, result->bad_cb_test_rv, PRIu32);
> > >          EXPECT_EQ(0, result->good_cb_test_rv, PRIu32);
> > >          EXPECT_EQ(1, result->num_listen, PRIu32);
> > > -        EXPECT_EQ(EXPECTED_CLOSE_EVENTS, result->num_close_events, PRIu32);
> > > +
> > > +        /* 3 comes from one listening socket + both ends of the connection */
> > > +        EXPECT_EQ(3, result->num_close_events, PRIu32);
> > >
> > >          return ret;
> > >  }
> > > @@ -67,6 +120,52 @@ int verify_sockopt_result(int sock_map_fd)
> > >          return ret;
> > >  }
> > >
> > > +static int run_test(void)
> > > +{
> > > +        int server_fd, client_fd;
> > > +        void *server_err;
> > > +        char buf[1000];
> > > +        pthread_t tid;
> > > +        int err = -1;
> > > +        int i;
> > > +
> > > +        server_fd = start_server(AF_INET6, SOCK_STREAM, LO_ADDR6, 0, 0);
> > > +        if (CHECK_FAIL(server_fd < 0))
> > > +                return err;
> > > +
> > > +        pthread_mutex_lock(&server_started_mtx);
> > > +        if (CHECK_FAIL(pthread_create(&tid, NULL, server_thread,
> > > +                                      (void *)&server_fd)))
> > > +                goto close_server_fd;
> > > +
> > > +        pthread_cond_wait(&server_started, &server_started_mtx);
> > > +        pthread_mutex_unlock(&server_started_mtx);
> > > +
> > > +        client_fd = connect_to_fd(server_fd, 0);
> > > +        if (client_fd < 0)
> > > +                goto close_server_fd;
> > > +
> > > +        for (i = 0; i < 1000; i++)
> > > +                buf[i] = '+';
> > > +
> > > +        if (CHECK_FAIL(send(client_fd, buf, 1000, 0) < 1000))
> > > +                goto close_client_fd;
> > > +
> > > +        if (CHECK_FAIL(recv(client_fd, buf, 500, 0) < 500))
> > > +                goto close_client_fd;
> > > +
> > > +        pthread_join(tid, &server_err);
> > I think this can be further simplified by not starting a thread and
> > doing everything in run_test() instead.
> >
> > Something like this (uncompiled code):
> >
> >         accept_fd = accept(server_fd, NULL, 0);
> >         send(client_fd, plus_buf, 1000, 0);
> >         recv(accept_fd, recv_buf, 1000, 0);
> >         send(accept_fd, dot_buf, 500, 0);
> >         recv(client_fd, recv_buf, 500, 0);
>
> I can take a look at switching it over.
>
> > > +
> > > +        err = (int)(long)server_err;
> > > +        CHECK_FAIL(err);
> > > +
> > > +close_client_fd:
> > > +        close(client_fd);
> > > +close_server_fd:
> > > +        close(server_fd);
> > > +        return err;
> > > +}
> > > +
> > >  void test_tcpbpf_user(void)
> > >  {
> > >          const char *file = "test_tcpbpf_kern.o";
> > > @@ -74,7 +173,6 @@ void test_tcpbpf_user(void)
> > >          struct tcpbpf_globals g = {0};
> > >          struct bpf_object *obj;
> > >          int cg_fd = -1;
> > > -        int retry = 10;
> > >          __u32 key = 0;
> > >          int rv;
> > >
> > > @@ -94,11 +192,6 @@ void test_tcpbpf_user(void)
> > >                  goto err;
> > >          }
> > >
> > > -        if (CHECK_FAIL(system("./tcp_server.py"))) {
> > > -                fprintf(stderr, "FAILED: TCP server\n");
> > > -                goto err;
> > > -        }
> > > -
> > >          map_fd = bpf_find_map(__func__, obj, "global_map");
> > >          if (CHECK_FAIL(map_fd < 0))
> > >                  goto err;
> > > @@ -107,21 +200,17 @@ void test_tcpbpf_user(void)
> > >          if (CHECK_FAIL(sock_map_fd < 0))
> > >                  goto err;
> > >
> > > -retry_lookup:
> > > +        if (run_test()) {
> > > +                fprintf(stderr, "FAILED: TCP server\n");
> > > +                goto err;
> > > +        }
> > > +
> > >          rv = bpf_map_lookup_elem(map_fd, &key, &g);
> > >          if (CHECK_FAIL(rv != 0)) {
> > CHECK() is a better fit here if it needs to output an error message.
> > The same goes for similar usages in this patch set.
> >
> > For the start_server() above, which has already logged the error message,
> > CHECK_FAIL() is good enough.
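
To illustrate (uncompiled, and the tag strings are only examples), the
lookup above could become something like this, assuming the usual
test_progs.h macros and a "duration" variable in scope for CHECK():

        rv = bpf_map_lookup_elem(map_fd, &key, &g);
        if (CHECK(rv != 0, "bpf_map_lookup_elem(global_map)",
                  "rv:%d errno:%d\n", rv, errno))
                goto err;

while the start_server() check can stay a bare CHECK_FAIL(), since
start_server() already logs its own error:

        server_fd = start_server(AF_INET6, SOCK_STREAM, LO_ADDR6, 0, 0);
        if (CHECK_FAIL(server_fd < 0))
                return err;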
> > >                  fprintf(stderr, "FAILED: bpf_map_lookup_elem returns %d\n", rv);
> > >                  goto err;
> > >          }
> > >
> > > -        if (g.num_close_events != EXPECTED_CLOSE_EVENTS && retry--) {
> > It is good to have a solution that avoids a test depending on some
> > number of retries.
> >
> > After looking at BPF_SOCK_OPS_STATE_CB in test_tcpbpf_kern.c,
> > it is not clear to me that removing python alone is enough to avoid the
> > race (hence the retry--).  One of the sks might still be in TCP_LAST_ACK
> > instead of TCP_CLOSE.
>
> After you pointed this out I decided to go back through and do some
> further testing. After testing this for several thousand iterations it
> does look like the issue can still happen; it was just significantly
> less frequent with the threaded approach, but it was still there. So I
> will go back through and add this back and then fold it into the
> verify_results function in the third patch. Although I might reduce
> the wait times, as it seems like with the inline approach we only need
> tens of microseconds instead of hundreds for the sockets to close
> out.
I think this retry-and-wait can be avoided.  More on this...

> > Also, when looking closer at BPF_SOCK_OPS_STATE_CB in test_tcpbpf_kern.c,
> > it seems the map value "gp" is slapped together across multiple
> > TCP_CLOSE events, which may not be easy to understand.
> >
> > How about checking different states instead: TCP_CLOSE, TCP_LAST_ACK,
> > and BPF_TCP_FIN_WAIT2?  Each of these states would update its own
> > values under "gp".  Something like this (only compile-tested on
> > top of patch 4):
> >
> > diff --git i/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c w/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
> > index 7e92c37976ac..65b247b03dfc 100644
> > --- i/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
> > +++ w/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
> > @@ -90,15 +90,14 @@ static void verify_result(int map_fd, int sock_map_fd)
> >                    result.event_map, expected_events);
> >
> >          ASSERT_EQ(result.bytes_received, 501, "bytes_received");
> > -        ASSERT_EQ(result.bytes_acked, 1002, "bytes_acked");
> > +        ASSERT_EQ(result.bytes_acked, 1001, "bytes_acked");
> >          ASSERT_EQ(result.data_segs_in, 1, "data_segs_in");
> >          ASSERT_EQ(result.data_segs_out, 1, "data_segs_out");
> >          ASSERT_EQ(result.bad_cb_test_rv, 0x80, "bad_cb_test_rv");
> >          ASSERT_EQ(result.good_cb_test_rv, 0, "good_cb_test_rv");
> > -        ASSERT_EQ(result.num_listen, 1, "num_listen");
> > -
> > -        /* 3 comes from one listening socket + both ends of the connection */
> > -        ASSERT_EQ(result.num_close_events, 3, "num_close_events");
> > +        ASSERT_EQ(result.num_listen_close, 1, "num_listen");
> > +        ASSERT_EQ(result.num_last_ack, 1, "num_last_ack");
> > +        ASSERT_EQ(result.num_fin_wait2, 1, "num_fin_wait2");
> >
> >          /* check setsockopt for SAVE_SYN */
> >          key = 0;
> > diff --git i/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c w/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
> > index 3e6912e4df3d..2c5ffb50d6e0 100644
> > --- i/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
> > +++ w/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
> > @@ -55,9 +55,11 @@ int bpf_testcb(struct bpf_sock_ops *skops)
> >  {
> >          char header[sizeof(struct ipv6hdr) + sizeof(struct tcphdr)];
> >          struct bpf_sock_ops *reuse = skops;
> > +        struct tcpbpf_globals *gp;
> >          struct tcphdr *thdr;
> >          int good_call_rv = 0;
> >          int bad_call_rv = 0;
> > +        __u32 key_zero = 0;
> >          int save_syn = 1;
> >          int rv = -1;
> >          int v = 0;
> > @@ -155,26 +157,21 @@ int bpf_testcb(struct bpf_sock_ops *skops)
> >          case BPF_SOCK_OPS_RETRANS_CB:
> >                  break;
> >          case BPF_SOCK_OPS_STATE_CB:
> > -                if (skops->args[1] == BPF_TCP_CLOSE) {
> > -                        __u32 key = 0;
> > -                        struct tcpbpf_globals g, *gp;
> > -
> > -                        gp = bpf_map_lookup_elem(&global_map, &key);
> > -                        if (!gp)
> > -                                break;
> > -                        g = *gp;
> > -                        if (skops->args[0] == BPF_TCP_LISTEN) {
> > -                                g.num_listen++;
> > -                        } else {
> > -                                g.total_retrans = skops->total_retrans;
> > -                                g.data_segs_in = skops->data_segs_in;
> > -                                g.data_segs_out = skops->data_segs_out;
> > -                                g.bytes_received = skops->bytes_received;
> > -                                g.bytes_acked = skops->bytes_acked;
> > -                        }
> > -                        g.num_close_events++;
> > -                        bpf_map_update_elem(&global_map, &key, &g,
> > -                                            BPF_ANY);
> > +                gp = bpf_map_lookup_elem(&global_map, &key_zero);
> > +                if (!gp)
> > +                        break;
> > +                if (skops->args[1] == BPF_TCP_CLOSE &&
> > +                    skops->args[0] == BPF_TCP_LISTEN) {
> > +                        gp->num_listen_close++;
> > +                } else if (skops->args[1] == BPF_TCP_LAST_ACK) {
> > +                        gp->total_retrans = skops->total_retrans;
> > +                        gp->data_segs_in = skops->data_segs_in;
> > +                        gp->data_segs_out = skops->data_segs_out;
> > +                        gp->bytes_received = skops->bytes_received;
> > +                        gp->bytes_acked = skops->bytes_acked;
> > +                        gp->num_last_ack++;
> > +                } else if (skops->args[1] == BPF_TCP_FIN_WAIT2) {
> > +                        gp->num_fin_wait2++;
I meant that with the above change in "case BPF_SOCK_OPS_STATE_CB", the
retry-and-wait in tcpbpf_user.c can be avoided.

What may still be needed in tcpbpf_user.c is to use shutdown and read-zero
to ensure the sk has gone through those states before calling
verify_result().  Something like this [ uncompiled code again :) ]:

        /* Always send FIN from accept_fd first to
         * ensure it will go through FIN_WAIT_2.
         */
        shutdown(accept_fd, SHUT_WR);

        /* Ensure client_fd gets the FIN */
        err = read(client_fd, buf, sizeof(buf));
        if (CHECK(err != 0, "read-after-shutdown(client_fd):",
                  "err:%d errno:%d\n", err, errno))
                goto close_accept_fd;

        /* client_fd sends its FIN and must be in LAST_ACK now */
        shutdown(client_fd, SHUT_WR);

        /* Ensure accept_fd gets the FIN-ACK.
         * accept_fd must have passed FIN_WAIT2.
         */
        err = read(accept_fd, buf, sizeof(buf));
        if (CHECK(err != 0, "read-after-shutdown(accept_fd):",
                  "err:%d errno:%d\n", err, errno))
                goto close_accept_fd;

        close(server_fd);
        close(accept_fd);
        close(client_fd);

        /* All sks have gone through the states being tested.
         * Check the results now.
         */
        verify_result(map_fd, sock_map_fd);

> >                  }
> >                  break;
> >          case BPF_SOCK_OPS_TCP_LISTEN_CB:
> > diff --git i/tools/testing/selftests/bpf/test_tcpbpf.h w/tools/testing/selftests/bpf/test_tcpbpf.h
> > index 6220b95cbd02..0dec324ba4a6 100644
> > --- i/tools/testing/selftests/bpf/test_tcpbpf.h
> > +++ w/tools/testing/selftests/bpf/test_tcpbpf.h
> > @@ -12,7 +12,8 @@ struct tcpbpf_globals {
> >          __u32 good_cb_test_rv;
> >          __u64 bytes_received;
> >          __u64 bytes_acked;
> > -        __u32 num_listen;
> > -        __u32 num_close_events;
> > +        __u32 num_listen_close;
> > +        __u32 num_last_ack;
> > +        __u32 num_fin_wait2;
> >  };
> >  #endif
>
> I can look at pulling this in and including it as a patch 5 if you
> would prefer. If I find any issues I will let you know.
>
> > I also noticed the bytes_received/acked depend on the order of close(),
> > i.e. always close the accepted fd first.  I think a comment
> > in the tcpbpf_user.c is good enough for now.
>
> Okay, I can add a comment explaining this.
>
> > [ It does not have to be in this set and it can be done in another
> > follow-up effort.
> > Instead of using a bpf map to store the result, using global
> > variables in test_tcpbpf_kern.c will simplify the code further. ]
>
> I assume this comment is about the changes to test_tcpbpf_kern.c? Just
> want to clarify as I assume this isn't about adding the comment about
> the socket closing order affecting the bytes_received/acked.
Right, it is unrelated to the "adding the comment about socket closing
order".  It is about changing test_tcpbpf_kern.c and tcpbpf_user.c to use
global variables instead of a bpf map to store results.  Again, it can be
done later.

This can be used as an example:
b18c1f0aa477 ("bpf: selftest: Adapt sock_fields test to use skel and global variables")
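
In case a sketch of that follow-up helps (uncompiled; the skeleton names
are only assumptions based on how bpftool would generate them from
test_tcpbpf_kern.o): the global_map could be replaced with a plain global
in test_tcpbpf_kern.c,

        struct tcpbpf_globals global = {};

        /* in BPF_SOCK_OPS_STATE_CB, update it directly, e.g. */
        global.num_last_ack++;

and tcpbpf_user.c could then read the results back through the generated
skeleton instead of bpf_map_lookup_elem():

        #include "test_tcpbpf_kern.skel.h"

        struct test_tcpbpf_kern *skel;

        skel = test_tcpbpf_kern__open_and_load();
        if (CHECK(!skel, "open_and_load", "failed\n"))
                return;
        /* ... attach and run the test as before ... */
        /* verify_result() would then take the struct directly */
        verify_result(&skel->bss->global);
        test_tcpbpf_kern__destroy(skel);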