On 2022-12-14 15:16, Victoria Dye wrote: > Matthew John Cheetham via GitGitGadget wrote: >> From: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx> >> >> Introduce a mini HTTP server helper that in the future will be enhanced >> to provide a frontend for the git-http-backend, with support for >> arbitrary authentication schemes. > > I really like this approach, particularly because it opens up the > possibility of writing more fine-grained tests in other contexts (e.g., > testing how a bundle-uri client handles different kinds of erroneous server > responses by intercepting and customizing those responses). Having a mini server we can play around with makes it easier to simulate a 'bad' server, rather than use a real one like Apache and try and coerce it in to doing 'bad' things. >> >> Right now, test-http-server is a pared-down copy of the git-daemon that >> always returns a 501 Not Implemented response to all callers. >> >> Signed-off-by: Matthew John Cheetham <mjcheetham@xxxxxxxxxxx> >> --- >> Makefile | 2 + >> contrib/buildsystems/CMakeLists.txt | 13 + >> t/helper/.gitignore | 1 + >> t/helper/test-http-server.c | 685 ++++++++++++++++++++++++++++ >> 4 files changed, 701 insertions(+) >> create mode 100644 t/helper/test-http-server.c >> >> diff --git a/Makefile b/Makefile >> index b258fdbed86..1eb795bbfd4 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -1611,6 +1611,8 @@ else >> endif >> BASIC_CFLAGS += $(CURL_CFLAGS) >> >> + TEST_PROGRAMS_NEED_X += test-http-server > > This works because all usage of 'TEST_PROGRAMS_NEED_X' are either lazily > evaluated (in the case of 'TEST_PROGRAMS') or are assigned later in the > 'Makefile' than the addition here (in the case of 'test_bindir_programs'). > > On a related note, I think it would be helpful to mention 'test-http-server' > in the "=== Optional library: libcurl ===" section of the documentation at > the top of the Makefile, to clarify that it (like 'git-http-fetch' and > 'git-http-push') are not built. Upon closer inspection I noticed we don't actuall depend on libcurl here. In my next iteration I've reworked the test helper to share some code with daemon.c and changed where we add `test-http-server` in the Makefiles to be the same as `test-fake-ssh`. >> + >> REMOTE_CURL_PRIMARY = git-remote-http$X >> REMOTE_CURL_ALIASES = git-remote-https$X git-remote-ftp$X git-remote-ftps$X >> REMOTE_CURL_NAMES = $(REMOTE_CURL_PRIMARY) $(REMOTE_CURL_ALIASES) >> diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt >> index 2f6e0197ffa..e9b9bfbb437 100644 >> --- a/contrib/buildsystems/CMakeLists.txt >> +++ b/contrib/buildsystems/CMakeLists.txt >> @@ -989,6 +989,19 @@ set(wrapper_scripts >> set(wrapper_test_scripts >> test-fake-ssh test-tool) >> >> +if(CURL_FOUND) >> + list(APPEND wrapper_test_scripts test-http-server) >> + >> + add_executable(test-http-server ${CMAKE_SOURCE_DIR}/t/helper/test-http-server.c) >> + target_link_libraries(test-http-server common-main) >> + >> + if(MSVC) >> + set_target_properties(test-http-server >> + PROPERTIES RUNTIME_OUTPUT_DIRECTORY_DEBUG ${CMAKE_BINARY_DIR}/t/helper) >> + set_target_properties(test-http-server >> + PROPERTIES RUNTIME_OUTPUT_DIRECTORY_RELEASE ${CMAKE_BINARY_DIR}/t/helper) >> + endif() >> +endif() >> >> foreach(script ${wrapper_scripts}) >> file(STRINGS ${CMAKE_SOURCE_DIR}/wrap-for-bin.sh content NEWLINE_CONSUME) >> diff --git a/t/helper/.gitignore b/t/helper/.gitignore >> index 8c2ddcce95f..9aa9c752997 100644 >> --- a/t/helper/.gitignore >> +++ b/t/helper/.gitignore >> @@ -1,2 +1,3 @@ >> /test-tool >> /test-fake-ssh >> +/test-http-server >> diff --git a/t/helper/test-http-server.c b/t/helper/test-http-server.c >> new file mode 100644 >> index 00000000000..18f1f741305 >> --- /dev/null >> +++ b/t/helper/test-http-server.c > > A lot of the functions in this file are modified versions of ones in > 'daemon.c'. It would help reviewers/future readers to mention that in the > commit message. I appreciate the thorough effort here in understanding what those daemon.c functions do. Hopefully the next iteration will help other reviewers as I'm going to be extracting the identical functions to share them between daemon.c and test-http-server.c. > My comments are mostly going to be around the similarities/differences from > 'daemon.c', hopefully to understand how 'test-http-server' is meant to be > used. > >> +static void logreport(const char *label, const char *err, va_list params) >> +{ >> + struct strbuf msg = STRBUF_INIT; >> + >> + strbuf_addf(&msg, "[%"PRIuMAX"] %s: ", (uintmax_t)getpid(), label); >> + strbuf_vaddf(&msg, err, params); >> + strbuf_addch(&msg, '\n'); >> + >> + fwrite(msg.buf, sizeof(char), msg.len, stderr); >> + fflush(stderr); >> + >> + strbuf_release(&msg); > > This looks like the 'LOG_DESTINATION_STDERR' case of 'logreport()' in > 'daemon.c', but adds a "label" to represent the priority. Makes sense; these > logs will be helpful to have in stderr when running tests, and the priority > will be captured as well. > >> +} >> + >> +__attribute__((format (printf, 1, 2))) >> +static void logerror(const char *err, ...) >> +{ >> + va_list params; >> + va_start(params, err); >> + logreport("error", err, params); >> + va_end(params); >> +} >> + >> +__attribute__((format (printf, 1, 2))) >> +static void loginfo(const char *err, ...) >> +{ >> + va_list params; >> + if (!verbose) >> + return; >> + va_start(params, err); >> + logreport("info", err, params); >> + va_end(params); >> +} > > These two functions replace the "priority" int with the "label" string, but > otherwise capture the same information. > >> + >> +static void set_keep_alive(int sockfd) > > This function is identical to its 'daemon.c' counterpart; its usage in > 'test-http-server.c' doesn't indicate any need to differ. > >> + >> +/* >> + * The code in this section is used by "worker" instances to service >> + * a single connection from a client. The worker talks to the client >> + * on 0 and 1. >> + */ >> + >> +enum worker_result { >> + /* >> + * Operation successful. >> + * Caller *might* keep the socket open and allow keep-alive. >> + */ >> + WR_OK = 0, >> + >> + /* >> + * Various errors while processing the request and/or the response. >> + * Close the socket and clean up. >> + * Exit child-process with non-zero status. >> + */ >> + WR_IO_ERROR = 1<<0, >> + >> + /* >> + * Close the socket and clean up. Does not imply an error. >> + */ >> + WR_HANGUP = 1<<1, >> + >> + WR_STOP_THE_MUSIC = (WR_IO_ERROR | WR_HANGUP), > > As much as I love the name, I'm not sure having this value defined makes > much sense as its own "state". AFAICT, 'WR_IO_ERROR' means "error AND exit", > but 'WR_HANGUP' just means "exit", so the latter is a superset of the > former. Even if you interpret 'WR_HANGUP' as "*no* error and exit", that > makes it and 'WR_IO_ERROR' mutually exclusive, so the "combined" state > doesn't represent anything "real". Fair point. Will remove this extra value in next iteration. >> +}; >> + >> +static enum worker_result worker(void) >> +{ >> + const char *response = "HTTP/1.1 501 Not Implemented\r\n"; > > Here's the hardcoded 501 error, as mentioned in the commit message. > >> + char *client_addr = getenv("REMOTE_ADDR"); >> + char *client_port = getenv("REMOTE_PORT"); >> + enum worker_result wr = WR_OK; >> + >> + if (client_addr) >> + loginfo("Connection from %s:%s", client_addr, client_port); >> + >> + set_keep_alive(0); >> + >> + while (1) { >> + if (write_in_full(1, response, strlen(response)) < 0) { >> + logerror("unable to write response"); >> + wr = WR_IO_ERROR; >> + } > > This tries to write the response out to stdout (optional nit: you could use > 'STDOUT_FILENO' instead of '1' to make this clearer), and sets 'WR_IO_ERROR' > if it fails. Good point; will use `STDOUT_FILENO` in all applicable places in next iteration. >> + >> + if (wr & WR_STOP_THE_MUSIC) >> + break; > > This will trigger if 'wr' is 'WR_HANGUP' *or* 'WR_IO_ERROR'. Is that > intentional? If it is, I think 'wr != 'WR_OK' might make that more obvious? > >> + } >> + >> + close(0); >> + close(1); >> + >> + return !!(wr & WR_IO_ERROR); > > Then finish by closing out 'stdin' and 'stdout', and returning '0' for "no > error", '1' for "error". > >> +} >> + >> +/* >> + * This section contains the listener and child-process management >> + * code used by the primary instance to accept incoming connections >> + * and dispatch them to async child process "worker" instances. >> + */ >> + >> +static int addrcmp(const struct sockaddr_storage *s1, > > > Identical to 'daemon.c'. > >> +static void add_child(struct child_process *cld, struct sockaddr *addr, socklen_t addrlen) >> +{ >> + struct child *newborn, **cradle; >> + >> + newborn = xcalloc(1, sizeof(*newborn)); >> + live_children++; >> + memcpy(&newborn->cld, cld, sizeof(*cld)); >> + memcpy(&newborn->address, addr, addrlen); >> + for (cradle = &firstborn; *cradle; cradle = &(*cradle)->next) >> + if (!addrcmp(&(*cradle)->address, &newborn->address)) >> + break; >> + newborn->next = *cradle; >> + *cradle = newborn; >> +} > > This is mostly the same as 'daemon.c', but uses 'xcalloc()' instead of > 'CALLOC_ARRAY()'. The latter is an alias for the former, so this is fine. > >> +static void kill_some_child(void) > > ... > >> +static void check_dead_children(void) > Both of these are identical to 'daemon.c'. > >> + >> +static struct strvec cld_argv = STRVEC_INIT; >> +static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen) > > This matches 'daemon.c' except for the addition of: > >> + if (cld.out < 0) >> + logerror("could not dup() `incoming`"); > > The extra context provided by this message could be helpful in debugging. If > nothing else, it doesn't hurt. > >> + else if (start_command(&cld)) >> + logerror("unable to fork"); >> + else >> + add_child(&cld, addr, addrlen); >> +} >> + >> +static void child_handler(int signo) > > ... > >> +static int set_reuse_addr(int sockfd) > > ... > >> +static const char *ip2str(int family, struct sockaddr *sin, socklen_t len) > > ... > >> +#ifndef NO_IPV6 >> + >> +static int setup_named_sock(char *listen_addr, int listen_port, struct socketlist *socklist) > ... > >> +#else /* NO_IPV6 */ >> + >> +static int setup_named_sock(char *listen_addr, int listen_port, struct socketlist *socklist) > > All of these functions match 'daemon.c' (save for some whitespace fixups). > >> + >> +static void socksetup(struct string_list *listen_addr, int listen_port, struct socketlist *socklist) >> +{ >> + if (!listen_addr->nr) >> + setup_named_sock("127.0.0.1", listen_port, socklist); > > This is the only difference in this function from 'daemon.c' (there, the > first arg is 'NULL', which ends up mapping to 'INADDR_ANY'). Why the change > in default? Next iteration will share implementation with daemon.c. >> + else { >> + int i, socknum; >> + for (i = 0; i < listen_addr->nr; i++) { >> + socknum = setup_named_sock(listen_addr->items[i].string, >> + listen_port, socklist); >> + >> + if (socknum == 0) >> + logerror("unable to allocate any listen sockets for host %s on port %u", >> + listen_addr->items[i].string, listen_port); >> + } >> + } >> +} >> + >> +static int service_loop(struct socketlist *socklist) > > This function differs from 'daemon.c' by using removal of the 'pid_file' to > force a graceful shutdown of the server. > >> +{ >> + struct pollfd *pfd; >> + int i; >> + >> + CALLOC_ARRAY(pfd, socklist->nr); >> + >> + for (i = 0; i < socklist->nr; i++) { >> + pfd[i].fd = socklist->list[i]; >> + pfd[i].events = POLLIN; >> + } >> + >> + signal(SIGCHLD, child_handler); >> + >> + for (;;) { >> + int i; >> + int nr_ready; >> + int timeout = (pid_file ? 100 : -1); >> + >> + check_dead_children(); >> + >> + nr_ready = poll(pfd, socklist->nr, timeout); > > Setting a timeout here (if 'pid_file' is present) allows us to operate in a > mode where the removal of a 'pid_file' indicates that the server should shut > down. > >> + if (nr_ready < 0) { > > 'nr_ready < 0' indicates an error [1]; handle the same way as 'daemon.c'. > > [1] https://man7.org/linux/man-pages/man2/poll.2.html > >> + if (errno != EINTR) { >> + logerror("Poll failed, resuming: %s", >> + strerror(errno)); >> + sleep(1); >> + } >> + continue; >> + } >> + else if (nr_ready == 0) { > > 'nr_ready == 0' indicates a polling timeout (see [1] above)... > >> + /* >> + * If we have a pid_file, then we watch it. >> + * If someone deletes it, we shutdown the service. >> + * The shell scripts in the test suite will use this. >> + */ >> + if (!pid_file || file_exists(pid_file)) >> + continue; >> + goto shutdown; > > ...and that timeout exists so that we can check whether the 'pid_file' still > exists and, if so, shut down gracefully. > >> + } >> + > > Otherwise, 'nr_ready > 1', so handle the polled events. > >> + for (i = 0; i < socklist->nr; i++) { >> + if (pfd[i].revents & POLLIN) { >> + union { >> + struct sockaddr sa; >> + struct sockaddr_in sai; >> +#ifndef NO_IPV6 >> + struct sockaddr_in6 sai6; >> +#endif >> + } ss; >> + socklen_t sslen = sizeof(ss); >> + int incoming = accept(pfd[i].fd, &ss.sa, &sslen); >> + if (incoming < 0) { >> + switch (errno) { >> + case EAGAIN: >> + case EINTR: >> + case ECONNABORTED: >> + continue; >> + default: >> + die_errno("accept returned"); >> + } >> + } >> + handle(incoming, &ss.sa, sslen); >> + } >> + } >> + } >> + >> +shutdown: >> + loginfo("Starting graceful shutdown (pid-file gone)"); >> + for (i = 0; i < socklist->nr; i++) >> + close(socklist->list[i]); >> + >> + return 0; > > This addition logs the shutdown and closes out sockets. Looks good! > >> +} >> + >> +static int serve(struct string_list *listen_addr, int listen_port) >> +{ >> + struct socketlist socklist = { NULL, 0, 0 }; >> + >> + socksetup(listen_addr, listen_port, &socklist); >> + if (socklist.nr == 0) >> + die("unable to allocate any listen sockets on port %u", >> + listen_port); >> + >> + loginfo("Ready to rumble"); > > I thought this was a leftover debug printout, but it turns out that > 'serve()' in 'daemon.c' has the same message. :) Indeed! This made me chuckle when I first saw it.. >> + >> + /* >> + * Wait to create the pid-file until we've setup the sockets >> + * and are open for business. >> + */ >> + if (pid_file) >> + write_file(pid_file, "%"PRIuMAX, (uintmax_t) getpid()); >> + >> + return service_loop(&socklist); >> +} >> + >> +/* >> + * This section is executed by both the primary instance and all >> + * worker instances. So, yes, each child-process re-parses the >> + * command line argument and re-discovers how it should behave. >> + */ >> + >> +int cmd_main(int argc, const char **argv) >> +{ >> + int listen_port = 0; >> + struct string_list listen_addr = STRING_LIST_INIT_NODUP; >> + int worker_mode = 0; >> + int i; >> + >> + trace2_cmd_name("test-http-server"); >> + setup_git_directory_gently(NULL); > > Since this isn't part of 'test-tool', it needs to do its own trace2 setup, > but it seems to be missing some of the relevant function calls. Could you > include 'trace2_cmd_list_config()' and 'trace2_cmd_list_env_vars()' as well? Sure! >> + >> + for (i = 1; i < argc; i++) { > > Can this loop be replaced with 'parse_options()' and the appropriate 'struct > option[]'? Newer test helpers ('test-bundle-uri', 'test-cache-tree', > 'test-getcwd') have been using it, and it generally seems much easier to > work with/more flexible than a custom 'if()' block (handling option > negation, interpreting both '--option=<value>' and '--option value' syntax > etc.). > > That said, it looks this was mostly pulled from 'daemon.c' (which might > predate 'parse_options()'), so I'd also understand if you want to keep it as > similar to that as possible. Up to you! For now I think I'll keep it the same style as daemon.c. >> + /* avoid splitting a message in the middle */ >> + setvbuf(stderr, NULL, _IOFBF, 4096); >> + >> + if (listen_port == 0) >> + listen_port = DEFAULT_GIT_PORT; >> + >> + /* >> + * If no --listen=<addr> args are given, the setup_named_sock() >> + * code will use receive a NULL address and set INADDR_ANY. >> + * This exposes both internal and external interfaces on the >> + * port. >> + * >> + * Disallow that and default to the internal-use-only loopback >> + * address. >> + */ >> + if (!listen_addr.nr) >> + string_list_append(&listen_addr, "127.0.0.1"); >> + >> + /* >> + * worker_mode is set in our own child process instances >> + * (that are bound to a connected socket from a client). >> + */ >> + if (worker_mode) >> + return worker(); >> + >> + /* >> + * `cld_argv` is a bit of a clever hack. The top-level instance >> + * of test-http-server does the normal bind/listen/accept stuff. >> + * For each incoming socket, the top-level process spawns >> + * a child instance of test-http-server *WITH* the additional >> + * `--worker` argument. This causes the child to set `worker_mode` >> + * and immediately call `worker()` using the connected socket (and >> + * without the usual need for fork() or threads). >> + * >> + * The magic here is made possible because `cld_argv` is static >> + * and handle() (called by service_loop()) knows about it. >> + */ >> + strvec_push(&cld_argv, argv[0]); >> + strvec_push(&cld_argv, "--worker"); >> + for (i = 1; i < argc; ++i) >> + strvec_push(&cld_argv, argv[i]); >> + >> + /* >> + * Setup primary instance to listen for connections. >> + */ >> + return serve(&listen_addr, listen_port); > > The rest of the function is "new", but is well-documented and appears to > work as intended. > >> +} > > One last note/suggestion - while a lot of the functions in > 'test-http-server.c' are modified from those in 'daemon.c', there are a fair > number of identical functions as well. Would it be possible to libify some > of 'daemon.c's functions (mainly by creating a 'daemon.h' and making the > functions non-static) so that they don't need to be copied? > Watch for my next iteration for this! Thanks, Matthew