On 3/24/23 15:52, Eric Blake wrote: > [attempting to loop in systemd folks; this started in libnbd at > https://listman.redhat.com/archives/libguestfs/2023-March/031178.html > - although I may have to retry since I'm not a usual subscriber of > systemd-devel] > > On Fri, Mar 24, 2023 at 11:32:26AM +0100, Laszlo Ersek wrote: >>>> @@ -245,6 +245,9 @@ CONNECT_SA.START: >>>> "LISTEN_PID=", "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", &pid_ofs); >>>> SACT_VAR_PUSH (sact_var, &num_vars, >>>> "LISTEN_FDS=", "1", NULL); >>>> + if (h->sact_name != NULL) >>>> + SACT_VAR_PUSH (sact_var, &num_vars, >>>> + "LISTEN_FDNAMES=", h->sact_name, NULL); >>>> if (prepare_socket_activation_environment (&env, sact_var, num_vars) == -1) >>> >>> If I'm reading this correctly, this does wipe an inherited >>> LISTEN_FDNAMES from the environment in the case where the application >>> linked with libnbd started life with a (different) socket activation, >>> but now the user wants to connect to an nbd server without setting a >>> name (default usage, or explicitly requested a name of ""). >> >> Good observation; this is a functionality gap that goes back to v1 of >> this patch. (I've not investigated the specifics of systemd socket >> activation before; but see below.) >> >>> Put another way, SACT_VAR_PUSH as written appears to be only additive >>> for replacement purposes (if I pushed a variable, I intend to override >>> it in the child process, so don't copy it from environ if one was >>> previously there), but not effective for deletion purposes (I don't >>> intend to set the variable, but if it is already set in environ, I >>> want it omitted in the child's copy). >>> >>> Is there a way to rework this so that you can pass NULL as the fourth >>> parameter as an indication of an unset request (vs. "" when you want >>> it set to the empty string)? At which point, you would drop the 'if >>> (h->sact_name != NULL)', and just blindly use SACT_VAR_PUSH(, >>> h->sact_name,). That has ripple effects earlier in the series to >>> support those semantics. >> >> For understanding your point, I have had to read up on systemd socket >> activation: >> >> https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html >> >> Let me quote a part: >> >>> Under specific conditions, the following automatic file descriptor names are returned: >>> >>> [...] >>> "unknown" -- The process received no name for the specific file >>> descriptor from the service manager. >>> [...] >> >> I've also checked the implementation of sd_listen_fds_with_names(): >> >> https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-daemon/sd-daemon.c >> > ... >> >> (2) If we pass LISTEN_PID=xxx and LISTEN_FDS=1, and just "pass through" >> an *inherited* LISTEN_FDNAMES variable, then it will (in the general >> case) confuse the nbd server that we start. Namely, if >> LISTEN_FDNAMES has multiple colon-separated elements (more than 1), >> or is the empty string (= 0 elements), then >> sd_listen_fds_with_names() in the nbd server will fail the "n_names >> != n_fds" check, and return (-EINVAL). If LISTEN_FDNAMES happens to >> have one element, then sd_listen_fds_with_names() will succeed, but >> the returned name will confuse the nbd server. > > ... > > Eww - this may be a bigger systematic issue caused by systemd itself, > and we should report it there (done by adding in cc here). They did > not specify LISTEN_FDNAMES at the time LISTEN_PID was first > documented, so the likelihood of libnbd not being the only application > that happens to leak inherited LISTEN_FDNAMES through to the child > process is non-zero, where this sort of bug will bite more than one > client of systemd socket activation. And it is this sort of > backwards-incompatibility caused by the systemd extension that they > will need to be more careful of avoiding if they ever add any future > LISTEN_* environment variables. I wonder if this is what the "unset_environment" parameter of sd_listen_fds() and sd_listen_fds_with_names() exists for. If any socket-activated application (such as a libnbd client application) is *required* to call one of these functions at least once with a nonzero "unset_environment" argument (reasonably early after startup), then that eliminates the problem. For example (specifically), by the time we got to prepare_socket_activation_environment(), we'd find (or leak!) no inherited LISTEN_* variable (so the "conflict handling" logic could be removed altogether from prepare_socket_activation_environment()). The above-linked spec <https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html> writes, "If the unset_environment parameter is non-zero [...] the variables are no longer inherited by child processes." So, if a libnbd client application (a) is socket activated, and (b) plans on calling nbd_connect_systemd_socket_activation() itself, then this application *MAY* already be responsible for passing a nonzero "unset_environment" argument, when it calls sd_listen_fds() or sd_listen_fds_with_names() early on. I'm emphasizing "may", because the spec does not explain what use case "unset_environment" is specifically meant for. Laszlo