On Fri, Jan 11 2019, Jeff King wrote: > Similar to the recent: > > https://public-inbox.org/git/20190109221007.21624-1-kgybels@xxxxxxxxxxxx/ > > there are some other places where we do not follow the POSIX rule that > getenv()'s return value may be invalidated by other calls to getenv() or > setenv(). > > For the most part we haven't noticed because: > > - on many platforms, you can call getenv() as many times as you want. > This changed recently in our mingw_getenv() helper, which is why > people are noticing now. > > - calling setenv() in between _often_ works, but it depends on whether > libc feels like it needs to reallocate memory. Which is itself > platform specific, and even on a single platform may depend on > things like how many environment variables you have set. > > The first patch here is a problem somebody actually found in the wild. > That led me to start looking through the results of: > > git grep '= getenv(' > > There are a ton of hits. I poked at the first 20 or so. A lot of them > are fine, as they do something like this: > > rla = getenv("GIT_REFLOG_ACTION"); > strbuf_addstr("blah blah %s", rla); > > That's not _strictly_ correct, because strbuf_addstr() may actually look > at the environment. But it works for our mingw_getenv() case, because > there we use a rotating series of buffers. So as long as it doesn't look at > 30 environment variables, we're fine. And many calls fall into that > bucket (a more complicated one is get_ssh_command(), which runs a fair > bit of code while holding the pointer, but ultimately probably has a > small fixed number of opportunities to call getenv(). What is more > worrisome is code that holds a pointer across an arbitrary number of > calls (like once per diff'd file, or once per submodule, etc). > > Of course it's possible for some platform libc to use a single buffer. > But in that case, I'd argue that the path of least resistance is > wrapping getenv, like I described in: > > https://public-inbox.org/git/20181025062037.GC11460@xxxxxxxxxxxxxxxxxxxxx/ > > So anyway. Here are a handful of what seem like pretty low-hanging > fruit. Beyond the first one, I'm not sure if they're triggerable, but > they're easy to fix. There are 100+ grep matches that I _didn't_ audit, > so this is by no means a complete fix. I was mostly trying to get a > sense of how painful these fixes would be. I wonder, and not as "you should do this" feedback on this series, just on future development, whether we shouldn't just make our own getenv() wrapper for the majority of the GIT_* variables. The semantics would be fetch value X, and if it's ever requested again return the value we found the first time. For some things we rely on getenv(X) -> setenv(X) -> getenv(X) returning different values of X, e.g. in passing along config, but for e.g. GIT_TEST_* variables we just want to check them once, and have our own ad-hoc caches (via static variables) in a couple of places. Maybe such an API would just loop over "environ" on startup, looking for any GIT_* variables, i.e. called from the setup.c code.