Re: [PATCH 0/6] getenv() timing fixes

Jeff King <peff@xxxxxxxx> · Tue, 15 Jan 2019 14:12:35 -0500

On Sat, Jan 12, 2019 at 12:31:21PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > So anyway. Here are a handful of what seem like pretty low-hanging
> > fruit. Beyond the first one, I'm not sure if they're triggerable, but
> > they're easy to fix. There are 100+ grep matches that I _didn't_ audit,
> > so this is by no means a complete fix. I was mostly trying to get a
> > sense of how painful these fixes would be.
> 
> I wonder, and not as "you should do this" feedback on this series, just
> on future development, whether we shouldn't just make our own getenv()
> wrapper for the majority of the GIT_* variables. The semantics would be
> fetch value X, and if it's ever requested again return the value we
> found the first time.

Yeah, that thought certainly crossed my mind while looking into this.
But as you noted below, you do sometimes have to worry about
invalidating that cache. The most general solution is that you'd hook
setenv(), too. At which point you've basically just constructed a shadow
environment that has less-crappy semantics than what POSIX guarantees. ;)

Another option is to just have a getenv_safe() that always returns an
allocated string. That's less efficient in some cases, but probably not
meaningfully so (you probably shouldn't be calling getenv() in a tight
loop anyway). It does mean dealing with memory ownership, though, which
is awkward in some cases (e.g., see git_editor).

Mostly I'm worried about making a system that's opaque or easy for
people to get wrong (e.g., if our getenv() wrapper quietly caches things
but setenv() does not invalidate that cache, that's a recipe for
confusion).

> Maybe such an API would just loop over "environ" on startup, looking for
> any GIT_* variables, i.e. called from the setup.c code.

I think whatever we do could just lazy-load. There's no particular
initialization we have to do at the beginning of the program.

-Peff