Re: [PATCH v4 10/15] bugreport: add config values from safelist

Emily Shaffer <emilyshaffer@xxxxxxxxxx> · Mon, 16 Dec 2019 15:40:36 -0800

On Fri, Dec 13, 2019 at 01:45:47PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@xxxxxxxxxx> writes:
> 
> > Teach bugreport to gather the values of config options which are present
> > in 'bugreport-config-safelist.h'.
> >
> > Many config options are sensitive, and many Git add-ons use config
> > options which git-core does not know about; it is better only to gather
> > config options which we know to be safe, rather than excluding options
> > which we know to be unsafe.
> >
> > Taking the build-time generated array and putting it into a set saves us
> > time - since git_config_bugreport() is called for every option the user
> > has configured, performing option lookup in constant time is a useful
> > optimization.
> 
> Interesting.  I actually was expecting the look-up to go the other
> way around.  We know the safe keys, so iterate over them and grab
> their values, if defined.  No need for hashes or anything, but just
> a simple linear list of safe stuff.

Hm. Without looking at the code, I said to myself,
"repo_config_get_value() will open all the available config files to
find the resolved value, so I don't want to do n*4 file open/closes, I
only want to do 4 total."

Now I look at the code and see that the configs are already being read
into a hashset before now. So you're right that it doesn't make sense
for me to do it this way....

> 
> But that is too simple-minded.  If we wanted to safelist foo.*.bar,
> where '*' can be anything, walking on the list of safe variables
> would not work.  We must have a hash table that knows "foo.*.bar" is
> allowed, and while walking all the configuration keys, when we see
> foo.a.bar, we consult "foo.*.bar" as well as "foo.a.bar" to see if
> it is whitelisted, or something like that.

...unless we want to use wildcards like you suggest.

But I'm not sure it's a good idea. I envision someone writing another
Git add-on, which offers someone to specify "user.password" and
automagically do some Bitbucket interaction, or something. (An extreme
and misguided example, but I hope the point remains.) Then if I say,
"all the user.* configs I can see in 'git help config' are safe, so I
will just safelist user.*," I'm not accounting for this other tool's
known configs which are a superset of what Git knows and expects.

The point of this safelist-generation exercise is to avoid accidentally
printing some sensitive user config, and the point of using a safelist
instead of a blocklist is to avoid printing a third-party config we
didn't know about (or messing up our pattern matching). So I suggest we
avoid pattern matching entirely.

> 
> But then I am not sure if this implementation does something like
> this for three-level names.  If not, I do not see much point in use
> of the hash there either.
> 
> Puzzled.

I'd prefer to solve it by iterating over the compile-time-generated
safelist using get_config_value_multi(), which I clock at O(n) for n =
safelist length: performing n calls to an O(1) hashset lookup. That
saves us the O(n*log(n)) hashset population, and the implementation of
Yet Another Set in the source.

Thanks.

 - Emily