On Fri, Dec 13, 2019 at 01:45:47PM -0800, Junio C Hamano wrote: > Emily Shaffer <emilyshaffer@xxxxxxxxxx> writes: > > > Teach bugreport to gather the values of config options which are present > > in 'bugreport-config-safelist.h'. > > > > Many config options are sensitive, and many Git add-ons use config > > options which git-core does not know about; it is better only to gather > > config options which we know to be safe, rather than excluding options > > which we know to be unsafe. > > > > Taking the build-time generated array and putting it into a set saves us > > time - since git_config_bugreport() is called for every option the user > > has configured, performing option lookup in constant time is a useful > > optimization. > > Interesting. I actually was expecting the look-up to go the other > way around. We know the safe keys, so iterate over them and grab > their values, if defined. No need for hashes or anything, but just > a simple linear list of safe stuff. Hm. Without looking at the code, I said to myself, "repo_config_get_value() will open all the available config files to find the resolved value, so I don't want to do n*4 file open/closes, I only want to do 4 total." Now I look at the code and see that the configs are already being read into a hashset before now. So you're right that it doesn't make sense for me to do it this way.... > > But that is too simple-minded. If we wanted to safelist foo.*.bar, > where '*' can be anything, walking on the list of safe variables > would not work. We must have a hash table that knows "foo.*.bar" is > allowed, and while walking all the configuration keys, when we see > foo.a.bar, we consult "foo.*.bar" as well as "foo.a.bar" to see if > it is whitelisted, or something like that. ...unless we want to use wildcards like you suggest. But I'm not sure it's a good idea. I envision someone writing another Git add-on, which offers someone to specify "user.password" and automagically do some Bitbucket interaction, or something. (An extreme and misguided example, but I hope the point remains.) Then if I say, "all the user.* configs I can see in 'git help config' are safe, so I will just safelist user.*," I'm not accounting for this other tool's known configs which are a superset of what Git knows and expects. The point of this safelist-generation exercise is to avoid accidentally printing some sensitive user config, and the point of using a safelist instead of a blocklist is to avoid printing a third-party config we didn't know about (or messing up our pattern matching). So I suggest we avoid pattern matching entirely. > > But then I am not sure if this implementation does something like > this for three-level names. If not, I do not see much point in use > of the hash there either. > > Puzzled. I'd prefer to solve it by iterating over the compile-time-generated safelist using get_config_value_multi(), which I clock at O(n) for n = safelist length: performing n calls to an O(1) hashset lookup. That saves us the O(n*log(n)) hashset population, and the implementation of Yet Another Set in the source. Thanks. - Emily