Re: [GSOC 2014]idea:Git Configuration API Improvement

Matthieu Moy <Matthieu.Moy@xxxxxxxxxxxxxxx> · Thu, 20 Mar 2014 10:10:23 +0100

Hi,

Yao Zhao <zhaox383@xxxxxxx> writes:

> First is about when to start reading the configuration file into the
> cache. My idea is to do it when the user starts a command that needs
> configuration information (i.e. needs to read the configuration file).

I'd actually load the configuration lazily, i.e. only when Git first
needs the value of a configuration variable. Something like:

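/* Helpers assumed to exist elsewhere (hypothetical): */
void load_config(void);                   /* parse all config files */
int cache_is_outdated(void);              /* did a file change since? */
void do_something_with_loaded_config(void);

static int config_has_been_loaded = 0;

void git_config(void)
{
	if (!config_has_been_loaded) {
		load_config();
		config_has_been_loaded = 1;
	} else if (cache_is_outdated()) {
		load_config();
	} /* else: nothing to do, we're good */
	do_something_with_loaded_config();
}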
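For illustration only, here is a self-contained version of that flow for
a single file, assuming "outdated" means that the file's mtime or size no
longer matches what stat() reported when we loaded it. The names
get_cached_config() and config_load_count are made up for this sketch,
and the "parsed" data is just the raw file text:

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical per-process cache for one config file. */
static const char *config_path;
static int config_has_been_loaded = 0;
static struct stat cached_st;      /* stat() result at last load */
static char config_text[4096];     /* stand-in for the parsed data */
static int config_load_count = 0;  /* how many times we (re)loaded */

static void load_config(void)
{
	FILE *f;
	size_t n = 0;

	/* stat() before reading: a change made after this point is
	 * detected on the next call instead of being missed. */
	if (stat(config_path, &cached_st) != 0)
		memset(&cached_st, 0, sizeof(cached_st));
	f = fopen(config_path, "r");
	if (f) {
		n = fread(config_text, 1, sizeof(config_text) - 1, f);
		fclose(f);
	}
	config_text[n] = '\0';
	config_load_count++;
}

/* Outdated if the file's mtime or size changed since it was loaded
 * (or if it cannot be stat()ed at all). */
static int cache_is_outdated(void)
{
	struct stat st;

	if (stat(config_path, &st) != 0)
		return 1;
	return st.st_mtime != cached_st.st_mtime ||
	       st.st_size != cached_st.st_size;
}

/* Hypothetical entry point; assumes the same path on every call. */
const char *get_cached_config(const char *path)
{
	config_path = path;
	if (!config_has_been_loaded) {
		load_config();
		config_has_been_loaded = 1;
	} else if (cache_is_outdated()) {
		load_config();
	}
	/* else: nothing to do, the cache is up to date */
	return config_text;
}
```

Calling get_cached_config() repeatedly re-reads the file only when stat()
reports a change. st_mtime has one-second resolution here, so a quick
rewrite that keeps the same size would be missed; a real implementation
would need a more robust check.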

> Second is about the data structure. I read Peff's email linked from the
> idea page. He described two methods, and I prefer the syntax tree.

Why?

(In general, explaining why you chose something is more important than
explaining what you chose.)

> I think there should be three or more syntax trees in the cache: one
> for system, one for global and one for local. If the user specifies a
> file as a configuration file, we add one more tree. Or maybe we can
> build one tree and tag every node to indicate where it comes from.

A tree (an AST, or abstract syntax tree) is useful if you need to do
source-to-source transformations on the configuration files, i.e. edit
the config files themselves.
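A rough sketch of why a syntax-tree-like representation helps with
editing: if every line of the file is kept as a node with its original
text, changing one value and writing the file back leaves comments and
layout alone. This assumes a heavily simplified syntax (no subsections,
no continuation lines, no quoting, case-sensitive names), and
parse_config(), set_value() and write_config() are hypothetical names:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64

enum node_kind { NODE_SECTION, NODE_VARIABLE, NODE_OTHER };

/* One node per line: a flat list is enough to show the idea. */
struct node {
	enum node_kind kind;
	char section[64]; /* enclosing section, e.g. "core" */
	char key[64];     /* variable name, for NODE_VARIABLE only */
	char text[256];   /* original line, kept verbatim */
};

/* Split 'src' into lines and classify each one (simplified syntax). */
static int parse_config(const char *src, struct node *nodes)
{
	char section[64] = "";
	int n = 0;

	while (*src && n < MAX_NODES) {
		struct node *nd = &nodes[n++];
		size_t len = strcspn(src, "\n");
		const char *p;

		snprintf(nd->text, sizeof(nd->text), "%.*s", (int)len, src);
		src += len + (src[len] == '\n');
		nd->key[0] = '\0';
		for (p = nd->text; isspace((unsigned char)*p); p++)
			;
		if (*p == '[') {
			size_t slen = strcspn(p + 1, "]");
			snprintf(section, sizeof(section), "%.*s", (int)slen, p + 1);
			nd->kind = NODE_SECTION;
		} else if (*p == '\0' || *p == '#' || *p == ';') {
			nd->kind = NODE_OTHER;  /* blank line or comment */
		} else {
			size_t klen = strcspn(p, " \t=");
			snprintf(nd->key, sizeof(nd->key), "%.*s", (int)klen, p);
			nd->kind = NODE_VARIABLE;
		}
		strcpy(nd->section, section);
	}
	return n;
}

/* Rewrite the first matching variable in place; every other line,
 * including comments and blank lines, is left untouched. */
static int set_value(struct node *nodes, int n, const char *section,
		     const char *key, const char *value)
{
	for (int i = 0; i < n; i++) {
		if (nodes[i].kind == NODE_VARIABLE &&
		    !strcmp(nodes[i].section, section) &&
		    !strcmp(nodes[i].key, key)) {
			snprintf(nodes[i].text, sizeof(nodes[i].text),
				 "\t%s = %s", key, value);
			return 1;
		}
	}
	return 0;
}

/* Serialize the nodes back to text, one line each. */
static void write_config(const struct node *nodes, int n,
			 char *out, size_t size)
{
	size_t used = 0;

	out[0] = '\0';
	for (int i = 0; i < n && used < size; i++)
		used += snprintf(out + used, size - used, "%s\n", nodes[i].text);
}
```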

For read-only access, I would find it more natural to have a data
structure that reflects the configuration variables themselves, not the
way they appear in the config file. For example, a map (hashtable)
associating each config variable with its value (which may be a scalar
value or a list, depending on the variable).
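A minimal sketch of such a map, assuming full variable names like
"remote.origin.fetch" as keys and storing every value in the order it
was read, so a multi-valued variable naturally becomes a list. The djb2
hash, the fixed bucket count and the names config_add() and
config_get_all() are arbitrary choices for this example:

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

/* Every value seen for one variable, in the order it was read. */
struct config_entry {
	char *name;                /* full name, e.g. "remote.origin.fetch" */
	char **values;
	size_t nr, alloc;
	struct config_entry *next; /* next entry in the same bucket */
};

static struct config_entry *buckets[NBUCKETS];

/* Abort on allocation failure to keep the sketch short. */
static void *xrealloc_or_die(void *p, size_t size)
{
	p = realloc(p, size);
	if (!p)
		abort();
	return p;
}

static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;

	return memcpy(xrealloc_or_die(NULL, len), s, len);
}

static unsigned hash_name(const char *s)
{
	unsigned h = 5381;         /* djb2 */

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % NBUCKETS;
}

static struct config_entry *config_lookup(const char *name)
{
	struct config_entry *e;

	for (e = buckets[hash_name(name)]; e; e = e->next)
		if (!strcmp(e->name, name))
			return e;
	return NULL;
}

/* Called by the parser for each "name = value" it reads. */
static void config_add(const char *name, const char *value)
{
	struct config_entry *e = config_lookup(name);

	if (!e) {
		unsigned b = hash_name(name);

		e = xrealloc_or_die(NULL, sizeof(*e));
		e->name = dup_string(name);
		e->values = NULL;
		e->nr = e->alloc = 0;
		e->next = buckets[b];
		buckets[b] = e;
	}
	if (e->nr == e->alloc) {
		e->alloc = e->alloc ? 2 * e->alloc : 4;
		e->values = xrealloc_or_die(e->values,
					    e->alloc * sizeof(*e->values));
	}
	e->values[e->nr++] = dup_string(value);
}

/* All values of a variable (NULL and *nr == 0 if it is unset). */
static const char *const *config_get_all(const char *name, size_t *nr)
{
	struct config_entry *e = config_lookup(name);

	*nr = e ? e->nr : 0;
	return e ? (const char *const *)e->values : NULL;
}
```

A single-valued variable is then a list of length one, or, if it was set
in several files, the last element of its list.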

But the really important part here is the API exposed to the caller, not
the internal data structure. A map would be "more efficient" (O(1) or
O(log(n)) access), but traversing the AST for each config request would
not really hurt: this is what we currently do, except that we re-parse
the file each time. OTOH, the API should hide the AST for most uses. If
the caller wants the value of configuration variable "foo", the code to
do that should not be much more complex than
get_value_for_config_variable("foo") (well, I did oversimplify a bit
here).
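To illustrate that callers need not care about the internal
representation, here is a sketch where get_value_for_config_variable()
simply walks all entries in file order on every call (the "traverse the
tree" approach) and lets the last match win, so local config overrides
global, which overrides system. The hard-coded items[] array is a
hypothetical stand-in for whatever the parser would produce, and names
are assumed to be already normalized (Git treats section and variable
names case-insensitively):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal representation: variables in the order they
 * were read (system, then global, then local), as a walk over the
 * parsed tree would visit them. */
struct config_item {
	const char *name;  /* "section.key", assumed already normalized */
	const char *value;
};

static const struct config_item items[] = {
	{ "core.editor", "vi" },    /* e.g. from the global file */
	{ "user.name",   "Ann" },
	{ "core.editor", "emacs" }, /* e.g. from the repository config */
};

/* The only function callers see. It scans every item each time (O(n))
 * and keeps the last match. Replacing the scan with a hash map would
 * not change a single caller. */
const char *get_value_for_config_variable(const char *name)
{
	const char *found = NULL;
	size_t i;

	for (i = 0; i < sizeof(items) / sizeof(items[0]); i++)
		if (!strcmp(items[i].name, name))
			found = items[i].value;
	return found;  /* NULL if the variable is not set */
}
```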

> Third is about when to write back to the file; I am really confused
> about it. I think one way could be when the user leaves the Git
> repository using "cd". But I am not sure if Git can detect that the
> user calls "cd" to leave the repository.

There seems to be a misunderstanding here. The point of the project is
to have a per-process cache, but Git does not normally keep any state in
memory between two calls. IOW, when you run

  git status
  cd ../
  git log

The call to "git status" creates a process, but the process dies before
you run "cd". The call to "git log" is a different process. It can
re-use things that "git status" left on disk, but not in-memory data
structures.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/