Re: [PATCH v4 1/2] add `config_set` API for caching config files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tanay Abhra <tanayabh@xxxxxxxxx> writes:

> Add a `config_set` construct that points to an ordered set of config files
> specified by the user, each of which represents what was read and cached
> in core as hashmaps. Add two external functions `git_configset_get_value`
> and `git_configset_get_value_multi` for querying from the config sets,
> which follow `last one wins` semantic(i.e. if there are multiple matches

Space before (

> for the queried key in the files of the configset the value returned will
> the last value in the last matching file of the configset) Add type

will _be_ ?

Does this remark also apply to git_configset_get_value_multi? My
understanding was that "last one wins" applied only to
git_configset_get_value.

> Add a default `config_set`, `the_config_set` to cache all key-value pairs

I don't like the name "the_config_set". It's not the only one. Perhaps
repo_config_set? (not totally satisfactory as it does not contain only
the repo, but the repo+user+system)

What do others think?

> read from usual config files(repo specific .git/config, user wide

(You always forget space before '('...)

> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
> single hashmap populated using git_config(). The reason for doing so is
> twofold, one is to honour include directives, another is to guarantee O(1)
> lookups for usual config values, as values are queried for hundred of
> times during a run of a git program.

What is the reason to deal with `the_config_set` and other config sets
differently? You're giving arguments to store `the_config_set` as a
single hashmap, but what is the reason to store others as multiple
hashmaps?

And actually, I don't completely buy your arguments: having 3 or 4
hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
/etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
could deal with include directives by having ".git/config and included
files" in a hashmap, "~/.gitconfig and included files" in a second
hashmap, ...

My guess is that the real argument is "git_config already does what I
want and I'm too lazy to change it". And I do consider lazyness as a
quality for a computer-scientist ;-).

I would personally find it much simpler to have a single hashmap. We'd
lose the ability to invalidate the cache for only a single file, but I'm
not sure it's worth the code complexity.

> +Querying For Specific Variables
> +-------------------------------
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_value`
> +and `git_config_get_value_multi`.They both read values from an internal

Space after .

> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
> +`git_config` callback function signature is used for iterating.

"Existing" refers to the old state. It's OK in a commit message, but
won't make sense in the future, when your non-callback API and the old
callback API will live side by side.

> +The config API also provides type specific API functions which do conversion
> +as well as retrieval for the queried variable, including:
> +
> +`git_config_get_int`::
> +Parse the retrieved value to an integer, including unit factors. Dies on
> +error; otherwise, allocates and copies the integer into the dest parameter.
> +Returns 0 on success, or 1 if no value was found.

It seems strange to me to return 1 here, and -1 in git_config_get_value
to mean the same thing.

> +`git_config_bool`::

Isn't it git_config_get_bool?

> +/* config-hash.c */

You may point to the documentation in the comment too.

> +#define CONFIG_SET_INIT { NULL, 0, 0 }
> +
> +struct config_set {
> +	struct config_set_item *item;
> +	unsigned int nr, alloc;
> +};
> +
> +struct config_set_item {
> +	struct hashmap config_hash;
> +	char *filename;

Perhaps const char *?

> +static struct hashmap *add_config_hash(const char *filename, struct config_set *cs)
> +{
> +	int ret = 0;
> +	struct config_set_item *ct;
> +	struct config_set_cb cb;
> +	ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
> +	ct = &cs->item[cs->nr++];
> +	ct->filename = xstrdup(filename);
> +	config_hash_init(&ct->config_hash);
> +	cb.cs = cs;
> +	cb.ct = ct;
> +	/*
> +	 * When filename is "default", hashmap is constructed from the usual set of
> +	 * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
> +	 * global /etc/gitconfig), used in `the_config_set`
> +	 */
> +	if (!strcmp(filename, "default"))

How does one read a file actually called "default" with this API?

More generally, why do you need a special-case here? I think it would be
much better to leave this part unaware of the default, and have a
separate function like create_repo_config_hash (or
create_the_config_hash) that would call git_config(...). There actually
isn't much shared code between the "default" case and the others in your
function.

> +static struct hashmap *get_config_hash(const char *filename, struct config_set *cs)
> +{
> +	int i;
> +	for(i = 0; i < cs->nr; i++) {
> +		if (!strcmp(cs->item[i].filename, filename))
> +			return &cs->item[i].config_hash;
> +	}
> +	return add_config_hash(filename, cs);
> +}
> +
> +static struct config_hash_entry *config_hash_find_entry(const char *key, const char* filename,
> +							struct config_set *cs)

I don't get the logic here.

Either the caller explicitly manages a config_set variable like

  config_set my_set = git_configset_init();
  git_configset_add_file(my_set, "my-file");
  git_configset_get_value(my_set, "some.variable.name");

Or there's an implicit singleton mapping files to hashmaps to allow the
user to say

  git_configset_get_value("my-file", "some.variable.name");

without asking the user to explicitly manipulate config_set variables.

It seems to me that your solution is a bad compromise between both
solutions: you do require the user to manipulate config_set variables,
but you also require a filename to look for the right entry in the list.

Did you miss the end of Junio's message:

  http://thread.gmane.org/gmane.comp.version-control.git/252567

?

If the user explicitly asks for a single entry in the list, then why
group them in a list? I guess the question is the same as above, and the
more I read the patch, the more I think a config_set should contain a
single hashmap.

> +static int config_hash_add_value(const char *key, const char *value,
> +				 struct config_set_item *ct, struct config_set *cs)
> +{
> +	struct hashmap *config_hash = &ct->config_hash;
> +	struct config_hash_entry *e;
> +	e = config_hash_find_entry(key, ct->filename, cs);
> +
> +	if (!e) {
> +		e = xmalloc(sizeof(*e));
> +		hashmap_entry_init(e, strhash(key));
> +		e->key = xstrdup(key);
> +		memset(&e->value_list, 0, sizeof(e->value_list));
> +		e->value_list.strdup_strings = 1;
> +		hashmap_add(config_hash, e);
> +	}
> +	/*
> +	 * Since the values are being fed by git_config*() callback mechanism, they
> +	 * are already normalised.

We usually speak en_US rather than en_UK => normalized.

> +int git_config_get_value(const char *key, const char **value)
> +{
> +	if (!the_config_set_initialised) {
> +		git_configset_add_file(&the_config_set, "default");
> +		the_config_set_initialised = 1;
> +	}
> +	return git_configset_get_value(&the_config_set, key, value);
> +}
> +
> +const struct string_list *git_config_get_value_multi(const char *key)
> +{
> +	if (!the_config_set_initialised) {
> +		git_configset_add_file(&the_config_set, "default");
> +		the_config_set_initialised = 1;
> +	}
> +	return git_configset_get_value_multi(&the_config_set, key);
> +}

These if statements could be refactored, and the_config_set_initialised
could be hidden to the caller. Just make a function get_the_config_set()
that initializes if needed and returns a pointer to the_config_set.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]