Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 27.06.2014 21:19, schrieb Matthieu Moy:
> 
> The reason for which setting any config value invalidates the cache is
> that it is _much_ simpler to implement. Less complexity, less
> corner-cases, less bugs.
> 

I think I see what you mean. E.g. in cmd_clone we have:

  write_config(&option_config);
  git_config(git_default_config, NULL);
  ...
  git_config_set("remote.Foo.url", repo);
  ...
  remote = remote_get(option_origin); /* which does 'git_config(handle_config)' */

So if we load the config cache at 'git_config(git_default_config)', then
'remote_get' won't see "remote.Foo.url" unless we invalidate the cache first.


I still don't like that the invalidation is done in git_config_set, though, as
this is also used to write completely unrelated files. Wouldn't it be better to
have a 'git_config_refresh()' that could be used in place of (or before) current
'git_config(callback)' calls? The initial implementation could just invalidate
the config cache. If there's time and energy to spare, a more advanced version
could first check if any of the involved config files has changed.


The xstrdup() problem could be solved by interning strings (see the
attached patch for a trivial implementation). I.e. allocate each distinct
string only once (and keep it allocated).

So if there are 100 instances of "true" in the config file, the interned string
pool would contain only one instance (i.e. 5 bytes + hashmap_entry + malloc
overhead, vs. 100 * (5 bytes + malloc overhead) in case of xstrdup()). If the
config cache is cleared, the interned string in the pool would still remain
valid. If the config cache is reloaded, unmodified values would reuse the
existing strings from the pool.

Side note: interning getenv() results would fix the non-POSIX-compliant
getenv()-usage in git [1].

[1] https://groups.google.com/d/msg/msysgit/4cVWWJkN4to/DR8FGTQ09Q0J

---- 8< ----
Subject: [PATCH] hashmap: add string interning API

Interning short strings that are likely to have multiple copies may reduce
memory footprint and speed up string comparisons.

Add a strintern() API that uses a hashmap to manage the pool of interned
strings.

Signed-off-by: Karsten Blees <blees@xxxxxxx>
---
 Documentation/technical/api-hashmap.txt | 11 +++++++++++
 hashmap.c                               | 34 +++++++++++++++++++++++++++++++++
 hashmap.h                               |  4 ++++
 t/t0011-hashmap.sh                      | 13 +++++++++++++
 test-hashmap.c                          | 14 ++++++++++++++
 5 files changed, 76 insertions(+)

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
index b977ae8..db7c955 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -162,6 +162,17 @@ more entries.
 `hashmap_iter_first` is a combination of both (i.e. initializes the iterator
 and returns the first entry, if any).
 
+`const char *strintern(const char *string)`::
+
+	Returns the unique, interned version of the specified string, similar
+	to the `String.intern` API in Java and .NET. Interned strings remain
+	valid for the entire lifetime of the process.
++
+Can be used as `[x]strdup()` replacement, except that the interned string must
+not be modified or freed.
++
+Uses a hashmap to store the pool of interned strings.
+
 Usage example
 -------------
 
diff --git a/hashmap.c b/hashmap.c
index d1b8056..03cd8f3 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -226,3 +226,37 @@ void *hashmap_iter_next(struct hashmap_iter *iter)
 		current = iter->map->table[iter->tablepos++];
 	}
 }
+
+struct string_pool_entry {
+	struct hashmap_entry ent;
+	char key[FLEX_ARRAY];
+};
+
+static int string_pool_cmp(const struct string_pool_entry *e1,
+			   const struct string_pool_entry *e2,
+			   const char *key)
+{
+	return strcmp(e1->key, key ? key : e2->key);
+}
+
+const char *strintern(const char *string)
+{
+	static struct hashmap map;
+	struct string_pool_entry key, *e;
+	/* initialize string pool hashmap */
+	if (!map.tablesize)
+		hashmap_init(&map, (hashmap_cmp_fn) string_pool_cmp, 0);
+
+	/* lookup interned string in pool */
+	hashmap_entry_init(&key, strhash(string));
+	e = hashmap_get(&map, &key, string);
+	if (!e) {
+		/* not found: create it */
+		int len = strlen(string);
+		e = xmalloc(sizeof(struct string_pool_entry) + len + 1);
+		hashmap_entry_init(e, key.ent.hash);
+		memcpy(e->key, string, len + 1);
+		hashmap_add(&map, e);
+	}
+	return e->key;
+}
diff --git a/hashmap.h b/hashmap.h
index a816ad4..b428677 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -68,4 +68,8 @@ static inline void *hashmap_iter_first(struct hashmap *map,
 	return hashmap_iter_next(iter);
 }
 
+/* string interning */
+
+extern const char *strintern(const char *string);
+
 #endif
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 391e2b6..f97c805 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -237,4 +237,17 @@ test_expect_success 'grow / shrink' '
 
 '
 
+test_expect_success 'string interning' '
+
+test_hashmap "intern value1
+intern Value1
+intern value2
+intern value2
+" "value1
+Value1
+value2
+value2"
+
+'
+
 test_done
diff --git a/test-hashmap.c b/test-hashmap.c
index f5183fb..116cbb5 100644
--- a/test-hashmap.c
+++ b/test-hashmap.c
@@ -239,6 +239,20 @@ int main(int argc, char *argv[])
 			/* print table sizes */
 			printf("%u %u\n", map.tablesize, map.size);
 
+		} else if (!strcmp("intern", cmd) && l1) {
+
+			/* test that strintern works */
+			const char *i1 = strintern(p1);
+			const char *i2 = strintern(p1);
+			if (strcmp(i1, p1))
+				printf("strintern(%s) returns %s\n", p1, i1);
+			else if (i1 == p1)
+				printf("strintern(%s) returns input pointer\n", p1);
+			else if (i1 != i2)
+				printf("strintern(%s) != strintern(%s)", i1, i2);
+			else
+				printf("%s\n", i1);
+
 		} else if (!strcmp("perfhashmap", cmd) && l1 && l2) {
 
 			perf_hashmap(atoi(p1), atoi(p2));
-- 
2.0.0.9649.g1eafa1f.dirty


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]