Am 27.06.2014 21:19, schrieb Matthieu Moy: > > The reason for which setting any config value invalidates the cache is > that it is _much_ simpler to implement. Less complexity, less > corner-cases, less bugs. > I think I see what you mean. E.g. in cmd_clone we have: write_config(&option_config); git_config(git_default_config, NULL); ... git_config_set("remote.Foo.url", repo); ... remote = remote_get(option_origin); /* which does 'git_config(handle_config)' */ So if we load the config cache at 'git_config(git_default_config)', then 'remote_get' won't see "remote.Foo.url" unless we invalidate the cache first. I still don't like that the invalidation is done in git_config_set, though, as this is also used to write completely unrelated files. Wouldn't it be better to have a 'git_config_refresh()' that could be used in place of (or before) current 'git_config(callback)' calls? The initial implementation could just invalidate the config cache. If there's time and energy to spare, a more advanced version could first check if any of the involved config files has changed. The xstrdup() problem could be solved by interning strings (see the attached patch for a trivial implementation). I.e. allocate each distinct string only once (and keep it allocated). So if there are 100 instances of "true" in the config file, the interned string pool would contain only one instance (i.e. 5 bytes + hashmap_entry + malloc overhead, vs. 100 * (5 bytes + malloc overhead) in case of xstrdup()). If the config cache is cleared, the interned string in the pool would still remain valid. If the config cache is reloaded, unmodified values would reuse the existing strings from the pool. Side note: interning getenv() results would fix the non-POSIX-compliant getenv()-usage in git [1]. [1] https://groups.google.com/d/msg/msysgit/4cVWWJkN4to/DR8FGTQ09Q0J ---- 8< ---- Subject: [PATCH] hashmap: add string interning API Interning short strings that are likely to have multiple copies may reduce memory footprint and speed up string comparisons. Add a strintern() API that uses a hashmap to manage the pool of interned strings. Signed-off-by: Karsten Blees <blees@xxxxxxx> --- Documentation/technical/api-hashmap.txt | 11 +++++++++++ hashmap.c | 34 +++++++++++++++++++++++++++++++++ hashmap.h | 4 ++++ t/t0011-hashmap.sh | 13 +++++++++++++ test-hashmap.c | 14 ++++++++++++++ 5 files changed, 76 insertions(+) diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt index b977ae8..db7c955 100644 --- a/Documentation/technical/api-hashmap.txt +++ b/Documentation/technical/api-hashmap.txt @@ -162,6 +162,17 @@ more entries. `hashmap_iter_first` is a combination of both (i.e. initializes the iterator and returns the first entry, if any). +`const char *strintern(const char *string)`:: + + Returns the unique, interned version of the specified string, similar + to the `String.intern` API in Java and .NET. Interned strings remain + valid for the entire lifetime of the process. ++ +Can be used as `[x]strdup()` replacement, except that the interned string must +not be modified or freed. ++ +Uses a hashmap to store the pool of interned strings. + Usage example ------------- diff --git a/hashmap.c b/hashmap.c index d1b8056..03cd8f3 100644 --- a/hashmap.c +++ b/hashmap.c @@ -226,3 +226,37 @@ void *hashmap_iter_next(struct hashmap_iter *iter) current = iter->map->table[iter->tablepos++]; } } + +struct string_pool_entry { + struct hashmap_entry ent; + char key[FLEX_ARRAY]; +}; + +static int string_pool_cmp(const struct string_pool_entry *e1, + const struct string_pool_entry *e2, + const char *key) +{ + return strcmp(e1->key, key ? key : e2->key); +} + +const char *strintern(const char *string) +{ + static struct hashmap map; + struct string_pool_entry key, *e; + /* initialize string pool hashmap */ + if (!map.tablesize) + hashmap_init(&map, (hashmap_cmp_fn) string_pool_cmp, 0); + + /* lookup interned string in pool */ + hashmap_entry_init(&key, strhash(string)); + e = hashmap_get(&map, &key, string); + if (!e) { + /* not found: create it */ + int len = strlen(string); + e = xmalloc(sizeof(struct string_pool_entry) + len + 1); + hashmap_entry_init(e, key.ent.hash); + memcpy(e->key, string, len + 1); + hashmap_add(&map, e); + } + return e->key; +} diff --git a/hashmap.h b/hashmap.h index a816ad4..b428677 100644 --- a/hashmap.h +++ b/hashmap.h @@ -68,4 +68,8 @@ static inline void *hashmap_iter_first(struct hashmap *map, return hashmap_iter_next(iter); } +/* string interning */ + +extern const char *strintern(const char *string); + #endif diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh index 391e2b6..f97c805 100755 --- a/t/t0011-hashmap.sh +++ b/t/t0011-hashmap.sh @@ -237,4 +237,17 @@ test_expect_success 'grow / shrink' ' ' +test_expect_success 'string interning' ' + +test_hashmap "intern value1 +intern Value1 +intern value2 +intern value2 +" "value1 +Value1 +value2 +value2" + +' + test_done diff --git a/test-hashmap.c b/test-hashmap.c index f5183fb..116cbb5 100644 --- a/test-hashmap.c +++ b/test-hashmap.c @@ -239,6 +239,20 @@ int main(int argc, char *argv[]) /* print table sizes */ printf("%u %u\n", map.tablesize, map.size); + } else if (!strcmp("intern", cmd) && l1) { + + /* test that strintern works */ + const char *i1 = strintern(p1); + const char *i2 = strintern(p1); + if (strcmp(i1, p1)) + printf("strintern(%s) returns %s\n", p1, i1); + else if (i1 == p1) + printf("strintern(%s) returns input pointer\n", p1); + else if (i1 != i2) + printf("strintern(%s) != strintern(%s)", i1, i2); + else + printf("%s\n", i1); + } else if (!strcmp("perfhashmap", cmd) && l1 && l2) { perf_hashmap(atoi(p1), atoi(p2)); -- 2.0.0.9649.g1eafa1f.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html