On Thu, Aug 21, 2014 at 01:15:10PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > +/*
> > + * We anonymize each component of a path individually,
> > + * so that paths a/b and a/c will share a common root.
> > + * The paths are cached via anonymize_mem so that repeated
> > + * lookups for "a" will yield the same value.
> > + */
> > +static void anonymize_path(struct strbuf *out, const char *path,
> > +			   struct hashmap *map,
> > +			   char *(*generate)(const char *, size_t *))
> > +{
> > +	while (*path) {
> > +		const char *end_of_component = strchrnul(path, '/');
> > +		size_t len = end_of_component - path;
> > +		const char *c = anonymize_mem(map, generate, path, &len);
> > +		strbuf_add(out, c, len);
> > +		path = end_of_component;
> > +		if (*path)
> > +			strbuf_addch(out, *path++);
> > +	}
> > +}
>
> Do two paths sort the same way before and after anonymisation?  For
> example, if generate() works as a simple substitution, it should map
> a character that sorts before (or after) '/' with another that also
> sorts before (or after) '/' for us to be able to diagnose an error
> that comes from D/F sort order confusion.

No, the sort order is totally lost. I'd be afraid that a general scheme
would end up leaking information about what was in the filenames. It
might be acceptable to leak some information here, though, if it adds to
the realism of the result.

I tried here to lay the basic infrastructure and do the simplest thing
that might work, so we could evaluate proposals like that independently
(and also because I didn't come up with a clever enough algorithm to do
what you're asking). Patches welcome on top. :)

-Peff
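
To make the "sort order is totally lost" point concrete, here is a
minimal standalone sketch. It is not the patch itself. It assumes a small
fixed-size array in place of the hashmap behind anonymize_mem(), and a
hypothetical generator that hands out "path0", "path1", ... in first-seen
order. The names anonymize_component() and anonymize_path_demo() are made
up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the hashmap used by anonymize_mem(). */
#define MAX_SEEN 64
#define MAX_NAME 32

struct seen_entry {
	char orig[MAX_NAME];	/* original component */
	char anon[MAX_NAME];	/* its replacement */
};

static struct seen_entry seen[MAX_SEEN];
static int nr_seen;

/* Return the cached replacement for one component, creating it on first use. */
static const char *anonymize_component(const char *name, size_t len)
{
	int i;

	assert(len < MAX_NAME);
	for (i = 0; i < nr_seen; i++)
		if (strlen(seen[i].orig) == len &&
		    !memcmp(seen[i].orig, name, len))
			return seen[i].anon;

	assert(nr_seen < MAX_SEEN);
	memcpy(seen[nr_seen].orig, name, len);
	seen[nr_seen].orig[len] = '\0';
	/* Assumed generator: a counter-based name, unrelated to the input. */
	snprintf(seen[nr_seen].anon, MAX_NAME, "path%d", nr_seen);
	return seen[nr_seen++].anon;
}

/* Anonymize "path" one component at a time into out[], keeping each '/'. */
static void anonymize_path_demo(char *out, size_t outsz, const char *path)
{
	size_t used = 0;

	assert(outsz > 0);
	out[0] = '\0';
	while (*path) {
		size_t len = strcspn(path, "/");
		const char *c = anonymize_component(path, len);

		used += snprintf(out + used, outsz - used, "%s", c);
		assert(used + 1 < outsz);
		path += len;
		if (*path) {
			out[used++] = *path++;
			out[used] = '\0';
		}
	}
}
```

Feeding it "z/b", "a/b" and "a/c" in that order gives "path0/path1",
"path2/path1" and "path2/path3". So a/b and a/c still share a root, but
a/b now sorts after z/b, because the replacement names depend only on the
order in which components were first seen.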