Re: [PATCH] teach fast-export an --anonymize option

On Thu, Aug 21, 2014 at 01:15:10PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > +/*
> > + * We anonymize each component of a path individually,
> > + * so that paths a/b and a/c will share a common root.
> > + * The paths are cached via anonymize_mem so that repeated
> > + * lookups for "a" will yield the same value.
> > + */
> > +static void anonymize_path(struct strbuf *out, const char *path,
> > +			   struct hashmap *map,
> > +			   char *(*generate)(const char *, size_t *))
> > +{
> > +	while (*path) {
> > +		const char *end_of_component = strchrnul(path, '/');
> > +		size_t len = end_of_component - path;
> > +		const char *c = anonymize_mem(map, generate, path, &len);
> > +		strbuf_add(out, c, len);
> > +		path = end_of_component;
> > +		if (*path)
> > +			strbuf_addch(out, *path++);
> > +	}
> > +}
> 
> Do two paths sort the same way before and after anonymisation?  For
> example, if generate() works as a simple substitution, it should map
> a character that sorts before (or after) '/' with another that also
> sorts before (or after) '/' for us to be able to diagnose an error
> that comes from D/F sort order confusion.

No, the sort order is totally lost. I'd be afraid that a general scheme
would end up leaking information about what was in the filenames. It
might be acceptable to leak some information here, though, if it adds to
the realism of the result.
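
To make the loss of ordering concrete, here is a minimal sketch of
per-component anonymization. It is not the code from the patch: the
toy_* names, the fixed-size table standing in for the hashmap, and the
"pathN" naming scheme are all assumptions made for illustration, and
strcspn() stands in for strchrnul() so it builds without GNU extensions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_COMPONENTS 64
#define MAX_NAME 32

/* Hypothetical stand-in for the hashmap cache: original -> token. */
struct toy_map {
	char orig[MAX_COMPONENTS][MAX_NAME];
	char anon[MAX_COMPONENTS][MAX_NAME];
	int nr;
};

/* Return the cached token for a component, creating "pathN" on first sight. */
static const char *toy_anonymize_component(struct toy_map *map,
					   const char *s, size_t len)
{
	int i;

	assert(len < MAX_NAME);
	for (i = 0; i < map->nr; i++)
		if (strlen(map->orig[i]) == len &&
		    !memcmp(map->orig[i], s, len))
			return map->anon[i];

	assert(map->nr < MAX_COMPONENTS);
	memcpy(map->orig[map->nr], s, len);
	map->orig[map->nr][len] = '\0';
	/* Assumed scheme: tokens are numbered in order of first appearance. */
	snprintf(map->anon[map->nr], MAX_NAME, "path%d", map->nr);
	return map->anon[map->nr++];
}

/* Anonymize each '/'-separated component, copying the separators as-is. */
static void toy_anonymize_path(struct toy_map *map, const char *path,
			       char *out, size_t outsz)
{
	size_t used = 0;

	assert(outsz > 0);
	out[0] = '\0';
	while (*path) {
		size_t len = strcspn(path, "/");
		const char *c = toy_anonymize_component(map, path, len);
		size_t clen = strlen(c);

		/* Room for the token, an optional '/', and the NUL. */
		assert(used + clen + 2 <= outsz);
		memcpy(out + used, c, clen);
		used += clen;
		path += len;
		if (*path)
			out[used++] = *path++;
		out[used] = '\0';
	}
}
```

With a zero-initialized struct toy_map, feeding "a/b", "a.c" and "a/c"
in that order gives "path0/path1", "path2" and "path0/path3". The two
paths under "a" still share a root, but strcmp("a.c", "a/b") is negative
while strcmp("path2", "path0/path1") is positive, so the relative order
of the file and the directory entry is not preserved.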

I tried here to lay the basic infrastructure and do the simplest thing
that might work, so we could evaluate proposals like that independently
(and also because I didn't come up with a clever enough algorithm to do
what you're asking).  Patches welcome on top. :)

-Peff