Re: [PATCH 09/10] fast-export: allow seeding the anonymized mapping

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 23, 2020 at 04:30:23PM -0400, Eric Sunshine wrote:

> > I'm not sure what you'd write, then. You can't mention "mybranch"
> > anymore if it was anonymized. Are you suggesting to make the example:
> >
> >  git rev-list -- foo.c
> >
> > by itself?
> 
> Sorry, I meant to provide an example like this:
> 
>     For example, if you have a bug which reproduces with `git rev-list
>     sensitive -- secret.c`, you can run:
> 
>     $ git fast-export --anonymize --all \
>         --seed-anonymized=sensitive:foo \
>         --seed-anonymized=secret.c:bar.c \
>         >stream
> 
>     After importing the stream, you can then run `git rev-list foo --
>     bar.c` in the anonymized repository.

Thanks, that makes sense. I took this as-is for my reroll (modulo the
change of option name discussed elsewhere).

> Hmm, perhaps your original attempt can be extended slightly to state
> it more explicitly?
> 
>     Note that paths and refnames are split into tokens at slash
>     boundaries. The command above would anonymize `subdir/foo.c` as
>     something like `path123/secret.c`; you could then search for
>     `secret.c` in the anonymized repository to determine the final
>     pathname.
> 
>     To make referencing the final pathname simpler, you can seed
>     anonymization for each path component; so, if you also anonymize
>     `subdir` to `publicdir`, then the final pathname would be
>     `publicdir/secret.c`.

Thanks, I took this modulo some fixups to match the example above, and
to avoid the use of the word "seed" based on our other discussion.

> This makes me wonder if --seed-anonymized should do its own
> tokenization so that --seed-anonymized=subdir/foo:public/bar is
> automatically understood as anonymizing "subdir" to "public" _and_
> "foo" to "bar". But that potentially gets weird if you say:
> 
>     --seed-anonymized=a/b:q/p --seed-anonymized=a/c:y/z
> 
> in which case you've given conflicting replacements for "a". (I
> suppose it could issue a warning message in that case.)

Right, I think you get into weird corner cases. Another issue is that
not all items are tokenized (e.g., if your author name was foo/bar,
you'd want that replaced as a whole). Probably you could add both the
broken-down and full inputs. Yet another issue is that you can't add a
token with a ":" due to the syntax.

This is an infrequently-enough-used feature that I think it's worth
keeping things simple, even if they're a little less convenient to
invoke.

> Lack of a warning or error could be kind of bad if the person doesn't
> check the fast-export file before sending it out and only discovers
> later that:
> 
>     git fast-export --seed-anonymized=foo:bar
> 
> didn't perform _any_ anonymization at all.

Good point. I'd hope people would glance at the output before sending it
out, but given that it's a potential safety issue, it probably is worth
detecting this case. I'll add it to my re-roll.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux