Re: [PATCH v2] teach fast-export an --anonymize option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 22, 2014 at 07:39:59PM +0100, Philip Oakley wrote:

> Just a bit of bikeshedding for future improvements..
> 
> The .gitignore is another potential user problem area that may benefit form
> not being anonymised when problems strike.

Thanks, I had meant to mention some implications for .gitmodules here,
but forgot about .gitignore (and .gitattributes!).

For any git-specific files like this, we have two challenges:

  1. We've munged their filenames (so .gitignore is probably path123
     now).

  2. We'll have munged their contents. So even if we left the file as
     .gitignore, it will have junk in it.

Fixing (1) is pretty easy. I structured all of the anonymizing functions
to take the old values, even though most of them just throw it away
entirely (which is a good way to be sure you're not leaking anything!).
But we could pass through a few specific ones.

However, that doesn't help us if the contents are still munged (in fact
it's worse, because git will be annoyed that your .gitmodules file
contains unparseable crap). So how do we munge those files?

It depends on the individual file, I think, and what the user wants to
protect.

For .gitignore and .gitattributes, we can translate the pathnames
contained in the file. But that doesn't work in the general case,
because the file could have wildcards or other non-literal syntax.

For .gitmodules, I think it's all-or-nothing. Either the user is OK
sharing the URLs of their submodules or not (we could munge _just_ the
URLs, but it's not like the result would be remotely functional).

So while we might be able to get some things working on the .gitignore
side, I kind of think the simplest way forward is just adding finer
granularity for the user. Let them say "my filenames are OK to share
because they're part of the problem, but just make sure you hide my
commit messages and file contents".

And then if you're not munging filenames, we would turn off .gitignore
and .gitattributes munging. The implementation is not too hard.
export_blob does not have the path of the blob, but we generate the list
of blobs to export from a diff, so we can feed the path that way.  That
technically misses a case where you have a blob at path "X", we
anonymize it, and then you later move it to ".gitignore", which would
not be anonymized. But that is unlikely enough that it is probably not
worth worrying about.

> For example, there's a current
> problem on the git-users list
> https://groups.google.com/forum/#!topic/git-users/JJFIEsI5HRQ about "git
> clean vs git status re .gitignore", which would then also beg questions
> about retaining file extensions/suffixes (.txt, .o, .c, etc).

Yeah, I think retaining extensions would be a reasonable option (and you
would probably use it with an option to retain .gitattributes or
.gitignore whole if you were confident that those files did not have
anything private and just used extension wildcards).

> One thought is that the user should be able to, as an option, select the
> number of initial characters retained from filenames, and similarly, the
> option to retain the file extension, and possibly directory names, such that
> the full .gitignore still works in most cases, and the sort order works (as
> far as it goes on number of characters).

Yeah, those all seem reasonable.

> All things for future improvers to consider.

Agreed. I wanted to go through your list not because I want to implement
any of those things right now, but because I wanted to make sure that
there was nothing in my approach that would preclude us from building
those things later. And I don't think there is (and I'd be happy if
somebody else felt like building them on top, now or later).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]