Re: [PATCH v2] teach fast-export an --anonymize option

Jeff King <peff@xxxxxxxx> writes:

> diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
> new file mode 100755
> index 0000000..f76ffe4
> --- /dev/null
> +++ b/t/t9351-fast-export-anonymize.sh
> @@ -0,0 +1,117 @@
> +#!/bin/sh
> +
> +test_description='basic tests for fast-export --anonymize'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup simple repo' '
> +	test_commit base &&
> +	test_commit foo &&
> +	git checkout -b other HEAD^ &&
> +	mkdir subdir &&
> +	test_commit subdir/bar &&
> +	test_commit subdir/xyzzy &&
> +	git tag -m "annotated tag" mytag
> +'
> +
> +test_expect_success 'export anonymized stream' '
> +	git fast-export --anonymize --all >stream
> +'
> +
> +# this also covers commit messages
> +test_expect_success 'stream omits path names' '
> +	! fgrep base stream &&
> +	! fgrep foo stream &&
> +	! fgrep subdir stream &&
> +	! fgrep bar stream &&
> +	! fgrep xyzzy stream
> +'

I know there are a few isolated places that already use "fgrep", but
let's not spread the disease. Neither "fgrep" nor "egrep" appears in
POSIX and they can easily be spelled more portably as "grep -F" and
"grep -E", respectively.

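For illustration only, here is a small sketch of the portable
spellings.  The sample file contents are made up and merely stand in
for the exported stream; the variable names are not part of the patch.

```shell
# Throwaway input standing in for the exported stream (made-up content).
stream=$(mktemp)
printf '%s\n' 'commit refs/heads/master' 'M 100644 :1 path/0' >"$stream"

# "grep -F" matches fixed strings, which is what "fgrep" does.
if grep -F master "$stream" >/dev/null
then
	found_master=yes
else
	found_master=no
fi

# Under -F the "." is literal, so "path.0" does not match "path/0".
if grep -F 'path.0' "$stream" >/dev/null
then
	found_dot=yes
else
	found_dot=no
fi

# "grep -E" takes extended regular expressions, which is what "egrep" does.
matches=$(grep -E -c 'master|path' "$stream")
rm -f "$stream"
```
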
> +test_expect_success 'stream allows master as refname' '
> +	fgrep master stream
> +'
> +
> +test_expect_success 'stream omits other refnames' '
> +	! fgrep other stream
> +'

What should happen to mytag?

> +
> +test_expect_success 'stream omits identities' '
> +	! fgrep "$GIT_COMMITTER_NAME" stream &&
> +	! fgrep "$GIT_COMMITTER_EMAIL" stream &&
> +	! fgrep "$GIT_AUTHOR_NAME" stream &&
> +	! fgrep "$GIT_AUTHOR_EMAIL" stream
> +'
> +
> +test_expect_success 'stream omits tag message' '
> +	! fgrep "annotated tag" stream
> +'
> +
> +# NOTE: we chdir to the new, anonymized repository
> +# after this. All further tests should assume this.
> +test_expect_success 'import stream to new repository' '
> +	git init new &&
> +	cd new &&
> +	git fast-import <../stream
> +'
> +
> +test_expect_success 'result has two branches' '
> +	git for-each-ref --format="%(refname)" refs/heads >branches &&
> +	test_line_count = 2 branches &&
> +	other_branch=$(grep -v refs/heads/master branches)
> +'
> +
> +test_expect_success 'repo has original shape' '
> +	cat >expect <<-\EOF &&
> +	> subject 3
> +	> subject 2
> +	< subject 1
> +	- subject 0
> +	EOF
> +	git log --format="%m %s" --left-right --boundary \
> +		master...$other_branch >actual &&
> +	test_cmp expect actual
> +'

Yuck and Hmph.  Doing a shape-preserving conversion is very
important, but I wonder if we can verify it without having to cast a
particular rewrite rule in stone.  We know we want to preserve the
relative order of committer timestamps (to reproduce bugs that
depend on the traversal order), and it may be OK to reuse exactly
the same committer timestamps from the original, in which
case we can make sure that we create the original history with
appropriate "test_tick"s (I think test_commit does that for us) and
use "%ct" instead of "%s" here, perhaps?  That way we can later
change the rewrite rules of commit object payload without having to
adjust this test.
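
A rough standalone sketch of what I mean, outside the test suite.  It
builds the same history shape with explicit committer dates and prints
the shape keyed on "%ct".  The commit_file helper, the "tick" counter
and the identity values are all made up here, standing in for what
test_commit and test_tick would give us inside the test script.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Isolate from the user's configuration, as the test suite would.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL=author@example.com
export GIT_COMMITTER_NAME="C O Mitter" GIT_COMMITTER_EMAIL=committer@example.com

tick=1112911993
commit_file () {
	# Hypothetical helper: one commit per call, 60 seconds apart,
	# roughly what test_commit + test_tick do in the test suite.
	echo "$1" >"$1" &&
	git add "$1" &&
	GIT_AUTHOR_DATE="$tick -0700" GIT_COMMITTER_DATE="$tick -0700" \
		git commit -q -m "$1" &&
	tick=$((tick + 60))
}

git init -q repo && cd repo &&
git symbolic-ref HEAD refs/heads/master &&
commit_file base &&
commit_file foo &&
git checkout -q -b other HEAD^ &&
commit_file bar &&
commit_file xyzzy &&

# Shape keyed on committer timestamps rather than on subjects, so it
# stays valid no matter how commit messages get rewritten.
shape=$(git log --format="%m %ct" --left-right --boundary master...other)
printf '%s\n' "$shape"

cd / && rm -rf "$tmp"
```

The same "git log" invocation run in the imported repository should
then produce identical output, provided the export keeps the committer
timestamps as they were.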

> +
> +test_expect_success 'root tree has original shape' '
> +	cat >expect <<-\EOF &&
> +	blob
> +	tree
> +	EOF
> +	git ls-tree $other_branch >root &&
> +	cut -d" " -f2 <root >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'paths in subdir ended up in one tree' '
> +	cat >expect <<-\EOF &&
> +	blob
> +	blob
> +	EOF
> +	tree=$(grep tree root | cut -f2) &&
> +	git ls-tree $other_branch:$tree >tree &&
> +	cut -d" " -f2 <tree >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'tag points to branch tip' '
> +	git rev-parse $other_branch >expect &&
> +	git for-each-ref --format="%(*objectname)" | grep . >actual &&
> +	test_cmp expect actual
> +'

I notice you haven't checked how many tags you have in the
repository, unlike the number of branches which you counted
earlier.
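
As a sketch of the kind of check I have in mind (inside the test
script it would presumably be a for-each-ref over refs/tags fed to
test_line_count), here is a standalone version.  The repository
layout, identity values and variable names are made up for the
illustration.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Isolate from the user's configuration, as the test suite would.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL=author@example.com
export GIT_COMMITTER_NAME="C O Mitter" GIT_COMMITTER_EMAIL=committer@example.com

git init -q repo && cd repo &&
echo base >base && git add base && git commit -q -m base &&
git tag -m "annotated tag" mytag &&

# One line per tag ref; $(( )) strips the padding some "wc" add.
tag_count=$(( $(git for-each-ref --format="%(refname)" refs/tags | wc -l) )) &&

# %(*objectname) dereferences an annotated tag to the object it tags.
peeled=$(git for-each-ref --format="%(*objectname)" refs/tags) &&
tip=$(git rev-parse HEAD)

cd / && rm -rf "$tmp"
```

Limiting the for-each-ref to refs/tags and counting its output catches
both a lost tag and a stray extra one.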

> +test_expect_success 'idents are shared' '
> +	git log --all --format="%an <%ae>" >authors &&
> +	sort -u authors >unique &&
> +	test_line_count = 1 unique &&
> +	git log --all --format="%cn <%ce>" >committers &&
> +	sort -u committers >unique &&
> +	test_line_count = 1 unique &&
> +	! test_cmp authors committers
> +'

Two commits by the same author must convert to two commits by the
same anonymized author, but that is not tested here; the history
made in the 'setup simple repo' step is a bit too simple to do that
anyway, though ;-).
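
A possible way to test that, sketched as a standalone script and
assuming a git that already carries this patch.  The author names, the
commit_as helper and the "pattern" filter are all invented for the
illustration.  The idea is to compare which commits share an author
before and after, without caring what the anonymized names look like.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Isolate from the user's configuration, as the test suite would.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_COMMITTER_NAME="C O Mitter" GIT_COMMITTER_EMAIL=committer@example.com

commit_as () {
	# Hypothetical helper: commit file $2 with author $1.
	echo "$2" >"$2" &&
	git add "$2" &&
	GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" \
		git commit -q -m "$2"
}

pattern () {
	# Replace each distinct line with the index of its first
	# appearance, so only the sharing pattern remains.
	awk '!($0 in seen) { seen[$0] = n++ } { print seen[$0] }'
}

git init -q orig &&
(
	cd orig &&
	commit_as alice one &&
	commit_as bob two &&
	commit_as alice three
) &&
git init -q anon &&
git -C orig fast-export --anonymize --all >stream &&
git -C anon fast-import --quiet <stream &&
before=$(git -C orig log --all --format="%an <%ae>" | pattern) &&
after=$(git -C anon log --all --format="%an <%ae>" | pattern)

cd / && rm -rf "$tmp"
```

If "before" and "after" are equal, the first and third commits still
share an author after anonymization, and the second one still has a
different author.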

> +test_expect_success 'commit timestamps are retained' '
> +	git log --all --format="%ct" >timestamps &&
> +	sort -u timestamps >unique &&
> +	test_line_count = 4 unique
> +'
> +
> +test_done