Re: To graft or not to graft... (Re: Recovering from repository corruption)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thursday 12 June 2008, Linus Torvalds wrote:
> On Thu, 12 Jun 2008, Stephen R. van den Berg wrote:
> > As I understood it from the few shreds of documentation that actually
> > mention the grafts file, the grafts file is *not* being cloned.
> > Therefore, my assumption was that cloning a repository that has a
> > grafts file gives an identical result to cloning the same repository
> > *without* the grafts file present.
>
> That would probably be the right behaviour, but no - all our commit
> walkers honor the grafts file.
>
> Including the ones used for creating pack-files and thus a clone.
>
> > As I understand it now, the cloning process actually peeks at the
> > grafts file while cloning, and then doesn't copy it.  This results in a
> > rather confusingly corrupt clone.
>
> Yes. The grafts-file was a mistake, but it's just barely useful to some
> people that it's stayed alive. Sadly, those "some people" don't tend to
> care enough about the problems it can cause.
>
> > I suggest two things:
> > a. That during the cloning process, the grafts file is completely
> >    disregarded in any case at first.
>
> Yes.
>
> And (a'): git-fsck and repacking should just consider it to be an
> _additional_ source of parenthood rather than a _replacement_ source.
>
> > b. Preferably the grafts file is copied as well (after cloning).  I
> >    never really understood why the file is not being copied in the
> > first place (anyone care to explain that?).
>
> The grafts file isn't part of the object stream and refs, and clones (and
> fetches) very much just copy the object database.

AFAICS, there's already a perfectly fine way to distribute grafted history:
1. Add a grafts file
2. Run git-filter-branch
3. Remove grafts file
4. Distribute repo
5. Profit!

Since git-filter-branch turns grafted parentage into _real_ parentage,
there's no point in ever having a grafts file at all (except transiently
for telling git-filter-branch what to do).

I suggest we make commit walkers NOT obey the grafts file by default, but
instead require a --follow-grafts option to restore the current behaviour.
Then, we teach git-filter-branch to obey the grafts file (probably by
employing said --follow-grafts option).

For those who want to hang on to the current behaviour, they can create
some config option that is equivalent to always running with
--follow-grafts.


The following is ugly, untested, undocumented, and obviously unfit for
inclusion:


diff --git a/commit.c b/commit.c
index 94d5b3d..3e9ebf7 100644
--- a/commit.c
+++ b/commit.c
@@ -7,6 +7,7 @@
 #include "revision.h"
 
 int save_commit_buffer = 1;
+int use_grafts = 0;
 
 const char *commit_type = "commit";
 
@@ -242,7 +243,7 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
 	char *bufptr = buffer;
 	unsigned char parent[20];
 	struct commit_list **pptr;
-	struct commit_graft *graft;
+	struct commit_graft *graft = NULL;
 	unsigned n_refs = 0;
 
 	if (item->object.parsed)
@@ -260,7 +261,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
 	bufptr += 46; /* "tree " + "hex sha1" + "\n" */
 	pptr = &item->parents;
 
-	graft = lookup_commit_graft(item->object.sha1);
+	if (use_grafts)
+		graft = lookup_commit_graft(item->object.sha1);
 	while (bufptr + 48 < tail && !memcmp(bufptr, "parent ", 7)) {
 		struct commit *new_parent;
 
diff --git a/commit.h b/commit.h
index 2d94d41..3e30aa0 100644
--- a/commit.h
+++ b/commit.h
@@ -22,6 +22,7 @@ struct commit {
 };
 
 extern int save_commit_buffer;
+extern int use_grafts;
 extern const char *commit_type;
 
 /* While we can decorate any object with a name, it's only used for commits.. */
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index d04c346..5ebe7cd 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -230,11 +230,11 @@ mkdir ../map || die "Could not create map/ directory"
 case "$filter_subdir" in
 "")
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@"
+		--follow-grafts --parents "$@"
 	;;
 *)
 	git rev-list --reverse --topo-order --default HEAD \
-		--parents "$@" -- "$filter_subdir"
+		--follow-grafts --parents "$@" -- "$filter_subdir"
 esac > ../revs || die "Could not get the commits"
 commits=$(wc -l <../revs | tr -d " ")
 
diff --git a/revision.c b/revision.c
index 5a1a948..ca98815 100644
--- a/revision.c
+++ b/revision.c
@@ -1059,6 +1059,10 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				revs->first_parent_only = 1;
 				continue;
 			}
+			if (!strcmp(arg, "--follow-grafts")) {
+				use_grafts = 1;
+				continue;
+			}
 			if (!strcmp(arg, "--reflog")) {
 				handle_reflog(revs, flags);
 				continue;
-- 
1.5.6.rc2.128.gf64ae


Have fun! :)

...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux