Re: [PATCH 2/3] git-bundle: die if a given ref is not included in bundle

Johannes Schindelin wrote:
git-bundle (as is in "next") has clearly defined semantics.
git-bundle on next, together with the patch in <Pine.LNX.4.63.0703091726530.22628@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, works well enough for me; without that patch it is too punishing.
But the bundle concept is not thought through. Obviously, the clearly defined semantics of git-bundle do not match what people want to use bundles for.

I've been wondering if we can define prereqs per listed head.
Currently, a bundle has a single pack. The bundle's prerequisites are that pack's dependencies. Splitting prerequisites per head requires either creating one pack per head, or unpacking at the receiving end and extracting only the objects needed for the selected head. I'm not sure that either is warranted, and my uses of bundle do not require this regardless.
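To make the "one pack, one prerequisite list" point concrete, here is a rough sketch that builds a bundle with two heads and prints its text header. The temporary paths, the branch name "topic" and the placeholder identity are made up for illustration, and it assumes a Git recent enough to have "git -C". In the header, lines starting with "-" are the bundle-wide prerequisites; they are not attached to any one head.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" commit -q --allow-empty -m "base"
base=$(git -C "$work/src" rev-parse master)

# Two heads, each one commit ahead of the shared base.
git -C "$work/src" branch topic
git -C "$work/src" commit -q --allow-empty -m "on master"
git -C "$work/src" checkout -q topic
git -C "$work/src" commit -q --allow-empty -m "on topic"

# One bundle, one pack, one prerequisite (the base commit).
git -C "$work/src" bundle create "$work/two-heads.bundle" master topic "^$base"

# The header is plain text and ends at the first empty line.
header=$(sed '/^$/q' "$work/two-heads.bundle")
printf '%s\n' "$header"
```

Both heads share the single "-" line for the base commit, which is why per-head prerequisites would need a different layout.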

I think we need to restate the purpose here. git-bundle is an alternate transport mechanism: git-bundle + git-fetch over sneakernet allows doing what git-push or git-fetch can do when directly connected. However, the lack of a direct connection imposes limitations: specifically, the user of git-bundle must specify the prerequisites, because the protocol cannot negotiate them. The exchange needs to be robust, in that git-bundle + git-fetch must never leave a repository in a corrupted state; the current prerequisites list plus the use of git-fetch seem to satisfy this.
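For reference, the sneakernet round trip I have in mind looks roughly like the sketch below. The temporary directories, the bundle file name and the "sneaker/" ref namespace are invented for the example, and it assumes a Git recent enough to have "git -C". The bundle file stands in for whatever medium carries the data between the two repositories.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# Sending side: a repository with one commit on master.
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" commit -q --allow-empty -m "first"
git -C "$work/src" bundle create "$work/all.bundle" master

# Receiving side: check that the prerequisites are satisfied,
# then fetch from the file as if it were a remote.
git init -q "$work/dst"
git -C "$work/dst" bundle verify "$work/all.bundle"
git -C "$work/dst" fetch -q "$work/all.bundle" master:refs/remotes/sneaker/master
```

The verify step is where a missing prerequisite would be reported, before anything is written to the receiving repository.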

From a given connected repo, I can do:

   git fetch -f <source url> master:master

and nothing complains, even if no update occurs (remote master is up to date). I can also do:

   git fetch -f <source url> master:next

and the new ref is created without complaint even if no new objects need to be defined or if the new definition is completely unrelated to the old.

With appropriate remote settings in .git/config, I can have git-fetch get all branches, or all branches and tags, and never complain when no update is required for something.
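As an illustration of such settings, the sketch below configures a remote to fetch all branches and all tags, then fetches twice. The remote name "origin", the tag "v1" and the temporary paths are assumptions for the example, and it assumes a Git recent enough to have "git -C". The second fetch has nothing to update and still exits quietly.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" commit -q --allow-empty -m "first"
git -C "$work/src" tag v1

git init -q "$work/dst"
# "remote add" writes the usual all-branches refspec into .git/config.
git -C "$work/dst" remote add origin "$work/src"
# A second refspec asks for all tags as well.
git -C "$work/dst" config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'

git -C "$work/dst" fetch -q origin
# Nothing has changed on the source; this fetch is a no-op, not an error.
git -C "$work/dst" fetch -q origin
```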

What I desire is similar functionality across sneakernet, and this is where git-bundle steps in. I cannot know what is on the destination repository exactly, so I need an imprecise way to specify prerequisites (e.g., --since=10.days.ago): I *know* that this is not exactly correct and *must* be conservative so that the bundle likely can be used. As the system is distributed and I don't control the recipients of the bundle, there is *no* way to know exactly what exists: the previous bundle is not definitive, as it might not have been applied, or they might have received data by a side-channel communication, etc.

So, a date range is the best method I have found to specify a bundle's prerequisites. However, I should not have to know which refs have been updated within the date range: this is too punishing. Directly connected git-fetch does not abort when the refspec says "get everything" but some ref in "everything" has not changed, so why should git-bundle complain? Absent the latest patch (i.e., with only what is now on next), git-bundle will error out, which is extremely unfriendly and unhelpful.

The single disconnect for the above with latest git-bundle + patch is that we cannot package a ref whose commit object is directly a bundle prerequisite. (I cannot do the equivalent of "git fetch remote master:next", where I already have all the objects for master). These instead result in a string of warning messages with the latest patch: I can live with this limitation (though I don't think this should even be a warning, git-fetch/git-push do not warn here). Absent the latest patch, git-bundle errors out: this is too punishing to the user.
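Here is a small sketch of the date-range use and of the limitation just described. The branch name "stale", the fixed old commit date and the temporary paths are invented for the example; it assumes a Git recent enough to have "git -C" and whose git-bundle skips such refs rather than aborting, as the patch under discussion does. The "stale" head points at a commit older than the range, so that commit becomes a prerequisite and the head is left out of the bundle.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master

# A commit well outside the date range, with a head left pointing at it.
GIT_AUTHOR_DATE="2005-04-07T22:13:13" GIT_COMMITTER_DATE="2005-04-07T22:13:13" \
    git -C "$work/src" commit -q --allow-empty -m "old work"
git -C "$work/src" branch stale

# A commit made now, inside the range.
git -C "$work/src" commit -q --allow-empty -m "recent work"

# Ask for everything from the last ten days, without naming refs.
git -C "$work/src" bundle create "$work/recent.bundle" --since=10.days.ago --all

heads=$(git -C "$work/src" bundle list-heads "$work/recent.bundle")
printf '%s\n' "$heads"
```

The listing should show master but not stale, which is the "cannot package a ref whose commit is a prerequisite" case.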

While it is possible to fetch a particular ref from the bundle rather than taking all, the monolithic pack structure and protocol dictate that you will get all objects regardless. I do not see this as a problem: the bundle came from a single repository, so everything in the bundle is related, and any excess is easily trimmed by git-repack. This is really just a limitation of the disconnected protocol, which cannot optimize the pack for the exact transfer required.
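A sketch of that behaviour, under the same caveats as before (made-up paths, branch names and "sneaker/" namespace, and a Git recent enough to have "git -C"): the bundle carries two heads, only master is fetched, and the commit for the other head is present on the receiving side anyway. The repack at the end is one way the excess may be dropped later; I have not checked exactly what it removes in every Git version.

```shell
set -e
work=$(mktemp -d)
# Placeholder identity so commits succeed in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" commit -q --allow-empty -m "on master"
git -C "$work/src" checkout -q -b topic
git -C "$work/src" commit -q --allow-empty -m "on topic"
topic=$(git -C "$work/src" rev-parse topic)
git -C "$work/src" bundle create "$work/both.bundle" master topic

# Fetch only master from the bundle.
git init -q "$work/dst"
git -C "$work/dst" fetch -q "$work/both.bundle" master:refs/remotes/sneaker/master

# The topic commit came along in the pack even though no ref was asked for.
arrived=$(git -C "$work/dst" cat-file -t "$topic")
echo "unrequested object type: $arrived"

# Repack into a single pack of reachable objects.
git -C "$work/dst" repack -a -d -q
```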

At some point, we have to make a clear distinction between what rules the protocol should enforce for "correctness" vs what an "intelligent" use of bundle is, and not try to enforce the latter in the software. What practices are useful or good vary considerably from business to business (I have many times been told that things I find essential to my work are "bad practice," usually stated by people who didn't have to solve a problem given constraints I actually face). The only requests git-bundle/git-fetch should refuse are things that will corrupt a git-repository, and the pair should endeavor to enable any information transfer that can be done with git-push or git-fetch given direct connections.

Bottom line, I strongly advocate Dscho's last patch + what is on next be promoted to master. We can revisit how well it is working and refine it after it gets some usage from others defining additional use-cases.

Mark