Johannes Schindelin wrote:
> git-bundle (as is in "next") has clearly defined semantics.
git-bundle on next, with the patch in
<Pine.LNX.4.63.0703091726530.22628@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
works well enough for me, but without that patch it is too punishing.
> But the bundle concept is not thought through. Obviously, the clearly
> defined semantics of git-bundle do not match what people want to use
> bundles for.
> I've been wondering if we can define prereqs per listed head.
Currently, a bundle has a single pack. The bundle's prerequisites are
that pack's dependencies. Splitting prerequisites per head requires
either creating one pack per head or unpacking at the receiving end and
extracting only the objects needed for the selected head. I'm not sure
that either is warranted, and my uses of bundle do not require this
regardless.
I think we need to restate the purpose here. git-bundle is an alternate
transport mechanism: git-bundle + git-fetch over sneakernet allows doing
what git-push or git-fetch can do when directly connected. However,
there are limitations due to the lack of a direct connection;
specifically, the user of git-bundle needs to specify the prerequisites
because the protocol cannot negotiate them. The exchange needs to be
robust, in that git-bundle + git-fetch must never leave a repository in
a corrupted state: the current prerequisites list + use of git-fetch
seems to satisfy this.
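To make the prerequisite rule concrete, here is a toy Python model. It is not git's implementation; the commit graph, `make_bundle`, and `can_unbundle` are hypothetical names used only for illustration. The pack holds the commits reachable from the listed heads but not from an excluded basis. The prerequisites are the parents of packed commits that lie outside the pack. The receiver refuses the bundle unless it already has every prerequisite, which is the property that keeps a fetch from leaving a commit with a missing parent.

```python
from typing import Dict, List, Set

# Hypothetical toy repository model: commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Set[str]) -> Set[str]:
    """Every commit reachable from `tips` (inclusive) via parent links."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def make_bundle(graph: Graph, heads: Dict[str, str], basis: Set[str]) -> dict:
    """Pack the commits reachable from `heads` but not from `basis`.

    The prerequisites are the parents of packed commits that are not
    themselves packed: the receiver must already have them.
    """
    pack = reachable(graph, set(heads.values())) - reachable(graph, basis)
    prereqs = {p for c in pack for p in graph.get(c, []) if p not in pack}
    return {"heads": dict(heads), "pack": pack, "prereqs": prereqs}


def can_unbundle(repo_commits: Set[str], bundle: dict) -> bool:
    """Accept the bundle only if every prerequisite is already present,
    so fetching from it can never leave a commit with a missing parent."""
    return bundle["prereqs"] <= repo_commits
```

For a linear history A-B-C-D bundled as master=D with basis B, the pack is {C, D} and the only prerequisite is B. A receiver holding only A must refuse the bundle.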
From a given connected repo, I can do:
git fetch -f <source url> master:master
and nothing complains, even if no update occurs (remote master is up to
date). I can also do
git fetch -f <source url> master:next
and the new ref is created without complaint even if no new objects need
to be transferred or if the new definition is completely unrelated to the old.
With appropriate remote settings in .git/config, I can have git-fetch
get all branches, or all branches and tags, and never complain when no
update is required for something.
What I desire is similar functionality across sneakernet, and this is
where git-bundle steps in. I cannot know what is on the destination
repository exactly, so I need an imprecise way to specify prerequisites
(e.g., --since=10.days.ago): I *know* that this is not exactly correct
and *must* be conservative so that the bundle likely can be used. As the
system is distributed and I don't control the recipients of the bundle,
there is *no* way to know exactly what exists: the previous bundle is
not definitive, as it might not have been applied, the recipients might
have received data through a side channel, etc.
So, a date range is the best method I have found to specify a bundle's
prerequisites. However, I should not have to know which refs have been
updated within the date range: this is too punishing. Directly connected
git-fetch does not abort when the refspec says "get everything" but some
ref in "everything" has not changed, so why should git-bundle complain?
Absent the latest patch (i.e., with what is now on next), git-bundle
will error out, which is extremely unfriendly and unhelpful.
The one remaining gap in the above with the latest git-bundle + patch is
that we cannot package a ref whose commit object is itself a bundle
prerequisite. (I cannot do the equivalent of "git fetch remote
master:next" where I already have all the objects for master.) With the
latest patch, such refs instead produce a string of warning messages: I
can live with this limitation (though I don't think this should even be
a warning; git-fetch/git-push do not warn here). Absent the latest
patch, git-bundle errors out: this is too punishing to the user.
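The date-range behavior argued for in the last two paragraphs can be sketched as another toy Python model. It assumes a hypothetical commit graph with integer timestamps, and `bundle_since` is an illustrative name, not a git function. Commits older than the cutoff are assumed to already exist on the receiver. A ref with nothing newer than the cutoff, including one whose tip is itself a prerequisite, is skipped with a warning rather than aborting the whole bundle.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model: commit id -> parent ids, commit id -> timestamp.
Graph = Dict[str, List[str]]
Dates = Dict[str, int]


def bundle_since(graph: Graph, dates: Dates, refs: Dict[str, str],
                 since: int) -> Tuple[dict, List[str]]:
    """Bundle all refs, assuming the receiver already has every commit
    older than `since` (a conservative guess, like --since=10.days.ago).

    A ref with nothing newer than the cutoff (including one whose tip is
    itself a prerequisite) is skipped with a warning, not a fatal error.
    """
    pack: Set[str] = set()
    heads: Dict[str, str] = {}
    warnings: List[str] = []
    for name, tip in sorted(refs.items()):
        if dates[tip] < since:
            warnings.append(f"warning: {name} has no commits in range, skipped")
            continue
        heads[name] = tip
        # Walk back from the tip, stopping at commits older than the cutoff.
        stack = [tip]
        while stack:
            commit = stack.pop()
            if commit in pack or dates[commit] < since:
                continue
            pack.add(commit)
            stack.extend(graph[commit])
    # Parents just outside the pack become the bundle's prerequisites.
    prereqs = {p for c in pack for p in graph[c] if p not in pack}
    return {"heads": heads, "pack": pack, "prereqs": prereqs}, warnings
```

With a cutoff between B and C, master=D yields a pack of {C, D} with prerequisite B. A ref pointing at B (the "master:next" case above) or at an old side commit is dropped with a warning, and the rest of the bundle is still produced.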
While it is possible to fetch a particular ref from the bundle rather
than taking all, the monolithic pack structure and protocol dictate
that you will get all objects regardless. I do not see this as a
problem: the bundle came from a single repository, so everything in the
bundle is related, and the excess is easily trimmed by git-repack.
This is really just a limitation of the disconnected protocol that
cannot optimize the pack for the exact transfer required.
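The point about fetching a single ref can be modeled the same way. In this hypothetical sketch, the `fetch_from_bundle` helper and the dict-based bundle are assumptions, not git's API. A forced `src:dst` fetch stores every object in the pack regardless of which ref was requested, then updates the destination ref. It raises no complaint when the ref is already up to date and refuses only when a prerequisite is missing.

```python
from typing import Dict, Set


def fetch_from_bundle(objects: Set[str], refs: Dict[str, str], bundle: dict,
                      src: str, dst: str) -> None:
    """Hypothetical model of a forced 'src:dst' fetch from a bundle.

    `bundle` is a dict with "heads" (ref name -> commit id), "pack" (set of
    commit ids) and "prereqs" (set of commit ids).
    """
    # The only refusal: applying the pack without its prerequisites would
    # leave commits whose parents are missing.
    if not bundle["prereqs"] <= objects:
        raise ValueError("bundle prerequisites missing; refusing to fetch")
    # One monolithic pack: every object is stored, not just src's history.
    # Objects that end up unreachable can be trimmed later by git-repack.
    objects |= bundle["pack"]
    # Forced update: creates or overwrites dst, and is silent if unchanged.
    refs[dst] = bundle["heads"][src]
```

Fetching only master:next from a bundle that also carries a topic branch still adds the topic's objects. Repeating the same fetch is a silent no-op, mirroring directly connected git-fetch.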
At some point, we have to make a clear distinction between what rules
the protocol should enforce for "correctness" vs what an "intelligent"
use of bundle is, and not try to enforce the latter in the software.
What practices are useful or good vary considerably from business to
business (I have many times been told that things I find essential to my
work are "bad practice," usually stated by people who didn't have to
solve a problem given constraints I actually face). The only requests
git-bundle/git-fetch should refuse are things that will corrupt a
git repository, and the pair should endeavor to enable any information
transfer that can be done with git-push or git-fetch given direct
connections.
Bottom line, I strongly advocate that Dscho's last patch + what is on
next be promoted to master. We can revisit how well it is working and
refine it after it gets some usage from others defining additional
use-cases.
Mark