[PATCH v2 4/4] bundle v3: the beginning

The bundle v3 format introduces the ability to keep the bundle header
(which describes what references in the bundled history can be
fetched, and what objects the receiving repository must have in
order to unbundle it successfully) in one file, and the bundled pack
stream data in a separate file.

A v3 bundle file begins with the line "# v3 git bundle", followed
by zero or more "extended header" lines and an empty line, and then
by the list of prerequisites and references in the same format as
in a v2 bundle.  If it uses the "split bundle" feature, there is a
"data: $URL" extended header line, and nothing follows the list of
prerequisites and references.  A "sha1: " extended header line may
also be present to help validate that the pack stream data matches
the bundle header.
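
As an illustration of the layout described above, here is a minimal
standalone sketch that parses a v3 header held in memory.  It is not
the code in this patch: struct v3_header, parse_v3_header() and the
fixed buffer sizes are hypothetical names and choices made for the
example, and it is stricter than the patch in that it insists on the
empty line and on a non-empty "data: " value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define HEX_LEN 40	/* length of a hex SHA-1 */

/* Hypothetical in-memory result; this is not git's struct bundle_header. */
struct v3_header {
	char data[256];		/* value of "data: ", or "" if absent */
	char sha1[HEX_LEN + 1];	/* value of "sha1: ", or "" if absent */
	const char *refs;	/* start of the prerequisite/reference list */
};

static int is_hex(const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/* Returns 0 on success, -1 if the text is not a well-formed v3 header. */
static int parse_v3_header(const char *text, struct v3_header *out)
{
	static const char sig[] = "# v3 git bundle\n";
	const char *line, *eol;

	assert(text && out);
	out->data[0] = out->sha1[0] = '\0';
	out->refs = NULL;

	if (strncmp(text, sig, strlen(sig)))
		return -1;

	for (line = text + strlen(sig); *line; line = eol + 1) {
		size_t len;

		eol = strchr(line, '\n');
		if (!eol)
			return -1;	/* unterminated line */
		len = (size_t)(eol - line);
		if (!len) {
			/* an empty line ends the extended headers */
			out->refs = eol + 1;
			return 0;
		}
		if (len > 6 && !strncmp(line, "data: ", 6)) {
			if (len - 6 >= sizeof(out->data))
				return -1;	/* too long for this sketch */
			memcpy(out->data, line + 6, len - 6);
			out->data[len - 6] = '\0';
		} else if (len == 6 + HEX_LEN &&
			   !strncmp(line, "sha1: ", 6) &&
			   is_hex(line + 6, HEX_LEN)) {
			memcpy(out->sha1, line + 6, HEX_LEN);
			out->sha1[HEX_LEN] = '\0';
		} else {
			return -1;	/* unknown or malformed extended header */
		}
	}
	return -1;	/* input ended before the empty line */
}
```

On success, out->refs points just past the empty line, where the
v2-style list of prerequisites and references begins.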

A typical expected use of a split bundle is to help an initial clone
that involves a huge data transfer, and would go like this:

 - Any repository people would clone and fetch from would regularly
   be repacked, and it is expected that there would be a packfile
   without prerequisites that holds all (or at least most) of the
   history of it (call it pack-$name.pack).

 - After arranging for that packfile to be downloadable as
   $URL/pack-$name.pack over popular, easily resumable transfer
   methods used for serving static files (such as HTTP or HTTPS), a
   v3 bundle file (call it $name.bndl) can be prepared with an
   extended header "data: $URL/pack-$name.pack" to point at the
   download location for the packfile, and be served at
   "$URL/$name.bndl".

 - An updated Git client, when trying to "git clone" from such a
   repository, may be redirected to "$URL/$name.bndl", which would be
   a tiny text file (when the split bundle feature is used).

 - The client would then inspect the downloaded $name.bndl, learn
   that the corresponding packfile exists at $URL/pack-$name.pack,
   and download it as pack-$name.pack, retrying until the download
   succeeds.  This can easily be done with an equivalent of
   "wget --continue" over an unreliable link.  The checksum recorded
   on the "sha1: " header line is expected to be used by this
   downloader (not written yet).

 - After fully downloading $name.bndl and pack-$name.pack and
   storing them next to each other, the client would clone from the
   $name.bndl; this would populate the newly created repository with
   reasonably recent history.

 - Then the client can issue "git fetch" against the original
   repository to obtain the most recent part of the history created
   since the bundle was made.
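
The "next to each other" step above relies on the packfile being
found beside the bundle header under the basename of the "data: "
value.  A minimal standalone sketch of that path computation, in the
spirit of open_bundle_data() in this patch, might look like the
following.  bundle_data_path() is a hypothetical helper, and it
assumes '/' is the only directory separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<directory of bundle_path><basename of
 * data>" into buf.  Returns 0 on success, -1 if the result does not
 * fit in size bytes.  Only '/' is treated as a directory separator.
 */
static int bundle_data_path(const char *bundle_path, const char *data,
			    char *buf, size_t size)
{
	const char *slash, *base;
	int dirlen, n;

	assert(bundle_path && data && buf);

	/* keep the directory part, including its trailing slash */
	slash = strrchr(bundle_path, '/');
	dirlen = slash ? (int)(slash - bundle_path) + 1 : 0;

	/* keep only what follows the last slash of the data value */
	base = strrchr(data, '/');
	base = base ? base + 1 : data;

	n = snprintf(buf, size, "%.*s%s", dirlen, bundle_path, base);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a bundle at /tmp/dl/proj.bndl and a header line
"data: https://example.com/pub/pack-1234.pack", this yields
/tmp/dl/pack-1234.pack, which is where the downloader would be
expected to have stored the pack.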

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 bundle.c          | 103 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 bundle.h          |   3 ++
 t/t5704-bundle.sh |  64 +++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+), 9 deletions(-)

diff --git a/bundle.c b/bundle.c
index 32bdb01..480630d 100644
--- a/bundle.c
+++ b/bundle.c
@@ -10,7 +10,8 @@
 #include "refs.h"
 #include "argv-array.h"
 
-static const char bundle_signature[] = "# v2 git bundle\n";
+static const char bundle_signature_v2[] = "# v2 git bundle\n";
+static const char bundle_signature_v3[] = "# v3 git bundle\n";
 
 static void add_to_ref_list(const unsigned char *sha1, const char *name,
 		struct ref_list *list)
@@ -33,16 +34,55 @@ static int parse_bundle_header(int fd, struct bundle_header *header, int quiet)
 	int status = 0;
 
 	/* The bundle header begins with the signature */
-	if (strbuf_getwholeline_fd(&buf, fd, '\n') ||
-	    strcmp(buf.buf, bundle_signature)) {
+	if (strbuf_getwholeline_fd(&buf, fd, '\n')) {
+	bad_bundle:
 		if (!quiet)
-			error(_("'%s' does not look like a v2 bundle file"),
+			error(_("'%s' does not look like a supported bundle file"),
 			      header->filename);
 		status = -1;
 		goto abort;
 	}
 
-	/* The bundle header ends with an empty line */
+	if (!strcmp(buf.buf, bundle_signature_v2))
+		header->bundle_version = 2;
+	else if (!strcmp(buf.buf, bundle_signature_v3))
+		header->bundle_version = 3;
+	else
+		goto bad_bundle;
+
+	if (header->bundle_version == 3) {
+		/*
+		 * bundle version v3 has extended headers before the
+		 * list of prerequisites and references.  The extended
+		 * headers end with an empty line.
+		 */
+		while (!strbuf_getwholeline_fd(&buf, fd, '\n')) {
+			const char *cp;
+			if (buf.len && buf.buf[buf.len - 1] == '\n')
+				buf.buf[--buf.len] = '\0';
+			if (!buf.len)
+				break;
+			if (skip_prefix(buf.buf, "data: ", &cp)) {
+				header->datafile = xstrdup(cp);
+				continue;
+			}
+			if (skip_prefix(buf.buf, "sha1: ", &cp)) {
+				unsigned char sha1[GIT_SHA1_RAWSZ];
+				if (get_sha1_hex(cp, sha1) ||
+				    cp[GIT_SHA1_HEXSZ])
+					goto bad_bundle;
+				hashcpy(header->csum, sha1);
+				continue;
+			}
+
+			goto bad_bundle;
+		}
+	}
+
+	/*
+	 * The bundle header lists prerequisites and
+	 * references, and the list ends with an empty line.
+	 */
 	while (!strbuf_getwholeline_fd(&buf, fd, '\n') &&
 	       buf.len && buf.buf[0] != '\n') {
 		unsigned char sha1[20];
@@ -77,7 +117,8 @@ static int parse_bundle_header(int fd, struct bundle_header *header, int quiet)
 
  abort:
 	if (status) {
-		close(fd);
+		if (0 <= fd)
+			close(fd);
 		fd = -1;
 	}
 	strbuf_release(&buf);
@@ -426,6 +467,7 @@ int create_bundle(struct bundle_header *header, const char *path,
 	int bundle_to_stdout;
 	int ref_count = 0;
 	struct rev_info revs;
+	const char *bundle_signature = bundle_signature_v2;
 
 	bundle_to_stdout = !strcmp(path, "-");
 	if (bundle_to_stdout)
@@ -480,22 +522,65 @@ int create_bundle(struct bundle_header *header, const char *path,
 	return 0;
 }
 
+/*
+ * v3 "split bundle" allows a separate packfile to be named
+ * as "data: $URL/$name_of_the_packfile".  This file is expected
+ * to be downloaded next to the bundle header file when the
+ * bundle is used.  Hence we find the path to the directory
+ * that contains the bundle header file, and append the basename
+ * part of header->datafile to it, to form the name of the
+ * file that holds the pack data stream.
+ */
+static int open_bundle_data(struct bundle_header *header)
+{
+	const char *cp;
+	struct strbuf filename = STRBUF_INIT;
+	int fd;
+
+	assert(header->datafile);
+
+	cp = find_last_dir_sep(header->filename);
+	if (cp)
+		strbuf_add(&filename, header->filename,
+			   (cp - header->filename) + 1);
+	cp = find_last_dir_sep(header->datafile);
+	/* step past the separator so that only the basename is appended */
+	cp = cp ? cp + 1 : header->datafile;
+	strbuf_addstr(&filename, cp);
+
+	fd = open(filename.buf, O_RDONLY);
+	strbuf_release(&filename);
+	return fd;
+}
+
 int unbundle(struct bundle_header *header, int bundle_fd, int flags)
 {
 	const char *argv_index_pack[] = {"index-pack",
 					 "--fix-thin", "--stdin", NULL, NULL};
 	struct child_process ip = CHILD_PROCESS_INIT;
+	int status = 0, data_fd = -1;
 
 	if (flags & BUNDLE_VERBOSE)
 		argv_index_pack[3] = "-v";
 
 	if (verify_bundle(header, 0))
 		return -1;
+
+	if (header->datafile) {
+		data_fd = open_bundle_data(header);
+		if (data_fd < 0)
+			return error(_("bundle data not found"));
+		ip.in = data_fd;
+	} else {
+		ip.in = bundle_fd;
+	}
+
 	ip.argv = argv_index_pack;
-	ip.in = bundle_fd;
 	ip.no_stdout = 1;
 	ip.git_cmd = 1;
 	if (run_command(&ip))
-		return error(_("index-pack died"));
-	return 0;
+		status = error(_("index-pack died"));
+	if (0 <= data_fd)
+		close(data_fd);
+	return status;
 }
diff --git a/bundle.h b/bundle.h
index e059ccf..db55dc7 100644
--- a/bundle.h
+++ b/bundle.h
@@ -10,7 +10,10 @@ struct ref_list {
 };
 
 struct bundle_header {
+	int bundle_version;
 	const char *filename;
+	const char *datafile;
+	unsigned char csum[GIT_SHA1_RAWSZ];
 	struct ref_list prerequisites;
 	struct ref_list references;
 };
diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index 348d9b3..e68523c 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -71,4 +71,68 @@ test_expect_success 'prerequisites with an empty commit message' '
 	git bundle verify bundle
 '
 
+# bundle v3 (experimental)
+test_expect_success 'clone from v3' '
+
+	# as "bundle create" does not exist yet for v3
+	# prepare it by hand here
+	head=$(git rev-parse HEAD) &&
+	name=$(echo $head | git pack-objects --revs v3) &&
+	test_when_finished "rm v3-$name.pack v3-$name.idx" &&
+	cat >v3.bndl <<-EOF &&
+	# v3 git bundle
+	data: v3-$name.pack
+
+	$head HEAD
+	$head refs/heads/master
+	EOF
+
+	git bundle verify v3.bndl &&
+	git bundle list-heads v3.bndl >actual &&
+	cat >expect <<-EOF &&
+	$head HEAD
+	$head refs/heads/master
+	EOF
+	test_cmp expect actual &&
+
+	git clone v3.bndl v3dst &&
+	git -C v3dst for-each-ref --format="%(objectname) %(refname)" >actual &&
+	cat >expect <<-EOF &&
+	$head refs/heads/master
+	$head refs/remotes/origin/HEAD
+	$head refs/remotes/origin/master
+	EOF
+	test_cmp expect actual &&
+	git -C v3dst fsck &&
+
+	# an "inline" v3 is still possible.
+	cat >v3i.bndl <<-EOF &&
+	# v3 git bundle
+
+	$head HEAD
+	$head refs/heads/master
+
+	EOF
+	cat v3-$name.pack >>v3i.bndl &&
+	test_when_finished "rm v3i.bndl" &&
+
+	git bundle verify v3i.bndl &&
+	git bundle list-heads v3i.bndl >actual &&
+	cat >expect <<-EOF &&
+	$head HEAD
+	$head refs/heads/master
+	EOF
+	test_cmp expect actual &&
+
+	git clone v3i.bndl v3idst &&
+	git -C v3idst for-each-ref --format="%(objectname) %(refname)" >actual &&
+	cat >expect <<-EOF &&
+	$head refs/heads/master
+	$head refs/remotes/origin/HEAD
+	$head refs/remotes/origin/master
+	EOF
+	test_cmp expect actual &&
+	git -C v3idst fsck
+'
+
 test_done
-- 
2.8.0-rc0-114-g0b3e5e5
