Re: [PATCH v5 25/27] refs: add LMDB refs storage backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Fri, 2016-02-19 at 09:54 +0700, Duy Nguyen wrote:
>> On Fri, Feb 19, 2016 at 3:23 AM, David Turner <
>> dturner@xxxxxxxxxxxxxxxx> wrote:
>> > > > +static int read_per_worktree_ref(const char *submodule, const
>> > > > char
>> > > > *refname,
>> > > > +                            struct MDB_val *val, int
>> > > > *needs_free)
>> > >
>> > > From what I read, I suspect these _per_worktree functions will be
>> > > identical for the next backend. Should we just hand over the job
>> > > for
>> > > files backend? For all entry points that may deal with per
>> > > -worktree
>> > > refs, e.g. lmdb_resolve_ref_unsafe, can we check ref_type() first
>> > > thing, if it's per-worktree we call
>> > > refs_be_files.resolve_ref_unsafe()
>> > > instead?  It could even be done at frontend level,
>> > > e.g. refs.c:resolve_ref_unsafe().
>> > >
>> > > Though I may be talking rubbish here because I don't know how
>> > > whether
>> > > it has anything to do with transactions.
>> >
>> > The reason I did it this way is that some ref chains cross backend
>> > boundaries (e.g. HEAD -> refs/heads/master).  But if we have other
>> > backends later, we could generalize.
>>
>> Crossing backends should go through frontend again, imo. But I don't
>> really know if it's efficient.
>
> It's pretty tricky to maintain state (e.g. count of symref redirects)
> across that barrier.  So I'm not sure how to do it cleanly.

I notice files backend does pretty much the same thing. "files"
backend looks more like two backends combined in one, one is files,
the other is packed-refs. And it looks like we could solve it by
providing a lower level api, read_raw_ref() or something, that
retrieves the ref without any validation or link following. More on
this later.

>> > > I'm not sure I get this comment. D/F conflicts are no longer a
>> > > thing
>> > > for lmdb backend, right?
>> >
>> > I'm trying to avoid the lmdb backend creating a set of refs that
>> > the
>> > files backend can't handle.  This would make collaboration with
>> > other
>> > versions of git more difficult.
>>
>> It already is. If you create refs "foo" and "FOO" on case sensitive
>> file system and clone it on a case-insensitive one, you face the same
>> problem. We may have an optional configuration knob to prevent
>> incompatibilities with files backend, but I think that should be done
>> (and enforced if necessary) outside backends.
>
> Sure, the current state isn't perfect, but why make it worse?

I see it from a different angle. The current state isn't perfect, but
we will be moving to a better future where "files" backend may
eventually be deprecated. Why hold back?

But this line of reasoning only works if we have a new backend capable
of replacing "files" without regressions or introducing new
dependency. Which is why I suggest a new backend [1] (or implement
Shawn's RefTree if it's proven as good with small repos)

I have no problem if you want to stay strictly compatible with "files"
though.

[1] http://thread.gmane.org/gmane.comp.version-control.git/285893/focus=286654

On Fri, Feb 19, 2016 at 05:49:46PM -0500, David Turner wrote:
> > 
> > This code looks a lot like near the end of resolve_ref_1(). Maybe we
> > could share the code in refs/backend-common.c or something and call
> > here instead?
> 
> Something like the following?
> 
> commit aad6b84fd1869f6e1cf6ed15bcece0c2f6429e9d
> Author: David Turner <dturner@xxxxxxxxxxxxxxxx>
> Date:   Thu Feb 18 17:09:29 2016 -0500
> 
>     refs: break out some functions from resolve_ref_1
>     
>     A bunch of resolve_ref_1 is not backend-specific, so we can
>     break it out into separate internal functions that other
>     backends can use.
...
> 
> I'm not sure I like it, because it breaks out these weird tiny
> functions that take a lot of arguments.  But maybe it's worth it?  What
> do you think?

OK how about we keep resolve_ref_1() whole and split real backend code
out? Something like these three patches (only built, did not test). A
bit ugly with continue_symlink, but it's just demonstration.

commit ef46fcdc62ef89fd5260ca054cd1d98f9f2d7c2b
Author: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
Date:   Sat Feb 20 09:18:45 2016 +0700

    refs/files: move ref I/O code out of resolve_refs_1()

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4bddfb3..f54f2ae 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1407,6 +1407,95 @@ static int resolve_missing_loose_ref(const char *refname,
 	}
 }
 
+static const char *continue_normal_ref = "read_ref returns a normal ref";
+static const char *continue_symlink = "read_ref returns a symlink";
+
+/*
+ * Read a ref from backend. Returning any values except
+ * continue_normal_ref or continue_symlink ends resolve_ref_1()
+ * execution. If the return value is not NULL, sha1 and flags must be
+ * updated correctly. except REF_ISBROKEN which is set by
+ * resolve_ref_1().
+ *
+ * If continue_* is returned, sb_contents must contain the ref data.
+ */
+static const char *parse_ref(const char *refname,
+			     int resolve_flags,
+			     unsigned char *sha1,
+			     int *flags,
+			     struct strbuf *sb_path,
+			     struct strbuf *sb_contents)
+{
+	const char *path;
+	struct stat st;
+	int fd;
+
+	strbuf_reset(sb_path);
+	strbuf_git_path(sb_path, "%s", refname);
+	path = sb_path->buf;
+
+	/*
+	 * We might have to loop back here to avoid a race
+	 * condition: first we lstat() the file, then we try
+	 * to read it as a link or as a file.  But if somebody
+	 * changes the type of the file (file <-> directory
+	 * <-> symlink) between the lstat() and reading, then
+	 * we don't want to report that as an error but rather
+	 * try again starting with the lstat().
+	 */
+stat_ref:
+	if (lstat(path, &st) < 0) {
+		if (errno != ENOENT)
+			return NULL;
+		if (resolve_missing_loose_ref(refname, resolve_flags,
+					      sha1, flags))
+			return NULL;
+		return refname;
+	}
+
+	/* Follow "normalized" - ie "refs/.." symlinks by hand */
+	if (S_ISLNK(st.st_mode)) {
+		strbuf_reset(sb_contents);
+		if (strbuf_readlink(sb_contents, path, 0) < 0) {
+			if (errno == ENOENT || errno == EINVAL)
+				/* inconsistent with lstat; retry */
+				goto stat_ref;
+			else
+				return NULL;
+		}
+		return continue_symlink;
+	}
+
+	/* Is it a directory? */
+	if (S_ISDIR(st.st_mode)) {
+		errno = EISDIR;
+		return NULL;
+	}
+
+	/*
+	 * Anything else, just open it and try to use it as
+	 * a ref
+	 */
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			/* inconsistent with lstat; retry */
+			goto stat_ref;
+		else
+			return NULL;
+	}
+	strbuf_reset(sb_contents);
+	if (strbuf_read(sb_contents, fd, 256) < 0) {
+		int save_errno = errno;
+		close(fd);
+		errno = save_errno;
+		return NULL;
+	}
+	close(fd);
+
+	return continue_normal_ref;
+}
+
 /* This function needs to return a meaningful errno on failure */
 static const char *resolve_ref_1(const char *refname,
 				 int resolve_flags,
@@ -1442,54 +1531,18 @@ static const char *resolve_ref_1(const char *refname,
 		bad_name = 1;
 	}
 	for (;;) {
-		const char *path;
-		struct stat st;
+		const char *ret;
 		char *buf;
-		int fd;
 
 		if (--depth < 0) {
 			errno = ELOOP;
 			return NULL;
 		}
 
-		strbuf_reset(sb_path);
-		strbuf_git_path(sb_path, "%s", refname);
-		path = sb_path->buf;
+		ret = parse_ref(refname, resolve_flags, sha1,
+				flags, sb_path, sb_contents);
 
-		/*
-		 * We might have to loop back here to avoid a race
-		 * condition: first we lstat() the file, then we try
-		 * to read it as a link or as a file.  But if somebody
-		 * changes the type of the file (file <-> directory
-		 * <-> symlink) between the lstat() and reading, then
-		 * we don't want to report that as an error but rather
-		 * try again starting with the lstat().
-		 */
-	stat_ref:
-		if (lstat(path, &st) < 0) {
-			if (errno != ENOENT)
-				return NULL;
-			if (resolve_missing_loose_ref(refname, resolve_flags,
-						      sha1, flags))
-				return NULL;
-			if (bad_name) {
-				hashclr(sha1);
-				if (flags)
-					*flags |= REF_ISBROKEN;
-			}
-			return refname;
-		}
-
-		/* Follow "normalized" - ie "refs/.." symlinks by hand */
-		if (S_ISLNK(st.st_mode)) {
-			strbuf_reset(sb_contents);
-			if (strbuf_readlink(sb_contents, path, 0) < 0) {
-				if (errno == ENOENT || errno == EINVAL)
-					/* inconsistent with lstat; retry */
-					goto stat_ref;
-				else
-					return NULL;
-			}
+		if (ret == continue_symlink) {
 			if (starts_with(sb_contents->buf, "refs/") &&
 			    !check_refname_format(sb_contents->buf, 0)) {
 				strbuf_swap(sb_refname, sb_contents);
@@ -1502,35 +1555,10 @@ static const char *resolve_ref_1(const char *refname,
 				}
 				continue;
 			}
-		}
-
-		/* Is it a directory? */
-		if (S_ISDIR(st.st_mode)) {
-			errno = EISDIR;
-			return NULL;
-		}
-
-		/*
-		 * Anything else, just open it and try to use it as
-		 * a ref
-		 */
-		fd = open(path, O_RDONLY);
-		if (fd < 0) {
-			if (errno == ENOENT)
-				/* inconsistent with lstat; retry */
-				goto stat_ref;
-			else
-				return NULL;
-		}
-		strbuf_reset(sb_contents);
-		if (strbuf_read(sb_contents, fd, 256) < 0) {
-			int save_errno = errno;
-			close(fd);
-			errno = save_errno;
-			return NULL;
-		}
-		close(fd);
-		strbuf_rtrim(sb_contents);
+		} else if (ret == refname)
+			break;
+		else if (ret != continue_normal_ref)
+			return ret;
 
 		/*
 		 * Is it a symbolic ref?
@@ -1547,12 +1575,7 @@ static const char *resolve_ref_1(const char *refname,
 				errno = EINVAL;
 				return NULL;
 			}
-			if (bad_name) {
-				hashclr(sha1);
-				if (flags)
-					*flags |= REF_ISBROKEN;
-			}
-			return refname;
+			break;
 		}
 		if (flags)
 			*flags |= REF_ISSYMREF;
@@ -1578,6 +1601,13 @@ static const char *resolve_ref_1(const char *refname,
 			bad_name = 1;
 		}
 	}
+
+	if (bad_name) {
+		hashclr(sha1);
+		if (flags)
+			*flags |= REF_ISBROKEN;
+	}
+	return refname;
 }
 
 static const char *files_resolve_ref_unsafe(const char *refname,

After this resolve_ref_1() is backend independent. So we can make it
take parse_ref() as a function pointer instead.

commit 50d96b6f79b30b5ba17fa00ec3ee42845546a123
Author: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
Date:   Sat Feb 20 09:22:03 2016 +0700

    refs/files-backend.c: let resolve_refs_1() accept parse_ref as callback

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f54f2ae..9b4de9f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1424,7 +1424,8 @@ static const char *parse_ref(const char *refname,
 			     unsigned char *sha1,
 			     int *flags,
 			     struct strbuf *sb_path,
-			     struct strbuf *sb_contents)
+			     struct strbuf *sb_contents,
+			     void *cb_data)
 {
 	const char *path;
 	struct stat st;
@@ -1497,13 +1498,21 @@ stat_ref:
 }
 
 /* This function needs to return a meaningful errno on failure */
-static const char *resolve_ref_1(const char *refname,
-				 int resolve_flags,
-				 unsigned char *sha1,
-				 int *flags,
-				 struct strbuf *sb_refname,
-				 struct strbuf *sb_path,
-				 struct strbuf *sb_contents)
+const char *resolve_ref_1(const char *refname,
+			  int resolve_flags,
+			  unsigned char *sha1,
+			  int *flags,
+			  struct strbuf *sb_refname,
+			  struct strbuf *sb_path,
+			  struct strbuf *sb_contents,
+			  const char *(*parse_ref)(const char *refname,
+						   int resolve_flags,
+						   unsigned char *sha1,
+						   int *flags,
+						   struct strbuf *sb_path,
+						   struct strbuf *sb_contents,
+						   void *cb_data),
+			  void *cb_data)
 {
 	int depth = MAXDEPTH;
 	int bad_name = 0;
@@ -1540,7 +1549,8 @@ static const char *resolve_ref_1(const char *refname,
 		}
 
 		ret = parse_ref(refname, resolve_flags, sha1,
-				flags, sb_path, sb_contents);
+				flags, sb_path, sb_contents,
+				cb_data);
 
 		if (ret == continue_symlink) {
 			if (starts_with(sb_contents->buf, "refs/") &&
@@ -1621,7 +1631,8 @@ static const char *files_resolve_ref_unsafe(const char *refname,
 	const char *ret;
 
 	ret = resolve_ref_1(refname, resolve_flags, sha1, flags,
-			    &sb_refname, &sb_path, &sb_contents);
+			    &sb_refname, &sb_path, &sb_contents,
+			    parse_ref, NULL);
 	strbuf_release(&sb_path);
 	strbuf_release(&sb_contents);
 	return ret;

And finally we can make lmdb use resolve_ref_1(). lmdb-specific code
is in the new retrieve_ref() function.

commit 62a5df3117c0f825bc26fd09dda29e713f94d743 (HEAD -> lmdb)
Author: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
Date:   Sat Feb 20 09:33:01 2016 +0700

    refs/lmdb-backend: use resolve_ref_1()

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9b4de9f..44b7136 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1407,7 +1407,7 @@ static int resolve_missing_loose_ref(const char *refname,
 	}
 }
 
-static const char *continue_normal_ref = "read_ref returns a normal ref";
+const char *continue_normal_ref = "read_ref returns a normal ref";
 static const char *continue_symlink = "read_ref returns a symlink";
 
 /*
diff --git a/refs/lmdb-backend.c b/refs/lmdb-backend.c
index 6c0d7fb..7f169bd 100644
--- a/refs/lmdb-backend.c
+++ b/refs/lmdb-backend.c
@@ -544,74 +544,64 @@ done:
 	return ret;
 }
 
+extern const char *continue_normal_ref;
+
+static const char *retrieve_ref(const char *refname,
+				int resolve_flags,
+				unsigned char *sha1,
+				int *flags,
+				struct strbuf *sb_path,
+				struct strbuf *sb_contents,
+				void *cb_data)
+{
+	struct lmdb_transaction *transaction = cb_data;
+	MDB_val key, val;
+	int needs_free;		/* dont care, leak */
+
+	key.mv_data = (void *)refname;
+	key.mv_size = strlen(refname) + 1;
+
+	val.mv_data = NULL;
+	val.mv_size = 0;
+
+	if (mdb_get_or_die(transaction, &key, &val, &needs_free)) {
+		struct strbuf err = STRBUF_INIT;
+
+		if (resolve_flags & RESOLVE_REF_READING)
+			return NULL;
+
+		if (verify_refname_available_txn(transaction,
+						 refname, NULL, NULL, &err)) {
+			error("%s", err.buf);
+			strbuf_release(&err);
+			return NULL;
+		}
+
+		hashclr(sha1);
+		return refname;
+	}
+
+	strbuf_reset(sb_contents);
+	strbuf_add(sb_contents, val.mv_data, val.mv_size);
+	return continue_normal_ref;
+}
+
 static const char *resolve_ref_unsafe_txn(struct lmdb_transaction *transaction,
 					  const char *refname,
 					  int resolve_flags,
 					  unsigned char *sha1,
 					  int *flags)
 {
-	int bad_name = 0;
-	char *ref_data;
-	struct MDB_val key, val;
-	struct strbuf err = STRBUF_INIT;
-	int needs_free = 0;
+	static struct strbuf sb_refname = STRBUF_INIT;
+	struct strbuf sb_contents = STRBUF_INIT;
+	struct strbuf sb_path = STRBUF_INIT;
 	const char *ret;
 
-	val.mv_size = 0;
-	val.mv_data = NULL;
-
-	if (flags)
-		*flags = 0;
-
-	if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
-		if (flags)
-			*flags |= REF_BAD_NAME;
-
-		if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
-		    !refname_is_safe(refname)) {
-			errno = EINVAL;
-			return NULL;
-		}
-		/*
-		 * dwim_ref() uses REF_ISBROKEN to distinguish between
-		 * missing refs and refs that were present but invalid,
-		 * to complain about the latter to stderr.
-		 *
-		 * We don't know whether the ref exists, so don't set
-		 * REF_ISBROKEN yet.
-		 */
-		bad_name = 1;
-	}
-
-	key.mv_data = (void *)refname;
-	key.mv_size = strlen(refname) + 1;
-	if (mdb_get_or_die(transaction, &key, &val, &needs_free)) {
-		if (bad_name) {
-			hashclr(sha1);
-			if (flags)
-				*flags |= REF_ISBROKEN;
-		}
-
-		if (resolve_flags & RESOLVE_REF_READING)
-			return NULL;
-
-		if (verify_refname_available_txn(transaction, refname, NULL, NULL, &err)) {
-			error("%s", err.buf);
-			strbuf_release(&err);
-			return NULL;
-		}
-
-		hashclr(sha1);
-		return refname;
-	}
-
-	ref_data = val.mv_data;
-	assert(ref_data[val.mv_size - 1] == 0);
-
-	ret = parse_ref_data(transaction, refname, ref_data, sha1,
-			     resolve_flags, flags, bad_name);
-	if (needs_free)
-		free(ref_data);
+	ret = resolve_ref_1(refname, resolve_flags, sha1, flags,
+			    &sb_refname, &sb_path, &sb_contents,
+			    retrieve_ref, transaction);
+	strbuf_release(&sb_path);
+	strbuf_release(&sb_contents);
 	return ret;
 } 

lmdb-backend.c:retrieve_ref(), files-backend.c:parse_ref() can be made
part of ref api that, given a ref name, returns the ref raw data and
type. The frontend can decide what backend callback to use based on
refname, so retrieve_ref() in the end does not have to call
read_per_worktree_ref() internally anymore.

Hmm?
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]