Re: [PATCH 0/6] address packed-refs speed regressions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 05.04.2015 um 03:06 schrieb Jeff King:
> As I've mentioned before, I have some repositories with rather large
> numbers of refs. The worst one has ~13 million refs, for a 1.6GB
> packed-refs file. So I was saddened by this:
> 
>    $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1
>    real    0m6.840s
>    user    0m6.404s
>    sys     0m0.440s
> 
>    $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1
>    real    0m19.432s
>    user    0m18.996s
>    sys     0m0.456s
> 
> The command isn't important; what I'm really measuring is loading the
> packed-refs file. And yes, of course this repository is absolutely
> ridiculous. But the slowdowns here are linear with the number of refs.
> So _every_ git command got a little bit slower, even in less crazy
> repositories. We just didn't notice it as much.
> 
> Here are the numbers after this series:
> 
>    real    0m8.539s
>    user    0m8.052s
>    sys     0m0.496s
> 
> Much better, but I'm frustrated that they are still 20% slower than the
> original.
> 
> The main culprits seem to be d0f810f (which introduced some extra
> expensive code for each ref) and my 10c497a, which switched from fgets()
> to strbuf_getwholeline. It turns out that strbuf_getwholeline is really
> slow.

10c497a changed read_packed_refs(), which reads *all* packed refs.
Each is checked for validity.  That sounds expensive if the goal is
just to look up a single (non-existing) ref.

Would it help to defer any checks until a ref is actually accessed?
Can a binary search be used instead of reading the whole file?

I wonder if pluggable reference backends could help here.  Storing refs
in a database table indexed by refname should simplify things.

Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
the packed refs file?  What numbers do you get with the following patch?

---
 refs.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/refs.c b/refs.c
index 47e4e53..144255f 100644
--- a/refs.c
+++ b/refs.c
@@ -1153,16 +1153,35 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
  *      compatibility with older clients, but we do not require it
  *      (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static void read_packed_refs(FILE *f, struct ref_dir *dir)
+static void read_packed_refs(int fd, struct ref_dir *dir)
 {
 	struct ref_entry *last = NULL;
 	struct strbuf line = STRBUF_INIT;
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
+	struct stat st;
+	void *map;
+	size_t mapsz, len;
+	const char *p;
+
+	fstat(fd, &st);
+	mapsz = xsize_t(st.st_size);
+	if (!mapsz)
+		return;
+	map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0);
 
-	while (strbuf_getwholeline(&line, f, '\n') != EOF) {
+	for (p = map, len = mapsz; len; ) {
 		unsigned char sha1[20];
 		const char *refname;
 		const char *traits;
+		const char *nl;
+		size_t linelen;
+
+		nl = memchr(p, '\n', len);
+		linelen = nl ? nl - p + 1 : len;
+		strbuf_reset(&line);
+		strbuf_add(&line, p, linelen);
+		p += linelen;
+		len -= linelen;
 
 		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
@@ -1204,6 +1223,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 	}
 
 	strbuf_release(&line);
+	munmap(map, mapsz);
 }
 
 /*
@@ -1224,16 +1244,16 @@ static struct packed_ref_cache *get_packed_ref_cache(struct ref_cache *refs)
 		clear_packed_ref_cache(refs);
 
 	if (!refs->packed) {
-		FILE *f;
+		int fd;
 
 		refs->packed = xcalloc(1, sizeof(*refs->packed));
 		acquire_packed_ref_cache(refs->packed);
 		refs->packed->root = create_dir_entry(refs, "", 0, 0);
-		f = fopen(packed_refs_file, "r");
-		if (f) {
-			stat_validity_update(&refs->packed->validity, fileno(f));
-			read_packed_refs(f, get_ref_dir(refs->packed->root));
-			fclose(f);
+		fd = open(packed_refs_file, O_RDONLY);
+		if (fd >= 0) {
+			stat_validity_update(&refs->packed->validity, fd);
+			read_packed_refs(fd, get_ref_dir(refs->packed->root));
+			close(fd);
 		}
 	}
 	return refs->packed;
-- 
2.3.5

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]