Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:

> The merge is created in a temporary location that uses alternates. The
> temporary repository is on a local disk, the alternate object database
> on an NFS mount.

Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.

> My current hypothesis is that the additional `access`, but more
> importantly the additional `utime` calls are responsible in the
> increased merge times that we see.

Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:

diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }

I think that should clear up the access() calls, but leave the utime()
ones.

> Looking at the detailed strace shows that utime will be called
> repeatedly in same cases (e.g.
> https://bitbucket.org/snippets/ssaasen/oend shows an example where the
> same packfile will be updated more than 4000 times in a single merge).
> 
> http://www.spinics.net/lists/git/msg240106.html discusses a potential
> improvement for this case. Would that be an acceptable avenue to
> improve this situation?

I think so. Here's a tentative patch:

diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
 	int pack_fd;
 	unsigned pack_local:1,
 		 pack_keep:1,
+		 freshened:1,
 		 do_not_close:1;
 	unsigned char sha1[20];
 	/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
 	struct pack_entry e;
-	return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+	if (!find_pack_entry(sha1, &e))
+		return 0;
+	if (e.p->freshened)
+		return 1;
+	return freshen_file(e.p->pack_name);
 }
 
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)


If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.

You may also be interested in:

  http://thread.gmane.org/gmane.comp.version-control.git/266370

which addresses another performance problem related to the
freshen/recent code in v2.2.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]