On Mon, Aug 27, 2018 at 12:36:27PM -0700, Junio C Hamano wrote:
> > PS. I notice that v4 does not pad to align entries at 4 byte boundary
> > like v2/v3. This could cause a slight slow down on x86 and segfault on
> > some other platforms.
>
> Care to elaborate?
>
> Long time ago, we used to mmap and read directly from the index file
> contents, requiring either an unaligned read or padded entries. But
> that was eons ago and we first read and convert from on-disk using
> get_be32() etc. to in-core structure, so I am not sure what you mean
> by "segfault" here.

To conclude this unaligned-access thing (since I plan more changes in
the index to keep its size down, which may increase unaligned access),
I ran with the following patch on amd64 (still webkit.git, 275k files,
100 runs). The index version that avoids unaligned access does not
show a noticeable difference; it is still roughly around 4.2s.

Running with NO_UNALIGNED_LOADS defined is clearly slower, in the 4.3s
range. So in theory, if we avoid unaligned access in the index and
avoid the slow get_beXX() versions, we could bring performance back to
the 4.2s range for those platforms.

But on the other hand, padding the index increases the index size by
~1MB (the v4 index before padding is 21MB), and this may add more cost
at update time because of the trailer hash.

So, yeah, it's probably OK to keep living with unaligned access and
not pad more. At least until those on "no unaligned access" platforms
yell.

diff --git a/read-cache.c b/read-cache.c
index 8628d0f3a8..33ee35fb81 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1794,7 +1794,7 @@ static struct cache_entry *create_from_disk(struct index_state *istate,
 		if (copy_len)
 			memcpy(ce->name, previous_ce->name, copy_len);
 		memcpy(ce->name + copy_len, name, len + 1 - copy_len);
-		*ent_size = (name - ((char *)ondisk)) + len + 1 - copy_len;
+		*ent_size = ((name - ((char *)ondisk)) + len - copy_len + 8) & ~7;
 	} else {
 		memcpy(ce->name, name, len + 1);
 		*ent_size = ondisk_ce_size(ce);
@@ -2345,8 +2345,10 @@ static int ce_write_entry(git_hash_ctx *c, int fd, struct cache_entry *ce,
 			result = ce_write(c, fd, to_remove_vi, prefix_size);
 		if (!result)
 			result = ce_write(c, fd, ce->name + common, ce_namelen(ce) - common);
-		if (!result)
-			result = ce_write(c, fd, padding, 1);
+		if (!result) {
+			int len = prefix_size + ce_namelen(ce) - common;
+			result = ce_write(c, fd, padding, align_padding_size(size, len));
+		}
 
 		strbuf_splice(previous_name, common, to_remove,
 			      ce->name + common, ce_namelen(ce) - common);
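For context, the slow path that NO_UNALIGNED_LOADS forces lives in
compat/bswap.h. Paraphrasing from memory (the real #if covers more
architectures than shown here, so check the actual header):

#include <stdint.h>
#include <arpa/inet.h>	/* ntohl() */

/*
 * On platforms we trust to handle unaligned loads (and without
 * NO_UNALIGNED_LOADS defined), read the whole word in one go and
 * byte-swap it. Everywhere else, assemble the value one byte at a
 * time, which is safe at any alignment but slower; that is the
 * 4.2s vs. 4.3s difference measured above.
 */
#if !defined(NO_UNALIGNED_LOADS) && \
	(defined(__i386__) || defined(__x86_64__))
static inline uint32_t get_be32(const void *ptr)
{
	return ntohl(*(const uint32_t *)ptr);	/* one, possibly unaligned, load */
}
#else
static inline uint32_t get_be32(const void *ptr)
{
	const unsigned char *p = ptr;
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] <<  8 | (uint32_t)p[3];
}
#endif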
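And a note on the write side of the patch: align_padding_size() is the
existing macro in read-cache.c that v2/v3 entries already use (the
first hunk open-codes the same rounding on the read side). If I read
it right:

/*
 * From read-cache.c: given the fixed-size part of the entry ("size")
 * and the number of name bytes written ("len"), return how many NUL
 * bytes round the total up to the next multiple of 8. Using "+ 8"
 * rather than "+ 7" guarantees at least one NUL, which doubles as the
 * name terminator. E.g. size + len == 62 -> 2 NULs; == 64 -> 8 NULs.
 */
#define align_padding_size(size, len) \
	((size + (len) + 8) & ~7) - (size + len)

So each entry grows from exactly one trailing NUL to between one and
eight; 275k entries times ~3.5 extra bytes on average is roughly the
~1MB quoted above.

--
Duy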