On Sat, 17 Mar 2007, Junio C Hamano wrote: > When unpacking a depth-3 deltified object A, the code finds the > target object A (which is a delta), ask for its base B and put B > in the cache after using it to reconstitute A. While doing so, > the first-generation base B is also a delta so its base C (which > is a non-delta) is found and placed in the cache. When A is > returned, the cache has B and C. If you ask for B at this > point, we read the delta, pick up its base C from the cache, > apply, and return while putting C back in the cache. If you ask > for A after that, we do not read from the cache, although it is > available. > > Which feels a bit wasteful at first sight, and we *could* make > read_packed_sha1() also steal from the cache, but after thinking > about it a bit, I am not sure if it is worth it. The contract > between read_packed_sha1() and read_sha1_file() and its callers > is that the returned data belongs to the caller and it is a > responsibility for the caller to free the buffer, and also the > caller is free to modify it, so stealing from the cache from > that codepath means an extra allocation and memcpy. So? A malloc() + memcpy() will always be faster than mmap() + malloc() + inflate(). If the data is already there it is certainly better to copy it straight away. With the patch below I can do 'git log drivers/scsi/ > /dev/null' about 7% faster. I bet it might be even more on those platforms with bad mmap() support. Signed-off-by: Nicolas Pitre <nico@xxxxxxx> --- diff --git a/sha1_file.c b/sha1_file.c index a7e3a2a..ee64865 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1372,7 +1372,7 @@ static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset) } static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset, - unsigned long *base_size, enum object_type *type) + unsigned long *base_size, enum object_type *type, int keep_cache) { void *ret; unsigned long hash = pack_entry_hash(p, base_offset); @@ -1384,7 +1384,13 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset, return unpack_entry(p, base_offset, type, base_size); found_cache_entry: - ent->data = NULL; + if (!keep_cache) + ent->data = NULL; + else { + ret = xmalloc(ent->size + 1); + memcpy(ret, ent->data, ent->size); + ((char *)ret)[ent->size] = 0; + } *type = ent->type; *base_size = ent->size; return ret; @@ -1418,7 +1424,7 @@ static void *unpack_delta_entry(struct packed_git *p, off_t base_offset; base_offset = get_delta_base(p, w_curs, &curpos, *type, obj_offset); - base = cache_or_unpack_entry(p, base_offset, &base_size, type); + base = cache_or_unpack_entry(p, base_offset, &base_size, type, 0); if (!base) die("failed to read delta base object" " at %"PRIuMAX" from %s", @@ -1615,7 +1621,7 @@ static void *read_packed_sha1(const unsigned char *sha1, if (!find_pack_entry(sha1, &e, NULL)) return NULL; else - return unpack_entry(e.p, e.offset, type, size); + return cache_or_unpack_entry(e.p, e.offset, size, type, 1); } /* - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html