On Fri, Apr 14, 2017 at 07:25:54PM +0000, git@xxxxxxxxxxxxxxxxx wrote: > for (i = 0; i < n; i++, dirmask >>= 1) { > - const unsigned char *sha1 = NULL; > - if (dirmask & 1) > - sha1 = names[i].oid->hash; > - buf[i] = fill_tree_descriptor(t+i, sha1); > + if (i > 0 && are_same_oid(&names[i], &names[i - 1])) > + t[i] = t[i - 1]; > + else if (i > 1 && are_same_oid(&names[i], &names[i - 2])) > + t[i] = t[i - 2]; > + else { > + const unsigned char *sha1 = NULL; > + if (dirmask & 1) > + sha1 = names[i].oid->hash; > + buf[nr_buf++] = fill_tree_descriptor(t+i, sha1); > + } This looks fine to me. Just musing (and I do not think we need to go further than your patch), we're slowly walking towards an actual object-content cache. The "buf" array is now essentially a cache of all oids we've loaded, but it doesn't know its sha1s. So we could actually encapsulate all of the caching: struct object_cache { int nr_entries; struct object_cache_entry { struct object_id oid; void *data; } cache[MAX_UNPACK_TREES]; }; and then ask it "have you seen oid X" rather than playing games with looking at "i - 1". Of course it would have to do a linear search, so the next step is to replace its array with a hashmap. And now suddenly we have a reusable object-content cache that you could use like: struct object_cache = {0}; for (...) { /* maybe reads fresh, or maybe gets it from the cache */ void *data = read_object_data_cached(&oid, &cache); } /* operation done, release the cache */ clear_object_cache(&cache); which would work anywhere you expect to load N objects and see some overlap. Of course it would be nicer still if this all just happened automatically behind the scenes of read_object_data(). But it would have to keep an _extra_ copy of each object, since the caller expects to be able to free it. We'd probably have to return instead a struct with buffer/size in it along with a reference counter. I don't think any of that is worth it unless there are spots where we really expect there to be a lot of cases where we hit the same objects in rapid succession. I don't think there should be, though. Our usual "caching" mechanism is to create a "struct object", which is enough to perform most operations (and has a much smaller memory footprint). So again, just musing. I think your patch is fine as-is. -Peff