Re: [PATCH] lookup_object: prioritize recently found objects

Jeff King <peff@xxxxxxxx> · Fri, 3 May 2013 02:16:05 -0400

On Fri, May 03, 2013 at 02:02:44AM -0400, Jeff King wrote:

> > Can we be sure that the function is never invoked in concurrently from
> > different threads? I attempted to audit code paths, but quickly gave up
> > because I know too little about this machinery.
> 
> I didn't check explicitly, but in general such a program would already
> need a mutex to synchronize object lookup. Not for lookup_object
> specifically, but because lookup_object is mostly used to back
> lookup_commit, lookup_tree, etc, which are already not re-entrant
> (because they lazily insert into the hash behind the scenes).

I just looked through the 25 existing calls to lookup_object. All of
them should be OK. Most of them are coupled with immediate calls to
update the hash table and/or to call read_sha1_file (which is also very
not-thread-safe). So I don't think the patch introduces any bug into the
current code.

It does introduce a potential for future bugs in concurrent code, but I
don't know that it makes a significant dent in the current codebase. We
already have a lot of non-reentrant functions, including all of the
other lookup_* functions.  Our concurrent code is typically very careful
to use a small known-safe subset of functions.

I did look originally at updating the ordering when the hash is already
being updated (i.e., to insert a new entry at the front of a collision
chain, rather than the back). However, that didn't yield nearly as much
improvement. I think partially because the locality of an insert/lookup
pair is not as good as a lookup/lookup pair. And partially because we
destroy that ordering for existing entries whenever we grow the hash
table.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html