Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:

> It's not only racy WRT other processes. If the current git process
> would create a new reference, it wouldn't be reflected in the cache.
>
> It's true that the main ref_cache doesn't invalidate itself
> automatically either when a new reference is created, so it's not
> really a fair complaint. However, as we add places where the cache is
> invalidated, it is easy to overlook this cache that is stuck in static
> variables within a function definition and it is impossible to
> invalidate it. Might it not be better to attach the cache to the
> ref_cache structure instead, and couple its lifetime to that object?
>
> Alternatively, the cache could be created and managed on the caller
> side, since the caller would know when the cache would have to be
> invalidated. Also, different callers are likely to have different
> performance characteristics. It is unlikely that the time to
> initialize the cache will be amortized in most cases; in fact,
> "rev-list --stdin" might be the *only* plausible use case.

True.

> Regarding the overall strategy: you gather all refnames that could be
> confused with an SHA-1 into a sha1_array, then later look up SHA-1s in
> the array to see if they are ambiguous. This is a very special-case
> optimization for SHA-1s.
>
> I wonder whether another approach would gain almost the same amount of
> performance but be more general. We could change dwim_ref() (or a
> version of it?) to read its data out of a ref_cache instead of going
> to disk every time. Then, at the cost of populating the relevant parts
> of the ref_cache once, we would have fast dwim_ref() calls for all
> strings.
If opendir-readdir to grab only the names (but not the values) of many
refs is a lot faster than stat-open-read on a handful of dwim-ref
locations for a given name, that optimization might be worthwhile, but
I think that requires updating read_loose_refs() not to call
read_ref_full(), and the users of the refs API to instead resolve the
refs lazily, no?

If I ask for five names (say 'maint', 'master', 'next', 'pu', 'jch'),
the current code will do 5 dwim_ref()s, each of which will consult 6
locations with resolve_ref_unsafe(), totalling 30 calls to
resolve_ref_unsafe(), each of which in turn is essentially an open
followed by either a return on ENOENT or a read. So 30 opens and 5
reads in total.

With your lazy ref_cache scheme, we would instead enumerate all the
loose refs in the same 6 directories (e.g. refs/tags/, refs/heads/),
so 6 opendir()s with as many readdir()s as I have loose refs, plus an
open-read for each of them in read_loose_refs() called from
get_ref_dir() with the current ref_cache code.

For me, "find .git/refs/heads" gives 500+ lines of output, which
suggests that using the ref_cache mechanism for dwim_ref() may not be
a huge win, unless it is made extremely lazy and readdir() turns out
to be much cheaper than an open-read. Also, the cost of initializing
the cache is unlikely to be amortized into a net win unless we are
dealing with tons of dwim_ref()s.
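The arithmetic above can be put into a tiny back-of-the-envelope model.
This is only a sketch: the six candidate patterns below are my
recollection of ref_rev_parse_rules in refs.c, and the per-candidate
and per-ref costs are deliberate simplifications (one open() attempt
per candidate location; one open()+read() per loose ref in the eager
cache scheme).

```python
# The six locations dwim_ref() tries for a short name, as I recall
# them from ref_rev_parse_rules in refs.c (an assumption, not gospel).
RULES = [
    "{name}",
    "refs/{name}",
    "refs/tags/{name}",
    "refs/heads/{name}",
    "refs/remotes/{name}",
    "refs/remotes/{name}/HEAD",
]

def dwim_candidates(name):
    """All on-disk locations resolve_ref_unsafe() would probe for one name."""
    return [rule.format(name=name) for rule in RULES]

def cost_dwim(names):
    """Current scheme: one open() attempt per candidate location."""
    return sum(len(dwim_candidates(n)) for n in names)

def cost_eager_ref_cache(num_loose_refs, num_dirs=6):
    """Eagerly populated ref_cache: one opendir() per directory, then an
    open()+read() for every loose ref found (2 syscalls per ref)."""
    return num_dirs + 2 * num_loose_refs

names = ["maint", "master", "next", "pu", "jch"]
print(cost_dwim(names))           # 30 open() attempts for 5 names
print(cost_eager_ref_cache(500))  # 1006 syscalls for 500 loose refs
```

With 5 names against 500+ loose refs, the eagerly populated cache loses
by a wide margin in this model, which is why the scheme would have to
be made very lazy before it pays off.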