Also CC-ing Stolee since I mention multi-pack indices at the end. > This seems like a reasonable thing to do, but I have sort of a > meta-comment. In several places we've started doing this kind of "if > it's this type of object, do X, otherwise do Y" optimization (e.g., > handling large blobs for streaming). > > And in the many cases we end up doubling the effort to do object > lookups: here we do one lookup to get the type, and then if it's not a > commit (or if we don't have a commit graph) we end up parsing it anyway. > > I wonder if we could do better. In this instance, it might make sense > to first see if we actually have a commit graph available (it might not > have this object, of course, but at least we'd expect it to have most > commits). This makes sense - I thought I shouldn't mention the commit graph in the code since it seems like a layering violation, but I felt the need to mention commit graph in a comment, so maybe the need to mention commit graph in the code is there too. Subsequently, maybe the lookup-for-type could be replaced by a lookup-in-commit-graph (maybe by using parse_commit_in_graph() directly), which should be at least slightly faster. > In general, it would be nice if we had a more incremental API > for accessing objects: open, get metadata, then read the data. That > would make these kinds of optimizations "free". Would this be assuming that to read the data, you would (1) first need to read the metadata, and (2) there would be no redundancy in reading the two? It seems to me that for loose objects, you would want to perform all your reads at once, since any read requires opening the file, and for commit graphs, you just want to read what you want, since the metadata and the data are in separate places. > I don't have numbers for how much the extra lookups cost. The lookups > are probably dwarfed by parse_object() in general, so even if we save > only a few full object loads, it may be a win. It just seems a shame > that we may be making the "slow" paths (when our type-specific check > doesn't match) even slower. I agree. I think it will always remain a tradeoff when we have multiple data sources of objects (loose, packed, commit graph - and we can't unify them all, since they each have their uses). Unless the multi-pack index can reference commit graphs as well...then it could be our first point of reference without introducing any inefficiencies...