On Sat, Mar 21, 2020 at 08:23:13PM -0400, Derrick Stolee wrote: > On 3/21/2020 8:20 PM, Taylor Blau wrote: > > On Sat, Mar 21, 2020 at 08:03:01PM -0400, Derrick Stolee wrote: > >> On 3/21/2020 2:50 PM, Junio C Hamano wrote: > >>> Do we need to worry about INFO_QUICK and SKIP_FETCH_OBJECT in this > >>> codepath, by the way? > >> > >> I was coming back to this thread to bring up these exact flags for > >> consideration. The good news is that in a partial clone with any > >> amount of filtering we will still have all reachable commits, which > >> are necessary for the commit-graph to make sense. The only ones that > >> would fail has_object_file() are ones removed by GC, but they may > >> still exist on the remote. So without SKIP_FETCH_OBJECT, we would > >> generate a network call even if the server has GC'd to remove the > >> commits. This gets particularly bad when the server returns all > >> reachable objects from that commit! > > > > That makes sense. Do you think something like this should be applied? > > + int flags = OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT; > [...] > > Yes, I think this is the appropriate update. Thanks! I'm not so sure. In a non-partial clone, when would expect QUICK to matter? If we find the object is missing, then either: - we are racing with repack, and we would benefit from double-checking that the object really is gone - we used to have it (since it was mentioned in the graph file) but it went away Using QUICK means we won't waste time double-checking in the second case. But it means we won't catch the first case, and we may generate a new graph file that omits the object. They're both optimizations, and I don't think we're impacting correctness[1], but it's not clear to me whether one is a win over the other. We don't generally expect objects we have to go away often. Skipping fetching seems a little more straight-forward to me. If we had it in a graph file, presumably we had the actual object before, too. And either we're in the first case above (we really do have it and just need to double-check) in which case not saying QUICK would be enough. Or we intentionally got rid of it. In which case downloading it just to generate a cache is quite silly. So it seems like SKIP_FETCH_OBJECT but _not_ QUICK might be reasonable here. -Peff [1] I'm actually not quite sure about correctness here. It should be fine to generate a graph file without any given commit; readers will just have to load that commit the old-fashioned way. But at this phase of "commit-graph write", I think we'll already have done the close_reachable() check. What does it mean to throw away a commit at this stage? If we're the parent of another commit, then it will have trouble referring to us by a uint32_t. Will the actual writing phase barf, or will we generate an invalid graph file?