Junio C Hamano <gitster@xxxxxxxxx> wrote: > > I've been toying with an idea for an alternative solution, and need > somebody competent to bounce it around with. Heh, then we need to wait for Nico... :-) > pack-objects ends up doing eventually > > rev-list --objects $send1 $send2 $send3 ... --not $have1 $have2 ... > > which lists commits and associated objects reachable from $sendN, > excluding the ones that are reachable from $haveN. > > The tentative solution Björn Steinbrink and I came up with excludes > missing commit from $haveN to avoid rev-list machinery to barf, but it > violates the ref-object contract as I explained to Björn in my other > message. Oh, OK, I now _finally_ understand what you were trying to say by the reachability thing. I kept scratching my head trying to understand you, and was going to say something stupid on list; but waited because I just didn't get what the big deal was... Its the crash in rev-list that you were worried about. > Checking if each commit is reachable from any of the refs is quite > expensive, and it would especially be so if it is done once per ".have" > and real ref we receive from the other end. Yup. > An alternative is to realize that rev-list traversal already does > something quite similar to what is needed to prove if these ".have"s are > reachable from refs when listing the reachable objects. This computation > is what it needs to do anyway, so if we teach rev-list to ignore missing > or broken chain while traversing negative refs, we do not have to incur > any overhead over existing code. EXACTLY. JGit does this. The functional equivilant of rev-list in JGit will by default throw an exception if any object is missing when we try to walk it. That includes things we've painted UNINTERESTING, as it is a sure sign of repository corruption. However; our equivilant of pack-objects can toggle what you are calling "ignore-missing-negative" when it starts enumeration. Any UNINTERESTING object which is missing or failed to parse is simply tossed aside. Yes, the pack may be larger than necessary like in Peff's example of: Q-R / D--E \ A-C If the other side has C reachable, we are pushing R, and we have C but are missing A, we'll "over push" D-E, but its still a clean and valid push. Its no worse than we were before the ".have" came about, or if C hadn't been downloaded locally at all. (Of course your tell-me-more extension would help fix this over-push, but lets not get off topic.) IMHO, this corruption of A is harmless if C isn't reachable. It isn't really local corruption unless C was reachable by a ref. But we don't tend to see much corruption like that, and if it did exist, it would show up during *other* operations that access a larger set of local refs, such as "git gc". > I have a mild suspicion that it may even be the right thing to ignore them > unconditionally, and it might even match the intention of Linus's original > code. That would make many hunks in this patch much simpler. I don't think its right to ignore broken UNINTERESTING chains all of the time. Today we would see fatal errors if I asked for git log R ^C and A was missing, but R and C are both local refs. I still want to see that fatal error. Its a local corruption that should be raised quickly to the user. In fact by A missing we'd compute the wrong result and produce D-E too, which is wrong. IMHO, the *only* time this missing uninteresting A is safe is during send-pack, upload-pack, or bundle creation, where you are bringing the other side up to R by transferring any amount of data necessary to reach that goal. Which is why JGit enables this. (Though at the API level we do let the caller flag if they want the error to be fatal instead, but AFAIK nobody sets it for "fatal".) FWIW, Linus' most recent message on this thread about hoisting the UNINTERESTING test up sooner makes sense too. > The evidences behind this suspicion are found in a handful of places in > revision.c. mark_blob_uninteresting() does not complain if the caller > fails to find the blob. mark_tree_uninteresting() does not, either. > mark_parents_uninteresting() does not, either, and it even has a comment > that strongly suggests the original intention was not to care about > missing UNINTERESTING objects. That feels wrong to me... given the "git log R ^C" example I give above. -- Shawn. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html