[+cc spearce, as I think this is a bug in code.google.com's sending side, and he can probably get the attention of the right folks] On Sat, Aug 23, 2014 at 12:12:08PM +0200, René Scharfe wrote: > Git 1.7.6 clones the repo without reporting an error or warning, but a > check shows that a tree object is missing: Thanks for digging on this. I happened to be looking at it at the exact same time, so now I can stop. :) > The patch below, which makes index-pack ignore objects with unexpected > real_type as before, changes the shown error message, but clone still > fails: > > $ git clone https://code.google.com/p/mapsforge/ > Cloning into 'mapsforge'... > remote: Counting objects: 12879, done. > Receiving objects: 100% (12879/12879), 10.19 MiB | 2.43 MiB/s, done. > Resolving deltas: 100% (4348/4348), done. > fatal: did not receive expected object a0155d8d5bb58afb9a5d20459be023899c9a1cef > fatal: index-pack failed This makes sense. Early versions of index-pack didn't know how to check for missing objects, but now it does. However, we only trigger that code if we are using --strict, or if we are cloning (in which case we pass the --check-self-contained-and-connected option). If you are doing a regular fetch, we rely on check_everything_connected after the pack is received. So (with your patch): $ git init $ git fetch https://code.google.com/p/mapsforge/ remote: Counting objects: 12298, done. Receiving objects: 100% (12298/12298), 9.24 MiB | 959.00 KiB/s, done. Resolving deltas: 100% (4107/4107), done. error: Could not read a0155d8d5bb58afb9a5d20459be023899c9a1cef fatal: bad tree object a0155d8d5bb58afb9a5d20459be023899c9a1cef error: https://code.google.com/p/mapsforge/ did not send all necessary objects That all makes sense. > Turning that fatal error into a warning get us a bit further: > > $ git clone https://code.google.com/p/mapsforge/ > Cloning into 'mapsforge'... > remote: Counting objects: 12879, done. > Receiving objects: 100% (12879/12879), 10.19 MiB | 2.38 MiB/s, done. > Resolving deltas: 100% (4350/4350), done. > warning: did not receive expected object a0155d8d5bb58afb9a5d20459be023899c9a1cef > fatal: The same object bc386be34bd4781f71bb68d72a6e0aee3124201e appears twice in the pack > fatal: index-pack failed > > Disabling strict checking (WRITE_IDX_STRICT) as well lets clone > succeed, but a check shows that a tree is missing, as wiht git 1.7.6: Interesting that one object is duplicated and another object is missing. The pack doesn't contain the sha1s of the objects; we compute them on the fly in index-pack. So it's likely that the real problem is that one entry in the pack either has the wrong delta data, or mentions the wrong base object. And does it in such a way that we generate the another object that does exist (so the packfile data isn't garbled or corrupted; it's a bug on the sending side that uses the wrong data). > https://github.com/applantation/mapsforge has that missing tree > object, by the way. Unsurprisingly, it's a tree object quite similar to the duplicated one. > OK, what now? It's better to show an error message instead of > failing an assertion if a repo is found to be corrupt because the > issue is caused by external input. I don't know if the patch > below may have any bad side effects, though, so no sign-off at > this time. Definitely. Your patch looks like an improvement. The objects in the delta list must have had their real_types set to REF_DELTA and OFS_DELTA at some point (since that is how they got on the list). And there is no way for the type field to change from a delta type to anything else _except_ by us resolving the delta. So I think the condition triggering that assert cannot mean anything except that we have a duplicate object (and it is not actually the delta object that is duplicated, but rather its base, as seeing it again is what causes us to try to resolve twice). So we know at this point that we have a duplicate object, and we could warn or say something. But I think we probably would not want to. If --strict is in effect, then we will notice and complain later. And if it is not, then we should allow it without comment (in this case the repo is broken, but it would not _have_ to be). So I think your patch is doing the right thing. > Allowing git to clone a broken repo sounds useful, up to point. > In this particular case older versions had no problem doing that, > so it seems worth supporting. I think with your patch we are fine. Without --strict (which is implied on a clone), you can still "git init" and "git fetch" the broken pack (fetch will complain, but it leaves the pack in place). We may want to adjust the fact that --check-self-contained-and-connected implies strict (it builds on the fsck bits of index-pack, but it does not have to imply a full fsck, nor strict index writing). > And how did the tree object went missing in the first place? My guess is a bug on the sending side. We have seen duplicate pack objects before, but never (AFAIK) combined with a missing object. This really seems like the sender just sent the wrong data for one object. IIRC, code.google.com is backed by their custom Dulwich implementation which runs on BigTable. We'll see if Shawn can get this to the right people as a bug report. :) -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html