"Anand Kumria" <wildfire@xxxxxxxxxxx> writes:

> I did an initial clone of Linus' linux-2.6.git tree, via the git protocol,
> and then managed to accidently delete one of the .pack and
> corresponding .idx files.
>
> I thought that 'cg-fetch' would do the job of bring down the missing pack
> again, and all would be well. Alas this isn't the case.
>
> <http://pastebin.ca/246678>
>
> Pasky, on IRC, indicated that this might be because git-fetch-pack isn't
> downloading missing objects when the git:// protocol is being used.

There are two invariants between refs and objects:

 - objects that your refs (files under the .git/refs/ hierarchy
   that record 40-byte hexadecimal object names) point at are
   never missing, or the repository is corrupt.

 - objects that are reachable via pointers in another object that
   is not missing (a tag points at another object, a commit
   points at its tree and its parent commits, and a tree points
   at its subtrees and blobs) are never missing, or the
   repository is corrupt.

Git tools first fetch missing objects and then update your refs
only when the fetch succeeds completely, in order to maintain the
above invariants (a partial fetch does not update your refs).

And these invariants are why:

 - fsck-objects starts its reachability check from the refs;

 - commit walkers can stop at your existing refs;

 - git native protocols only need to tell the other end what refs
   you have, in order for the other end to exclude what you
   already have from the set of objects it sends you.

What's missing needs to be determined in a reasonably efficient
manner, and the above invariants allow us not to have to do the
equivalent of fsck-objects every time. Being able to trust refs
is fairly fundamental in the fetch operation of git.

I am not opposed to the idea of a new tool to fix a corrupted
repository that has broken the above invariants, perhaps caused
by accidental removal of objects and packs by end users.
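
To make the invariants concrete, here is a toy model in Python. It
is only a sketch: the object names and the dictionary-based "object
store" are made up for illustration and have nothing to do with
git's real data structures. It shows why the sending side can work
out what to send from the receiver's refs alone, and why that goes
wrong once an object behind a ref has been deleted.

```python
def reachable(store, roots):
    """Follow links from roots; return (found, missing) sets of names."""
    found, missing, stack = set(), set(), list(roots)
    while stack:
        name = stack.pop()
        if name in found or name in missing:
            continue
        if name not in store:
            missing.add(name)  # a pointer to an object that is absent
            continue
        found.add(name)
        stack.extend(store[name])
    return found, missing


# Hypothetical object graph: each name maps to the names it points at.
upstream = {
    "c2": ["c1", "t2"], "c1": ["t1"],
    "t2": ["b1", "b2"], "t1": ["b1"],
    "b1": [], "b2": [],
}
local = {name: upstream[name] for name in ("c1", "t1", "b1")}

# Healthy case: the receiver only says "my ref points at c1". The sender
# walks its own store from c1 to work out what the receiver must have.
assumed_have, _ = reachable(upstream, ["c1"])
want, _ = reachable(upstream, ["c2"])
to_send = want - assumed_have
print(sorted(to_send))  # ['b2', 'c2', 't2']

# Corrupt case: b1 was deleted locally, but the ref still says c1,
# so the sender would still leave b1 out.
del local["b1"]
_, broken = reachable(local, ["c1"])
print(sorted(broken), "b1" in to_send)  # ['b1'] False
```

In the corrupt case the receiver's ref still claims everything
behind c1 is present, so the normal exchange never resends b1. A
repair tool for such a repository cannot rely on that exchange.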

What such a tool needs to do would be:

 - run fsck-objects to notice what is missing, by noting "broken
   link from foo to bar" output messages. Object 'bar' is what
   you _ought_ to have according to your refs but you don't
   (because you removed the objects that should be there), and
   everything that is reachable from it needs to be retrieved
   from the other side. Because you do not have 'bar', your end
   cannot determine which of the other objects in your object
   store are reachable from it, so some of the download will be
   redundant.

 - run a fetch-pack equivalent to get everything reachable
   starting at the above missing objects, pretending you do not
   have any objects, because your refs are not trustworthy.

 - run fsck-objects again to make sure that your refs can now be
   trusted again.

To implement the second step above, you need a modified
fetch-pack that does not trust any of your refs. It also needs
to ignore what is offered by the other end and instead ask for
the objects you know are missing ('bar' in the above example).

This program needs to talk to a modified upload-pack running at
the other end (let's call it upload-pack-recover), because the
usual upload-pack does not serve starting from a random object
that happens to be in its repository, but only starting from
objects that are pointed at by its own refs, to ensure integrity.

The upload-pack-recover program would need to start traversal
from object 'bar' in the above example, and when it does so, it
should not just run 'rev-list --objects' starting at 'bar'. It
first needs to prove that its object store has everything that
is reachable from 'bar' (the recipient would still end up with
an incomplete repository if it didn't). What this means is that
it needs to prove that some of its refs can reach 'bar' (again,
on the upstream end, only refs are trusted; mere existence of an
object is not enough) before sending objects back.
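
The first step in the list above, collecting the names that
fsck-objects reports as absent, might be sketched as follows. The
line shapes matched here are an assumption modeled on fsck's
"broken link from ... to ..." and "missing ..." messages, and the
object names in the sample are made up. Check the output of your
own git version before relying on it.

```python
import re

# Assumed line shapes; the exact format may differ between git versions.
BROKEN_FROM = re.compile(r"^broken link from\s+(\w+)\s+([0-9a-f]{40})$")
BROKEN_TO = re.compile(r"^\s+to\s+(\w+)\s+([0-9a-f]{40})$")
MISSING = re.compile(r"^missing\s+(\w+)\s+([0-9a-f]{40})$")


def missing_objects(fsck_output):
    """Return the sorted object names that the fsck output reports as absent."""
    wanted = set()
    expecting_target = False
    for line in fsck_output.splitlines():
        if BROKEN_FROM.match(line):
            expecting_target = True  # the next line should name the target
            continue
        m = BROKEN_TO.match(line) if expecting_target else None
        if m is None:
            m = MISSING.match(line)
        expecting_target = False
        if m:
            wanted.add(m.group(2))
    return sorted(wanted)


# Made-up object names, for illustration only.
A, B, C = "a" * 40, "b" * 40, "c" * 40
sample = "\n".join([
    f"broken link from    tree {A}",
    f"              to    blob {B}",
    f"missing blob {B}",
    f"dangling commit {C}",
])
print(missing_objects(sample) == [B])  # True
```

The resulting list is what a modified fetch-pack would ask the
other end for, instead of what the other end advertises.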

The usual upload-pack does not have to prove reachability,
because it refuses to serve starting from anything but what its
refs point at (and by the invariants, the objects pointed at by
refs are guaranteed to be complete [an object is "complete" if no
object reachable from it is missing]).

The proof is needed because the repository might have discarded
a branch that used to reach 'bar'. If the object 'bar' was in a
pack while some of its ancestors or component trees and/or blobs
were loose, a subsequent git-prune may have removed the latter
without removing 'bar'. Mere existence of the object 'bar' does
not mean 'bar' is complete.

So coming up with such a pair of programs is not rocket science,
but it is fairly delicate. I would rather have them as
specialized commands, not a part of the everyday commands, even
if you were to implement them.

Since this is not an everyday occurrence anyway, a far easier
way would be to clone-pack from the upstream into a new
repository, take the pack you downloaded from that new
repository and mv it into your corrupt repository. You can then
run fsck-objects to see if you got back everything you lost
earlier.
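
The "move the pack over" step of that workaround could be scripted
roughly as below. This is a sketch under assumptions: the helper
name copy_packs and the paths are hypothetical, both repositories
are assumed to be non-bare with the usual .git/objects/pack
layout, and it copies rather than moves, so the fresh clone stays
intact.

```python
import shutil
from pathlib import Path


def copy_packs(fresh_repo, damaged_repo):
    """Copy pack/index pairs that the damaged repository lacks."""
    src = Path(fresh_repo) / ".git" / "objects" / "pack"
    dst = Path(damaged_repo) / ".git" / "objects" / "pack"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pack in sorted(src.glob("pack-*.pack")):
        idx = pack.with_suffix(".idx")
        if not idx.exists():
            continue  # skip incomplete pairs
        for f in (pack, idx):
            target = dst / f.name
            if not target.exists():  # never overwrite what is already there
                shutil.copy2(f, target)
                copied.append(f.name)
    return copied


# Intended use (paths and URL are placeholders):
#   git clone <upstream-url> /tmp/fresh    # stands in for clone-pack
#   copy_packs("/tmp/fresh", "/path/to/damaged")
#   git fsck --full                        # run inside the damaged repository
```

With a current git, 'git clone' and 'git fsck' take the place of
the clone-pack and fsck-objects commands named above.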