On Thu, Jan 01, 2009 at 11:15:19PM -0800, Shawn O. Pearce wrote:

> > OK, I wish you luck in the fruition of the new --dump-delta option, and
> > can proofread the man pages involved, otherwise this is no area for
> > junior programmer me.
>
> This is rather insane. There's very little data inside of a delta.
> That's sort of the point of that level of compression, it takes
> up very little disk space and yet describes the change made.
> Almost nobody is going to want the delta without the base object
> it applies onto. No user of git is going to need that. I'd rather
> not carry dead code around in the tree for something nobody will
> ever use.

I somewhat agree. Obviously we can come up with contrived cases where
the delta is a pure "add" and this option magically lets you recover
some text via "strings" on the resulting delta dump. But in practice,
it's hard to say exactly how useful it would be, especially since the
"motivation" here seems to be more academic than any actual real-world
problem.

We can approximate with something like:

  git clone git://git.kernel.org/pub/scm/git/git.git
  cd git
  git bundle create ../bundle.git v1.6.0..v1.6.1
  mkdir ../broken && cd ../broken
  sed '/^PACK/,$!d' ../bundle.git >pack
  git init
  git unpack-objects --dump-deltas <pack
  strings .git/lost-found/delta/* | less

where maybe you lost your actual repository, but you still have a backup
of a bundle you sneaker-netted between major versions.

In this instance we have 6000 objects in the bundle, 2681 of which are
blobs (and therefore presumably the most interesting things to recover).
Of those, 1070 were non-delta and can be recovered completely. For the
remainder, our strings command shows us snippets of what was there.
There are definitely recognizable pieces of code. But likewise there
are pieces of code that are missing subtle parts. E.g.:

  if (textconv_one) {
          size_t size;
          mf1.ptr = run_textconv(textconv_one, one, &size);
          if (!mf1.
  ptr)
          mf1.size = size;
  if (textconv_two) {
          size_t size;
          mf2.ptr = run_textconv(textconv_two, two, &size);
          if (!mf2.
  ptr)
          mf2.size = size;

So while there is _something_ to be recovered there, it is basically as
easy to rewrite the code as it is to piece together whatever fragments
are available into something comprehensible.

So in practice, the delta dump would only be useful if:

  1. You have an incomplete thin pack, which generally means you are
     using bundles (or you interrupted a fetch and kept the tmp_pack).

  2. There is _no_ other copy of the basis. The results you get from
     this method are so awful that it should really only be last-ditch.
     I think you would be insane to say "Oh, I don't have net access
     right now. Let me just spend hours picking through these deltas to
     find a scrap of something useful instead of just waiting until I
     get access again."

  3. The changes in the pack tend to produce deltas rather than full
     blobs, but the deltas tend to be very add-heavy.

I don't know how popular bundles are, but I would expect (1) puts us
very much in the minority. On top of that, given the nature of git, I
find (2) to be pretty unlikely. If you're sneaker-netting data with a
bundle, then it seems rare that both ends of the net will be lost at
once. As for (3), it seems source code is not a good candidate here.
Perhaps if you were writing a novel in a single file, you might salvage
whole paragraphs or even chapters.

So I am inclined to leave it as-is: a patch in the list archive. If and
when the day comes when somebody loses some super-important data and
somehow matches all of these criteria, then they can consult whatever
aged and senile git gurus still exist to pull the patch out and see if
anything can be recovered.

-Peff
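To make concrete why a delta says so little without its base: a git delta is a header holding the base and result sizes (little-endian base-128 varints), followed by a stream of "copy" instructions (an offset and length into the base) and "insert" instructions (up to 127 literal bytes). Only the insert bytes mean anything on their own. That is roughly what running strings over a dumped delta scrapes out. The following is a minimal sketch, not code from git or from the patch under discussion. The function name extract_inserts is hypothetical, and it assumes you already hold the inflated delta bytes, since pack entries are zlib-compressed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read one size varint from the delta header: 7 bits per byte, least
 * significant group first, high bit set means "more bytes follow". */
static int read_varint(const unsigned char **p, const unsigned char *end,
                       size_t *value)
{
    size_t v = 0;
    int shift = 0;
    unsigned char c;

    do {
        if (*p >= end || shift >= (int)(8 * sizeof v))
            return -1;                     /* truncated or oversized */
        c = *(*p)++;
        v |= (size_t)(c & 0x7f) << shift;
        shift += 7;
    } while (c & 0x80);
    *value = v;
    return 0;
}

/* Hypothetical helper: copy only the literal "insert" bytes of a raw
 * (already inflated) delta into out. Copy instructions are skipped,
 * because they only name a range inside the missing base object.
 * Returns the number of literal bytes written, or -1 if the delta is
 * malformed or out is too small. */
long extract_inserts(const unsigned char *delta, size_t len,
                     unsigned char *out, size_t out_cap)
{
    const unsigned char *p = delta, *end = delta + len;
    size_t base_size, result_size, written = 0;

    assert(delta != NULL && out != NULL);  /* caller bug, not bad data */

    /* Header: size of the (missing) base, then size of the result. */
    if (read_varint(&p, end, &base_size) ||
        read_varint(&p, end, &result_size))
        return -1;
    (void)base_size;                       /* useless without the base */

    while (p < end) {
        unsigned char cmd = *p++;

        if (cmd & 0x80) {
            /* Copy from base: flag bits 0x01..0x08 announce offset
             * bytes, 0x10..0x40 announce size bytes. Skip them all. */
            for (int i = 0; i < 7; i++) {
                if (cmd & (1u << i)) {
                    if (p >= end)
                        return -1;
                    p++;
                }
            }
        } else if (cmd) {
            /* Insert: the next cmd bytes are literal result data. */
            if ((size_t)(end - p) < cmd || out_cap - written < cmd)
                return -1;
            memcpy(out + written, p, cmd);
            written += cmd;
            p += cmd;
        } else {
            return -1;                     /* opcode 0 is reserved */
        }
    }
    /* Literal data can never exceed the declared result size. */
    return written <= result_size ? (long)written : -1;
}
```

Given a delta that copies 4 bytes from the base and then inserts "hello", this returns 5 and leaves "hello" in out. Whatever the copy contributed is simply gone. That is presumably the kind of hole visible in the mf1/mf2 fragments above.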