On Sat, 27 Mar 2010, Scott Chacon wrote:

> Hey,
>
> Sorry it's taken me a bit - I'm traveling right now.
>
> On Fri, Mar 26, 2010 at 6:56 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> >> > > Given that GitHub has blessed the world with this corruption,
> >> > > we may need to modify JGit to accept it.
>
> Well, shouldn't it accept it just because CGit accepts it?  Isn't that
> an incompatibility in implementation?

CGit fsck complains about it.  That should be a sufficient clue to
avoid such things.

> >> But GitHub's approach here seems to be "Meh, its fine, don't worry
> >> about it".
>
> That isn't really my approach, I actually thought I had fixed this a
> while ago.  It seems to be a pretty understandable mistake, since
> ls-tree and cat-file -p both output zero padded modes and it is only
> an issue on trees with subtrees, obviously, so we don't see it all the
> time at GitHub.  I have fixed this and it's in the queue for
> deployment which should be in the next few days (I gotta get home
> first).

Thanks.

> > It's up to GitHub to fork Git then, and while at it stop calling it Git
> > compatible.  Really.  If we start to get slack about the pack format
> > like this then every Git reimplementation du jour will make similar
> > deviations except in different directions and we'll end up with a mess
> > to support.
>
> Really? It's not the pack format - we use stock Git servers and
> almost always have.  It's the tree writing when someone edits a file
> inline - I was writing out zero-padded trees.  And, it _is_ Git
> compatible - CGit only issues a warning, and that only if the
> circumstances align such that we write a tree with a subtree, which
> again is pretty rare.  There are only a handful of projects like this
> and in all CGit circumstances makes no practical difference.

It is still damn important to those with an interest in pack format
improvements that only one way of creating a tree object exists,
especially as we stamp a SHA1 hash on it.
Whatever we do with the tree encoding in the future, it is essential
that the canonical expression of any tree object be unambiguous and
always produce the same hash.

> > My stance has always been that the C Git is authoritative with regards to
> > formats and protocols.  It's up to Github to fix their screw-up.
>
> It is fixed and will be deployed soon, but really, there is no reason
> to be snippy.  It is a simple and minor mistake effecting very few
> repositories (maybe 100 out of 730k), and the only reason it's an
> issue at all is that JGit is not following the authoritative CGit
> implementation of basically ignoring it.

But again, CGit's fsck is not ignoring this discrepancy.  And if the
CGit core is otherwise silently accepting it, then that is a mistake.

> Also, if we're all concerned about "Git reimplementation du jour"
> deviations, then we need to focus on libifying Git so there isn't a
> need for such re-implementations.  I'm hoping to help with a possible
> GSoC project on libgit2, but the lack of a linkable library will
> ensure that re-implementations in nearly every useful language will
> continue.

Don't get me wrong.  I'm not against Git reimplementations per se, as
long as they rigorously implement the exact format and protocol from
CGit.  In that sense it is important that the CGit fsck and verify-pack
tools be run systematically on objects and packs produced by
alternative Git implementations to find such issues.


Nicolas
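
For readers following the thread, here is a minimal Python sketch of why
a zero-padded mode matters once a SHA1 is stamped on the tree.  It
assumes the usual tree object layout: each entry is "<mode> <name>",
a NUL byte, then the raw 20-byte object ID, and the object name is the
SHA1 of "tree <size>", a NUL byte, and the body.  The helper names
tree_entry and tree_id are made up for illustration and are not part of
CGit, JGit, or any library; the subtree name "src" is also arbitrary.

```python
import hashlib

# Well-known ID of the empty tree, used here only as a convenient subtree target.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def tree_entry(mode: str, name: str, sha_hex: str) -> bytes:
    """Serialize one tree entry: b"<mode> <name>\\0" plus the raw 20-byte ID."""
    return mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(sha_hex)


def tree_id(body: bytes) -> str:
    """Name a tree body the way Git names objects: SHA1 of b"tree <size>\\0" + body."""
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Same logical content (one subtree called "src"), two different encodings.
canonical = tree_id(tree_entry("40000", "src", EMPTY_TREE))   # mode as CGit writes it
padded = tree_id(tree_entry("040000", "src", EMPTY_TREE))     # zero-padded mode

print("canonical:", canonical)
print("padded:   ", padded)
print("same object name?", canonical == padded)
```

The two encodings differ by a single byte, so they hash to different
object names even though they describe the same directory.  Regular
file modes such as 100644 already have six digits, which is consistent
with the problem only showing up on trees that contain subtrees.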