Nicolas Pitre <nico@xxxxxxx> wrote:
> On Fri, 1 May 2009, Shawn O. Pearce wrote:
>
> > On an unrelated note, someone asked me recently, how do we ensure
> > compatibility in implementations between git.git and jgit?
>
> Well... this is not exactly easy. As I said in the past
> (http://marc.info/?l=git&m=121035043412788&w=2), I think that the C
> version must remain the reference with regards to protocols and on-disk
> data structures.

I agree fully.

> If people go wild with JGit and start making changes
> to data structures then it simply won't be Git compatible anymore and
> the user base will get fragmented.

Agree. We may see some prototyping happen in JGit first on some
topics, and JGit may even support something earlier than git.git,
e.g. JGit has an amazon-s3:// transport that git.git doesn't have.
But it also isn't widely used.

> A formal compatibility test suite would imply that every Git
> reimplementation should be compatible with the reference C version.
> You could add some tests in your test suite which are performed in
> parallel using JGit and the C git, and make sure that the produced
> results are identical, etc.

Yea, and to some extent we try to do that already in JGit, but our
tests aren't complete enough in that area.

> But to which extent should the C version remain backward compatible with
> other implementations? Let's suppose a future protocol extension is
> made and old unsuspecting C clients work just fine but some other
> implementation crashes with it?

This is what I think scares both myself and the folks that have
recently asked me about compatibility. If JGit gets a broader user
base, and suddenly it stops working against a newer C git-daemon
because of a protocol change, those users are going to be pissed.
It's no worse than the "github can't ever upgrade past 1.6.1" issue
we had not too long ago.

I think we're doing better these days about embedding file format
version numbers into files (e.g.
pack idx v2) to help alert older clients that the format is
different. But we also have something of a history of looking for
"holes" in older C git parsers in order to wedge in new features where
we didn't plan for them in the first place, e.g. the protocol
capability slots we have now.

I think that as reimplementations become more popular, we need to rely
less on extending things by exploiting parser quirks in older C git.git
code, and rely more on, at the least, explicit version markers that
everyone can work with.

> And the reference implementation cannot be held back because
> of bugs in all alternative implementations.

I agree. A bug is a bug. But I'd really like to get away from the
trend where we exploit bugs in older C git.git implementations to add
new functionality, because maybe JGit doesn't have that same bug and
will fall flat on its face with that exploit.

> As long as they're futzing^Wdeveloping on top of Jgit then
> interoperability shouldn't be at risk. If people would start adding new
> object types and pack formats and the like without obtaining a consensus
> with people around the C version then I might get extremely worried (and
> pissed) though.

That's why JGit is BSD, so everyone can use the one f'king library
and not risk fragmenting the Java market further.

But yea, I'd be really pissed too if someone hacked up JGit and made
it incompatible with anything else. It's a risk that the liberal BSD
license permits. I'm really sort of hoping that the development
momentum around git.git, and JGit trying to keep up with it, will keep
them coming back to the canonical JGit for updates, forcing them to
give back any hacks^Wimprovements they have made. If the improvements
really are worthwhile, they can be easily ported over to C before they
become widely used in JGit.

> One defensive approach we could adopt is to use a capability slot to
> identify the software version of each peer involved in the network
> communication.
> The advantage would be for a later Git version to avoid
> doing some things that are known to break with client X or Y. Of course
> even such a scheme can be abused and misused, like on some web sites if
> you don't have the "right" browser, leading some of them to allow faking
> the User-Agent string, etc. But maybe the upsides are more important
> than the downsides. This doesn't help with on-disk interoperability,
> but this is probably less important than communication interoperability.

Blargh. I'm with you about the whole User-Agent mess. Asking clients
and servers to identify themselves with implementation and version
markers might be useful for analysis of who-is-using-what, but I don't
think it's a good way for the peers to negotiate what functionality to
enable or disable, or what bug workarounds to use.

Reminds me of the Apache hack during output to work around an HTTP
header parsing bug in Netscape 2 when the "\r\n" pair was exactly at
byte 256 in the stream. *shudder*

FWIW, an EGit user recently complained that some random Git hosting
site they were using couldn't work with EGit, but EGit worked fine
with other sites, e.g. GitHub. Apparently this site's SSH forced
command filter script didn't like EGit asking for
"git upload-pack 'path.git'". How the client launches the remote
process over SSH isn't strictly a Git protocol issue, but this random
hosting site was apparently relying on C git's current calling
convention of "git-upload-pack 'path.git'".

Long story short, I claimed it was the hosting site's bug. :-)

-- 
Shawn.