Martin Langhoff <martin.langhoff@xxxxxxxxx> writes:

> Eric Sink has been working on the (commercial, proprietary) centralised
> SCM Vault for a while. He's written recently about his explorations
> around the new crop of DSCMs, and I think it's quite interesting. A
> quick search of the list archives makes me think it wasn't discussed
> before.
>
> The guy is knowledgeable, and writes quite witty posts -- naturally,
> there's plenty to disagree on, but I'd like to encourage readers not
> to nitpick or focus on where Eric is wrong. It is interesting to read
> where he thinks git and other DSCMs are missing the mark.
>
> Maybe he's right, maybe he's wrong, but damn he's interesting :-)
>
> So here's the blog - http://www.ericsink.com/

"Here's a blog"... and therefore my dilemma. Should I post my reply as
a comment on this blog, or should I reply here on the git mailing list?

> These are the best entry points

Because those two entries are quite different, I'll reply to them
separately.

1. "Ten Quirky Issues with Cross-Platform Version Control"
> http://www.ericsink.com/entries/quirky.html

which is a generic comment about (mainly) using version control in a
heterogeneous environment, where different machines have different
filesystem limitations. I'll concentrate on that issue here.

2. "Mercurial, Subversion, and Wesley Snipes"
> http://www.ericsink.com/entries/hg_denzel.html

where, paraphrasing, Eric Sink says that the reason he doesn't write
about Mercurial and Subversion is that they are perfect, or at least
not as controversial (and controversial means interesting).

> To be frank, I think he's wrong in some details (as he's admittedly
> only spent limited time with it) but right on the larger-picture
> (large userbases want it integrated and foolproof, bugtracking needs
> to go distributed alongside the code, git is as powerful^Wdangerous as
> C).

Neither of the blog posts mentioned above touches on those issues, BTW...

----------------------------------------------------------------------
Ad 1.
"Ten Quirky Issues with Cross-Platform Version Control" Actually those are two issues: troubles with different limitations of different filesystems, and different dealing with line endings in text files on different platforms. Line endings (issue 8.) is in theory and in practice (at least for Git) a non-issue. In theory you should use project's convention for end of line character in text files, and use smart editor that can deal (or can be configured to deal) with this issue correctly. In practice this is a matter of correctly setting up core.autocrlf (and in more complicated cases, where more complicated means for git very very rare, configuring which files are text and which are not). There are a few classes of troubles with filesystems (with filenames). 1. Different limitations on file names (e.g. pathname length), different special characters, different special filenames (if any). Those are issues 2. (special basename PRN on MS Windows), issue 3. (trailing dot, trailing whitespace), issue 4. (pathname and filename length limit), issue 6. (special characters, in this case colon being path element delimiter on MacOS, but it is also about special characters like colon, asterisk and question mark on MS Windows) and also issue 7. (name that begins with dash) in Eric Sink article. The answer is convention for filenames in a project. Simply DON'T use filenames which can cause problems. There is no way to simply solve this problem in version control system, although I think if you really, really, really need it you should be able to cobble something together using low-level git tools to have different name for filename in working directory from the one used in repository (and index). See also David A. Wheeler essay "Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems" http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html DON'T DO THAT. 2. 
"Case-insensitive" but "case-preserving" filesystems; the case where some different filenames are equivalent (like 'README' and 'readme' on case-insensitive filesystem), but are returned as you created them (so if you created 'README', you would get 'README' in directory listing, but filesystem would return that 'readme' exists too). This is issue 1. ('README' and 'readme' in the same directory) in Eric Sink article. The answer is like for previous issue: don't. Simply DO NOT create files with filenames which differ only in case (like unfortunate ct_conntrack.h and cn_CONNTRACK.h or similar in Linux kernel). But I think that even in case where such unfortunate incident (two filenames differing only in case) occur, you can deal with it in Git by using lower level tools (and editing only one of two such files at once). You would get spurious info about modified files in git-status, though... perhaps that could be improved using infrastructure created (IIRC) by Linus for dealing with 'insane' filesystems. DON'T DO THAT, SOLVABLE. 3. Non "Case-preserving" filesystems, where filename as sequence of bytes differ between what you created, and what you get from filesystem. An example here is MacOS X filesystem, which accepts filenames in NFC composed normalized form of Unicode, but stores them internally and returns them in NFD decomposed form. This is issue 9. (Español being "Espa\u00f1ol" in NFC, but "Espan\u0303ol" in NFD). In this case 'don't do this' might be not acceptable answer. Perhaps you need non-ASCII characters in filenames. Not always can you use filesystem or specify mount point option that makes it not a problem. I remember that this issue was discussed extensively on git mailing list, but I don't remember what was the conclusion (beside agreeing that filesystem that is not "*-preserving" is not sane filesystem ;). 
In particular, I do not remember whether Git can deal with this issue
sanely (I remember Linus adding infrastructure for that, but did it
solve this problem...).

PROBABLY SOLVED.

4. Filesystems which cannot store all SCM-sane metainfo, for example
filesystems without support for symbolic links, or without support for
the executable permission (executable bit). This is an extension of
issue 10. (which is limited to symbolic links) in Eric Sink's article.

In Git you have core.fileMode to ignore executable bit differences (you
would need to use SCM tools and not filesystem tools to manipulate it),
and core.symlinks to be able to check out symlinks as plain text files
(again using SCM tools to manipulate them).

SOLVED.

There is also the mistaken implicit assumption that version control
systems have to (and should) preserve all metadata.

5. The issue of extra metadata that is not SCM-sane, and which
different filesystems may or may not be able to store. Examples include
full Unix permissions, Unix ownership (and the groups a file belongs
to), other permission-related metadata such as ACLs, and extra
resources tied to a file, such as EAs (extended attributes) on some
Linux filesystems or the (in)famous resource fork on MacOS.

This is issue 5. (resource fork on MacOS vs. xattrs on Linux) in Eric
Sink's article.

This is not an issue for an SCM (a _source_ code management system) to
solve. Preserving extra metadata indiscriminately, e.g. full
permissions and ownership, can cause problems. Therefore SCMs preserve
only a limited, SCM-sane subset of metadata. If you need to preserve
extra metadata, you can use hooks for that (in good SCMs), as e.g.
etckeeper does with metastore (in Git).

NOT A PROBLEM.

-- 
Jakub Narebski
Poland
ShadeHawk on #git