Cross-Platform Version Control (was: Eric Sink's blog - notes on git, dscms and a "whole product" approach)

Jakub Narebski <jnareb@xxxxxxxxx> · Tue, 28 Apr 2009 04:24:31 -0700 (PDT)

Martin Langhoff <martin.langhoff@xxxxxxxxx> writes:

> Eric Sink has been working on the (commercial, proprietary) centralised
> SCM Vault for a while. He's written recently about his explorations
> around the new crop of DSCMs, and I think it's quite interesting. A
> quick search of the list archives makes me think it wasn't discussed
> before.
> 
> The guy is knowledgeable, and writes quite witty posts -- naturally,
> there's plenty to disagree on, but I'd like to encourage readers not
> to nitpick or focus on where Eric is wrong. It is interesting to read
> where he thinks git and other DSCMs are missing the mark.
> 
>    Maybe he's right, maybe he's wrong, but damn he's interesting :-)
> 
> So here's the blog -  http://www.ericsink.com/

"Here's a blog"... and therefore my dilemma. Should I post my reply
as a comment on this blog, or should I reply here on the git mailing list?

> These are the best entry points

Because those two entries are quite different, I'll reply to them separately.

1.  "Ten Quirky Issues with Cross-Platform Version Control"
>   http://www.ericsink.com/entries/quirky.html

which is a generic comment about (mainly) using version control
in a heterogeneous environment, where different machines have different
filesystem limitations.  I'll concentrate on that issue here.

2.  "Mercurial, Subversion, and Wesley Snipes"
>   http://www.ericsink.com/entries/hg_denzel.html

where, paraphrasing, Eric Sink says that he doesn't write about
Mercurial and Subversion because they are perfect, or at least not
as controversial (and controversial means interesting).

> 
> To be frank, I think he's wrong in some details (as he's admittedly
> only spent limited time with it) but right on the larger-picture
> (large userbases want it integrated and foolproof, bugtracking needs
> to go distributed alongside the code, git is as powerful^Wdangerous as
> C).

Neither of the blog posts mentioned above touches those issues, BTW...

----------------------------------------------------------------------
Ad 1. "Ten Quirky Issues with Cross-Platform Version Control"

Actually those boil down to two kinds of issues: troubles with the
differing limitations of different filesystems, and the different
handling of line endings in text files on different platforms.

Line endings (issue 8.) are, in theory and in practice (at least for
Git), a non-issue.

In theory you should follow the project's convention for the end-of-line
character in text files, and use a smart editor that can deal (or can be
configured to deal) with this issue correctly.

In practice this is a matter of correctly setting up core.autocrlf
(and, in more complicated cases, which for Git are very, very rare,
configuring which files are text and which are not).
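To make the core.autocrlf part concrete, here is a minimal Python sketch of the conversion it performs on text files: with "true", CRLF becomes LF when content goes into the repository and LF becomes CRLF again on checkout; with "input", only the first conversion happens; with "false", nothing is converted.  The function names and the NUL-byte test for "binary" are assumptions made for illustration; Git's own text detection is more elaborate.

```python
# Hypothetical model of what core.autocrlf does to the bytes of a text file.
# This is an illustration, not Git's implementation.


def looks_binary(data: bytes) -> bool:
    """Crude heuristic (an assumption): data with a NUL byte near the start is binary."""
    return b"\0" in data[:8000]


def to_repository(data: bytes, autocrlf: str) -> bytes:
    """Bytes stored in the repository when a working-tree file is added."""
    if autocrlf in ("true", "input") and not looks_binary(data):
        return data.replace(b"\r\n", b"\n")  # normalize CRLF to LF
    return data


def to_working_tree(data: bytes, autocrlf: str) -> bytes:
    """Bytes written to the working tree when a stored file is checked out."""
    if autocrlf == "true" and not looks_binary(data):
        # Collapse any existing CRLF first so that we never produce CR CR LF.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return data  # "input" and "false" leave checked-out files alone


if __name__ == "__main__":
    edited_on_windows = b"line one\r\nline two\r\n"
    stored = to_repository(edited_on_windows, "true")
    print(stored)                            # b'line one\nline two\n'
    print(to_working_tree(stored, "true"))   # b'line one\r\nline two\r\n'
    print(to_working_tree(stored, "input"))  # b'line one\nline two\n'
```

The point of the model is that the repository always holds LF, so Windows and Unix checkouts of the same commit agree on content.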

There are a few classes of trouble with filesystems (with filenames):

1. Different limitations on file names (e.g. pathname length),
   different special characters, and different special filenames (if any).
   These are issue 2. (special basename PRN on MS Windows),
   issue 3. (trailing dot, trailing whitespace), issue 4. (pathname
   and filename length limit), issue 6. (special characters, in this
   case colon being the path element delimiter on MacOS, but it is also
   about special characters like colon, asterisk and question mark
   on MS Windows) and also issue 7. (name that begins with a dash)
   in Eric Sink's article.

   The answer is a filename convention for the project.  Simply DON'T
   use filenames which can cause problems.  There is no simple way to
   solve this problem in a version control system, although I think that
   if you really, really, really need it, you should be able to cobble
   something together using low-level Git tools, so that a file has a
   different name in the working directory than the one used in the
   repository (and index).

   See also David A. Wheeler's essay "Fixing Unix/Linux/POSIX Filenames:
   Control Characters (such as Newline), Leading Dashes, and Other Problems"
   http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

   DON'T DO THAT.
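A project-wide filename convention can be checked mechanically, for example over the output of `git ls-files` in a hook or a CI job.  Below is a minimal Python sketch of such a check.  `portability_problems` and the 255-character limit are hypothetical choices, and the reserved-name and forbidden-character lists cover the MS Windows and classic MacOS cases from the issues above, not every filesystem.

```python
# Names that MS Windows treats specially, with or without an extension
# (e.g. both "PRN" and "PRN.txt" are problematic).
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}
# Characters forbidden on MS Windows; ':' was also the path separator on classic MacOS.
FORBIDDEN_CHARS = set('<>:"\\|?*')
MAX_PATH_LENGTH = 255  # hypothetical project limit, kept below Windows' 260-character MAX_PATH


def portability_problems(path: str) -> list[str]:
    """Return reasons why a '/'-separated repository path may not be portable."""
    problems = []
    if len(path) > MAX_PATH_LENGTH:
        problems.append(f"path longer than {MAX_PATH_LENGTH} characters")
    for component in path.split("/"):
        stem = component.split(".")[0].upper()  # reserved names are case-insensitive
        if stem in WINDOWS_RESERVED:
            problems.append(f"{component!r}: reserved name on MS Windows")
        if component.endswith((".", " ")):
            problems.append(f"{component!r}: trailing dot or space")
        if component.startswith("-"):
            problems.append(f"{component!r}: leading dash")
        bad = sorted(set(component) & FORBIDDEN_CHARS)
        if bad:
            problems.append(f"{component!r}: special characters {''.join(bad)}")
        if any(ord(ch) < 32 for ch in component):
            problems.append(f"{component!r}: control character")
    return problems


if __name__ == "__main__":
    for p in ["docs/README", "PRN.txt", "notes/todo.", "a:b", "-rf"]:
        print(p, portability_problems(p))
```

A check like this only enforces the convention; it does not make an offending name work on the platforms that reject it.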

2. "Case-insensitive" but "case-preserving" filesystems: the case
   where some different filenames are equivalent (like 'README' and
   'readme' on a case-insensitive filesystem), but are returned as you
   created them (so if you created 'README', you would get 'README' in
   the directory listing, but the filesystem would report that 'readme'
   exists too).  This is issue 1. ('README' and 'readme' in the same
   directory) in Eric Sink's article.

   The answer is the same as for the previous issue: don't.  Simply DO
   NOT create files with filenames which differ only in case (like the
   unfortunate xt_CONNMARK.h and xt_connmark.h netfilter headers in the
   Linux kernel).

   But I think that even when such an unfortunate incident (two
   filenames differing only in case) occurs, you can deal with it in
   Git by using lower-level tools (and editing only one of the two
   files at a time).  You would get spurious reports of modified files
   in git-status, though...  Perhaps that could be improved using the
   infrastructure created (IIRC) by Linus for dealing with 'insane'
   filesystems.

   DON'T DO THAT, SOLVABLE.
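Collisions of this kind can be caught before they bite by grouping paths that are equal when case is ignored.  A minimal Python sketch, assuming `str.casefold()` is a good enough stand-in for the filesystem's own case mapping (real filesystems use their own tables); `case_collisions` is a hypothetical helper.

```python
from collections import defaultdict


def case_collisions(paths: list[str]) -> list[list[str]]:
    """Group paths that would name the same file on a case-insensitive filesystem.

    Only whole paths are compared, so directories that differ only in case
    (e.g. 'Docs/a' and 'docs/b') are not flagged by this sketch.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)  # approximate the filesystem's folding
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    tree = ["README", "readme", "src/main.c", "Makefile"]
    for group in case_collisions(tree):
        print("would collide:", group)
```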

3. Filesystems that are not "name-preserving", where a filename as a
   sequence of bytes differs between what you created and what you get
   back from the filesystem.  An example here is the MacOS X filesystem
   (HFS+), which accepts filenames in the NFC (composed) normalization
   form of Unicode, but stores them internally and returns them in the
   NFD (decomposed) form.  This is issue 9. (Español being
   "Espa\u00f1ol" in NFC, but "Espan\u0303ol" in NFD).

   In this case 'don't do that' might not be an acceptable answer:
   perhaps you need non-ASCII characters in filenames, and you cannot
   always use a different filesystem or specify a mount option that
   makes this a non-problem.

   I remember that this issue was discussed extensively on the git
   mailing list, but I don't remember what the conclusion was (besides
   agreeing that a filesystem that is not "*-preserving" is not a sane
   filesystem ;).  In particular, I do not remember whether Git can deal
   with this issue sanely (I remember Linus adding infrastructure for
   that, but I don't know whether it solved this problem...).

   PROBABLY SOLVED.
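The NFC/NFD mismatch is easy to reproduce with Python's standard `unicodedata` module.  This sketch assumes the filesystem applies plain Unicode NFD (HFS+ actually uses its own variant of NFD), and `same_name` is a hypothetical helper showing that comparing names after normalizing both sides makes the two byte sequences match again.

```python
import unicodedata


def same_name(created: str, listed: str) -> bool:
    """Treat two filenames as equal if they match after NFC normalization."""
    return unicodedata.normalize("NFC", created) == unicodedata.normalize("NFC", listed)


if __name__ == "__main__":
    created = "Espa\u00f1ol"                        # what you created (NFC)
    listed = unicodedata.normalize("NFD", created)  # what an NFD-normalizing filesystem returns
    print(listed == "Espan\u0303ol")                # True: 'ñ' became 'n' + combining tilde
    print(created == listed)                        # False: the code points differ
    print(len(created.encode("utf-8")), len(listed.encode("utf-8")))  # 8 9
    print(same_name(created, listed))               # True
```

A version control system that compares names byte by byte sees 'listed' as an untracked file and 'created' as deleted; normalizing before comparing is one way around that.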

4. Filesystems which cannot store all SCM-sane metainfo, for example
   filesystems without support for symbolic links, or without support
   for the executable permission (executable bit).  This is an extension
   of issue 10. (which is limited to symbolic links) in Eric Sink's
   article.

   In Git you have core.fileMode to ignore executable-bit differences
   (you would need to use SCM tools rather than filesystem tools to
   manipulate the bit), and core.symlinks to check out symlinks as
   plain text files (again, using SCM tools to manipulate them).

   SOLVED.
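A simplified Python model of these two settings, assuming only regular, executable and symlink entries (Git records them in trees with modes 100644, 100755 and 120000).  The helper names are hypothetical; the point is which differences get reported and what a symlink turns into when core.symlinks is false.

```python
# Git tree entry modes for regular files, executables and symbolic links.
REGULAR_FILE = 0o100644
EXECUTABLE_FILE = 0o100755
SYMLINK = 0o120000


def checked_out_as(mode: int, symlinks: bool) -> str:
    """What ends up in the working tree for a tracked entry (simplified model)."""
    if mode == SYMLINK:
        # With core.symlinks=false the link target is written into a small plain file.
        return "symlink" if symlinks else "plain file containing the link target"
    return "regular file"


def reports_mode_change(index_mode: int, worktree_mode: int, file_mode: bool) -> bool:
    """Whether a status check would report a mode change (simplified model).

    With core.fileMode=false, a difference in the executable bit alone is ignored.
    """
    if index_mode == worktree_mode:
        return False
    exec_bit_only = {index_mode, worktree_mode} == {REGULAR_FILE, EXECUTABLE_FILE}
    return file_mode or not exec_bit_only


if __name__ == "__main__":
    print(checked_out_as(SYMLINK, symlinks=False))
    print(reports_mode_change(EXECUTABLE_FILE, REGULAR_FILE, file_mode=False))
```

On such a filesystem the executable bit is then changed through Git itself, e.g. with `git update-index --chmod=+x <path>`.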

There is also a mistaken implicit assumption that version control
systems do (and should) preserve all metadata.

5. The issue of extra metadata that is not SCM-sane, and which
   different filesystems may or may not be able to store.  Examples
   include full Unix permissions, Unix ownership (and the group a file
   belongs to), other permission-related metadata such as ACLs, and
   extra resources tied to a file, such as EAs (extended attributes) on
   some Linux filesystems or the (in)famous resource fork on MacOS.
   This is issue 5. (resource fork on MacOS vs. xattrs on Linux) in
   Eric Sink's article.

   This is not an issue for an SCM, a _source_ code management system,
   to solve.  Preserving extra metadata indiscriminately can cause
   problems, e.g. with full permissions and ownership.  Therefore an
   SCM preserves only a limited, SCM-sane subset of metadata.  If you
   need to preserve extra metadata, you can use hooks for that (in good
   SCMs); e.g. etckeeper uses metastore (in Git).

   NOT A PROBLEM.
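As a rough illustration of the hook approach, here is a Python sketch that records permission bits and ownership into a file which a pre-commit hook could write and stage, and that restores the permission bits from it, e.g. from a post-checkout hook.  The file name `.metadata.json` and its JSON layout are hypothetical; this is not metastore's format or etckeeper's actual mechanism.  Ownership is recorded but not restored, since changing it normally requires root.

```python
import json
import os
import stat


def collect_metadata(root: str) -> dict[str, dict[str, int]]:
    """Record permission bits and ownership for every file under root.

    Hypothetical layout: {relative_path: {"mode": ..., "uid": ..., "gid": ...}}.
    """
    metadata = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip the repository itself
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # do not follow symlinks
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            metadata[rel] = {"mode": stat.S_IMODE(st.st_mode), "uid": st.st_uid, "gid": st.st_gid}
    return metadata


def save_metadata(root: str, filename: str = ".metadata.json") -> None:
    """Meant for a pre-commit hook; the hook would then 'git add' the file."""
    data = collect_metadata(root)
    data.pop(filename, None)  # do not record the metadata file itself
    with open(os.path.join(root, filename), "w") as f:
        json.dump(data, f, indent=2, sort_keys=True)


def restore_permissions(root: str, filename: str = ".metadata.json") -> None:
    """Meant for a post-checkout hook; restores permission bits of regular files only."""
    with open(os.path.join(root, filename)) as f:
        data = json.load(f)
    for rel, meta in data.items():
        path = os.path.join(root, *rel.split("/"))
        if os.path.isfile(path) and not os.path.islink(path):
            os.chmod(path, meta["mode"])
```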

-- 
Jakub Narebski
Poland
ShadeHawk on #git