Re: Switching from CVS to GIT

On Tuesday 16 October 2007, Steffen Prohaska wrote:
> 
> On Oct 16, 2007, at 2:33 PM, Johannes Schindelin wrote:
> 
> >> Maybe we need a configuration similar to core.autocrlf (which controls
> >> newline conversion) to control filename comparison and normalization?
> >>
> >> Most obviously for the case (in-)sensitivity on Windows, but I also
> >> remember the unicode normalization happening on Mac's HFS filesystem
> >> that caused trouble in the past.
> >
> > Robin Rosenberg has some preliminary code for that. The idea is to wrap
> > all filesystem operations in cache.h, and do a filename normalisation
> > first.
> 
> At that point we could add a safety check. Paths that differ only by
> case, or whitespace, or ... (add general and project specific rules here)
> should be denied. This would guarantee that tree objects can always be
> checked out, even if the filesystem capabilities are limited.
> 
> Robin, what do you think?
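The safety check Steffen proposes could look roughly like the Python sketch below. It is only an illustration: the folding rules (Unicode NFC normalization followed by str.casefold()) are an assumption standing in for "case, or whitespace, or ...", and fold_path() and find_collisions() are hypothetical names, not git functions.

```python
import unicodedata
from collections import defaultdict


def fold_path(path: str) -> str:
    """Hypothetical key under which two paths would collide on a
    case-insensitive, normalizing filesystem (assumed rules: NFC + casefold)."""
    return unicodedata.normalize("NFC", path).casefold()


def find_collisions(paths):
    """Return groups of distinct paths that share the same folded key.

    A non-empty result means the tree cannot be checked out faithfully
    on such a filesystem, so a safety check would refuse it.
    """
    groups = defaultdict(set)
    for path in paths:
        groups[fold_path(path)].add(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_collisions(["Foo", "FOO", "bar"]))  # [['FOO', 'Foo']]
```

Whitespace rules, or project-specific ones, would be further transformations inside fold_path().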

My code only normalizes filenames to UTF-8 inside git, which isn't the same
thing. I think it can be extended to handle MacOSX normalized UTF-8 and
Windows UTF-16, so that when you check something out from git there are no
surprises. Case insensitivity is another dimension. I have no idea how the
code performs; it's more of a proof that it can be done.
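To make the MacOSX case concrete: the same visible name "café" can be stored precomposed (NFC) or decomposed (NFD), and HFS+ stores names in a variant of NFD. A minimal Python sketch, assuming git's internal form were UTF-8 in NFC (an assumption; git itself records no encoding). The helpers to_internal() and to_hfs() are hypothetical.

```python
import unicodedata


def to_internal(name: str) -> bytes:
    """Hypothetical internal form: precomposed (NFC) UTF-8."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


def to_hfs(name: str) -> str:
    """Approximation of the HFS+ on-disk form: decomposed (NFD)."""
    return unicodedata.normalize("NFD", name)


composed = "caf\u00e9"     # e-acute as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# The two spellings are different byte strings ...
print(composed.encode("utf-8"), decomposed.encode("utf-8"))  # b'caf\xc3\xa9' b'cafe\xcc\x81'
# ... but they map to the same internal name.
print(to_internal(composed) == to_internal(decomposed))  # True
```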

The code cannot "fail"; it always does something reasonable, such as leaving
a name unconverted when conversion is not possible. Validation has to be done
by something else.
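The "never fail" behaviour described above can be sketched in Python as follows. This is not Robin's code: to_utf8() is a hypothetical helper, and the local code page is passed in explicitly (cp1252, a common Western Windows code page, is used here as an assumption).

```python
def to_utf8(raw: bytes, local_encoding: str) -> bytes:
    """Re-encode a filename from the local code page to UTF-8.

    If the bytes are not valid in the local encoding, the name is returned
    unconverted instead of raising; flagging such names is left to a
    separate validation step.
    """
    try:
        return raw.decode(local_encoding).encode("utf-8")
    except UnicodeDecodeError:
        return raw


print(to_utf8(b"caf\xe9", "cp1252"))  # b'caf\xc3\xa9'
print(to_utf8(b"\x81", "cp1252"))     # b'\x81' (undefined in cp1252, kept as-is)
```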

The UTF-16 that Windows uses is not a current issue, because git only uses
the local code page. Jgit works with Unicode names, but it isn't very smart
either, because git says nothing about filename encoding, while Windows,
MacOSX, CIFS and other filesystems do.

The fact that git uses eight-bit filenames may also be one reason it is
slower on Windows: the eight-bit Win32 API converts all strings and filenames
to the native UTF-16 encoding on *every* system call, in and out, and that is
a lot of work when you do it thousands of times. If git did the conversion
itself, it could be made smarter, better suited to git's purposes and, most
importantly, faster. I have no idea how large the performance hit is; someone
has to measure it.
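One way git could be "smarter" than the eight-bit API is to convert a path to UTF-16 once and reuse the result for the several system calls that touch the same path. A rough Python sketch of that idea only, assuming paths are stored as UTF-8; wide_path() is a hypothetical name and no Win32 calls are made.

```python
from functools import lru_cache


@lru_cache(maxsize=4096)
def wide_path(utf8_path: bytes) -> bytes:
    """Convert a git path (assumed UTF-8) to UTF-16-LE, the encoding the
    wide Win32 API expects, caching the result for repeated lookups."""
    return utf8_path.decode("utf-8").encode("utf-16-le")


# Simulate several calls (lstat, open, ...) on the same two paths:
# only the first use of each path does the conversion.
for _ in range(3):
    for path in (b"Makefile", b"src/caf\xc3\xa9.c"):
        wide_path(path)

print(wide_path.cache_info())  # CacheInfo(hits=4, misses=2, maxsize=4096, currsize=2)
```

Whether this beats the per-call conversion done by the eight-bit API is exactly what would have to be measured; the cache also costs memory, hence the maxsize bound.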

I notice that a number of SCMs out there, including one with a \$\d{4} price
tag, get you into trouble if you rename a file from Foo to FOO on Windows.
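On the working-tree side, POSIX allows rename() to do nothing when source and target resolve to the same file, which is what Foo and FOO are on a case-insensitive filesystem. A common defensive workaround is to go through a temporary name. A Python sketch; rename_case_aware() and its temporary-name scheme are hypothetical, and this does not address how the repository itself records the rename.

```python
import os
import tempfile


def rename_case_aware(src: str, dst: str) -> None:
    """Rename src to dst; if the names differ only in case, go through a
    temporary name so the change is not silently dropped."""
    if src == dst or src.casefold() != dst.casefold():
        os.rename(src, dst)
        return
    # Hypothetical temporary-name scheme: first unused ".case-rename-N".
    directory = os.path.dirname(src) or "."
    n = 0
    while os.path.lexists(os.path.join(directory, f".case-rename-{n}")):
        n += 1
    tmp = os.path.join(directory, f".case-rename-{n}")
    os.rename(src, tmp)
    os.rename(tmp, dst)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "Foo"), "w").close()
    rename_case_aware(os.path.join(d, "Foo"), os.path.join(d, "FOO"))
    remaining = sorted(os.listdir(d))

print(remaining)  # ['FOO']
```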

-- robin