Re: [PATCH] Use FIX_UTF8_MAC to enable conversion from UTF8-MAC to UTF8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mon, 21 Jan 2008, Linus Torvalds wrote:
> 
> I do suspect that if you really want to make this portable, and able to 
> handle an expanding d_name[] too, I think you need to make sure you 
> allocate a big-enough one. And if you worry about d_name perhaps being a 
> pointer, that really does mean that you'd need to convert the 
> system-supplied "struct dirent" into a "git_dirent_t" that you can 
> control.
> 
> That said, I think this patch has a bigger problem, namely just 
> fundamentally that
> 
> 	char *utf8 = reencode_string(entry, "UTF8", "UTF8-MAC");
> 
> is just unbelievably slow. That's just not how it should be done.

Having thought about this some more, I'm starting to suspect that the 
"readdir()" wrapper thing won't work very well.

Yes, it will work on OS X, but for all the wrong reasons. It works there 
just because of the stupid normalization that OS X does both on filename 
input and output, so if we hook into readdir() and munge the name there, 
we'll still be able to use the munged name for lstat() and open().

However, we'll never be able to test it on a sane Unix system, and it 
won't ever be able to handle the case of a filesystem actually being 
Latin1 but git being asked to try to transparently convert it to utf-8 in 
order to work with others.

Because most of those readdir() calls will just be fed back not just to 
the filesystem as lstat() calls later, but also to the recursive directory 
traversal itself, so if we munge the name, we're also going to screw name 
lookup.

Again, as an OSX-only workaround it's probably acceptable, and perhaps 
that's the only thing to look at right now. But it does strike me as a 
design mistake to do it at that level.

It would be conceptually nicer to do it in "add_file_to_index()" instead. 
Ie anything that creates a "struct cache_entry" would do the 
conversion. 

So it would be good if somebody looked at what happens if you do the OSX 
hack in add_file_to_index() instead, and see if it works there..

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux