On Mon, 21 Jan 2008, Junio C Hamano wrote: > > yet you can obtain a path component longer than 256 bytes. Individual components are limited to 255 bytes by most filesystems (PATH_MAX is the whole path, not any individual component). That said, you're right. It's not really a design requirement, and since you never get an array of "struct dirent", just a pointer to a single one, it would be perfectly normal and natural for "struct dirent" to be declared with a unsized d_name[]. It's also quite possible that some implementations might even have d_name[] not as an array, but as a pointer to somewhere else (POSIX may require it to be an array, I didn't check). That said, I bet that Mark isn't the only one to have written code like that, so I suspect Mark's code probably works in practice pretty much everywhere, even if I don't think it's necessarily _required_ to work correctly. I do suspect that if you really want to make this portable, and able to handle an expanding d_name[] too, I think you need to make sure you allocate a big-enough one. And if you worry about d_name perhaps being a pointer, that really does mean that you'd need to convert the system-supplied "struct dirent" into a "git_dirent_t" that you can control. That said, I think this patch has a bigger problem, namely just fundamentally that char *utf8 = reencode_string(entry, "UTF8", "UTF8-MAC"); is just unbelievably slow. That's just not how it should be done. First off, the common case is that the filename likely has everything in plain 7-bit ascii. So rather than re-encoding by default, the first thing to do is to just see if it even needs re-encoding. Even if it's as simple as saying "does it have any high bits at all", that's going to be a *huge* performance win. So start off with something like int is_usascii(const char *p) { char c; do { c = *p++; } while (c > 0); return !c; } and now you can do if (is_usascii(entry->d_name)) return entry; before you even *look* at re-encoding it (and this basically works for all cases - we really don't care about EBCDIC, do we? So even if this routine was meant to do Latin1<->utf8, the above "is_usascii()" test is always the right thing to do). Anyway, even if you do that, our "reencode_string()" is really *so* expensive that you really don't want to do it on a filename by filename basis. It literally does a malloc() for each allocation. It might well be worth it to find something that is more utf-8-specific (and I could well imagine that Mac OS X comes with some UTF libraries, if only because we cannot possibly be the only people with this issue). (Same goes for Latin1<->UTF conversion, for that matter. If somebody wants to add that, I suspect it's best done by hand, not using iconv and our rather expensive layer around it. That said, latin1->utf8 is actually much *easier* than utf-8 NFD->NFC). Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html