Hello!

We have a problem with Classpath in the following case:

* A directory contains a file with a latin1-encoded filename, i.e. the
  filename is _not_ valid UTF-8.
* VMFile.list is called for this directory.

The invalid UTF-8 filename is passed to (*env)->NewStringUTF.

What happens with CACAO?
- CACAO hangs in an internal function.

What happens with JamVM?
- The string is converted to UTF-8 in a brute-force way, meaning for
  example that an "Umlaut A" latin1 character is combined with the
  following character (or the terminating zero) into some garbage
  UTF-8 character.

Both VMs clearly have a bug here, as NewStringUTF should fail gracefully
in these cases and return NULL (probably after throwing an exception).

This, however, does not solve the underlying problem. Should VMFile.list
really fail in a directory containing a latin1 filename? If we decide to
depend on the locale, can we really expect that all filenames match the
locale encoding?

I checked what the reference implementation does, using the attached
program. The RI always seems to interpret the filenames it gets from the
system as latin1 (or similar), independent of the file.encoding property.
This has the following consequences:

* latin1 filenames work as expected.
* UTF-8 filenames are read as latin1, so they end up with more characters
  than they should have. However, if the string is passed through
  unmodified and converted back to latin1 on output, it is the same as
  what the system returned from readdir. (Java code working with the
  string will see messed-up characters, but the whole mess is round-trip
  safe.)

I'm really at a loss here what to do. Any ideas?

-Edwin

import java.io.File;

public class test {
    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Please specify a directory on the command line.");
            return;
        }

        File file = new File(args[0]);
        if (!file.isDirectory()) {
            System.out.println("error: " + args[0] + " is not a directory");
            return;
        }

        try {
            String[] ls = file.list();
            int i;
            for (i=0; i<ls.length; ++i) {
                // Print the index, the length in chars, and the name itself.
                System.out.println(i + ": (length " + ls[i].length() + ") " + ls[i]);

                // Hex dump of the name re-encoded with the default charset,
                // 16 bytes per line.
                byte[] bytes = ls[i].getBytes();
                for (int j=0; j<bytes.length; ++j) {
                    if (j!=0 && (j%16)==0)
                        System.out.println("");
                    else
                        System.out.print(" ");
                    System.out.print(Integer.toHexString(bytes[j] & 0xff));
                }
                System.out.println("");
            }
        }
        catch (Exception ex) {
            System.out.println("exception: " + ex);
            ex.printStackTrace(System.out);
        }
    }
}
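
For illustration, the two behaviours discussed above (strict UTF-8 decoding that fails on a latin1 name, and latin1 decoding that is round-trip safe even for UTF-8 names) can be sketched in plain Java. This is only a sketch under assumptions: the class and method names are made up for this example, the byte arrays stand in for what readdir would return, and it uses java.nio.charset.StandardCharsets, which needs a newer class library than the one under discussion. It is not what any of the VMs or the RI actually do internally.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class FilenameDecodingSketch {

    // Decode raw filename bytes one byte per char, as the RI appears to do.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Strict UTF-8 decoding: returns null on malformed input instead of
    // producing garbage characters.
    public static String decodeUtf8Strict(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // True if decoding as latin1 and encoding back yields the same bytes.
    public static boolean roundTripsViaLatin1(byte[] raw) {
        byte[] back = decodeLatin1(raw).getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        // "Umlaut A" followed by "bc", once in latin1 and once in UTF-8.
        byte[] latin1Name = { (byte) 0xC4, 'b', 'c' };
        byte[] utf8Name = { (byte) 0xC3, (byte) 0x84, 'b', 'c' };

        System.out.println("latin1 name is valid UTF-8: "
                + (decodeUtf8Strict(latin1Name) != null));
        System.out.println("UTF-8 name, chars when read as UTF-8:  "
                + decodeUtf8Strict(utf8Name).length());
        System.out.println("UTF-8 name, chars when read as latin1: "
                + decodeLatin1(utf8Name).length());
        System.out.println("latin1 name round-trips: "
                + roundTripsViaLatin1(latin1Name));
        System.out.println("UTF-8 name round-trips:  "
                + roundTripsViaLatin1(utf8Name));
    }
}
```

The point of the sketch is the trade-off: the strict decoder rejects the latin1 name outright, while the latin1 decoder accepts every byte sequence and preserves it, at the cost of showing a UTF-8 name with one char per byte.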