problem with latin1 filename and VMFile.list

Hello!

We have a problem with Classpath in the following case:

    * a directory contains a file with a latin1-encoded filename, i.e.
      the filename is _not_ valid UTF-8.

    * VMFile.list is called for this directory. The invalid UTF-8
      filename is passed to (*env)->NewStringUTF.

What happens with CACAO? - CACAO hangs in an internal function.

What happens with JamVM? - The string is converted to UTF-8 in a
brute-force way, meaning for example that an "Umlaut A" latin1 character
is combined with the following character (or terminating zero) into
some garbage UTF-8 character.
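
To illustrate the kind of conversion I mean, here is a minimal sketch of a
decoder that trusts the lead byte and never checks the following byte. It is
a hypothetical example written for this mail (the class and method names are
made up), not JamVM's actual code:

```java
public class NaiveUtf8Demo {
    // Hypothetical unvalidated decoder: it trusts the lead byte and never
    // checks that the following byte has the form 10xxxxxx.
    static String naiveDecode(byte[] in) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < in.length) {
            int b = in[i++] & 0xff;
            if (b < 0x80) {
                // plain ASCII byte
                sb.append((char) b);
            } else if ((b & 0xe0) == 0xc0) {
                // Looks like a two-byte lead: swallow the next byte, whatever
                // it is. A missing byte is treated like the terminating zero.
                int next = (i < in.length) ? (in[i++] & 0xff) : 0;
                sb.append((char) (((b & 0x1f) << 6) | (next & 0x3f)));
            } else {
                // other lead bytes are not handled in this sketch
                sb.append('?');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a", Umlaut A, "b" encoded as latin1
        byte[] latin1Name = { 'a', (byte) 0xc4, 'b' };
        String s = naiveDecode(latin1Name);
        System.out.println(s.length() + " chars, second is U+"
                + Integer.toHexString(s.charAt(1)));
    }
}
```

Here the 0xc4 byte swallows the following 'b', so three input bytes become
two characters, the second of which is unrelated to either input character.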

Both VMs clearly have a bug here, as NewStringUTF should fail gracefully
in these cases and return NULL (probably after throwing an exception).
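
As a rough sketch of what "fail gracefully" could look like, here is the
check expressed in Java rather than in the VMs' native code. It assumes
standard UTF-8 (JNI itself uses modified UTF-8, which differs in a few
cases), and decodeOrNull is a name invented for this example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Demo {
    // Returns the decoded string, or null if the bytes are not
    // well-formed UTF-8 (hypothetical helper, standard UTF-8 only).
    static String decodeOrNull(byte[] raw) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return dec.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // malformed input: report failure instead of guessing
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Name = { 'a', (byte) 0xc4, 'b' };             // latin1
        byte[] utf8Name = { 'a', (byte) 0xc3, (byte) 0x84, 'b' };  // UTF-8
        System.out.println("latin1 bytes: " + decodeOrNull(latin1Name));
        System.out.println("UTF-8 bytes decoded: "
                + (decodeOrNull(utf8Name) != null));
    }
}
```

The latin1 bytes are rejected because 0xc4 announces a two-byte sequence
and 'b' is not a valid continuation byte.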

This, however, does not solve the underlying problem. Should VMFile.list
really fail in a directory containing a latin1 filename? If we decide to
depend on the locale, can we really expect that all filenames match the
locale encoding?

I checked what the reference implementation does, using the attached
program: The RI always interprets the filenames it gets from the system
as latin1 (or similar), independent of the file.encoding property, it
seems. This has the following consequences:

    * latin1 filenames work as expected

    * UTF-8 filenames are read as latin1, so they end up with more
      characters than they should have. However, if the string is passed
      through unmodified, and converted back to latin1 on output, it is
      the same as what the system returned from readdir. (Java code
      working with the string will see messed up characters, but the
      whole mess is round-trip safe.)
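
The round-trip property can be checked with a small sketch like the
following. It assumes the charset in question behaves like Java's
ISO-8859-1, which maps every byte value to exactly one character; the
class and method names are invented for this example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTripDemo {
    // Interpret raw filename bytes as latin1, one char per byte.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Convert the string back to latin1 bytes on output.
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Umlaut A followed by "b", as UTF-8 bytes (what readdir might return)
        byte[] fromReaddir = { (byte) 0xc3, (byte) 0x84, 'b' };

        String s = decodeLatin1(fromReaddir);
        byte[] back = encodeLatin1(s);

        // Two logical characters show up as three Java chars...
        System.out.println("length seen by Java code: " + s.length());
        // ...but the bytes written back are identical to the input.
        System.out.println("round trip intact: "
                + Arrays.equals(fromReaddir, back));
    }
}
```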

I'm really at a loss as to what to do here. Any ideas?

-Edwin

import java.io.File;

public class test {
	public static void main(String[] args) {
		if (args.length < 1) {
			System.out.println("Please specify a directory on the command line.");
			return;
		}

		File file = new File(args[0]);
		if (!file.isDirectory()) {
			System.out.println("error: " + args[0] + " is not a directory");
			return;
		}

		try {
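			String[] ls = file.list();
			if (ls == null) {
				// File.list() returns null if an I/O error occurs
				System.out.println("error: could not list " + args[0]);
				return;
			}

			int i;
			for (i=0; i<ls.length; ++i) {
				System.out.println(i + ": (length " + ls[i].length() + ") " + ls[i]);
				// encode with the platform default charset and dump the bytes in hex
				byte[] bytes = ls[i].getBytes();
				for (int j=0; j<bytes.length; ++j) {
					// start a new output line after every 16 bytes
					if (j!=0 && (j%16)==0)
						System.out.println("");
					else
						System.out.print(" ");
					System.out.print(Integer.toHexString(bytes[j] & 0xff));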
				}
				System.out.println("");
			}
		}
		catch (Exception ex) {
			System.out.println("exception: " + ex);
			ex.printStackTrace(System.out);
		}
	}
}

