Re: [JGIT PATCH 4/6] Add QuotedString class to handle C-style quoting rules

Robin Rosenberg <robin.rosenberg@xxxxxxxxxx> · Thu, 11 Dec 2008 01:33:51 +0100

On Thursday 11 December 2008 00:41:30, Shawn O. Pearce wrote:
> > > +	public void testQuote_OctalAll() {
> > > +		assertQuote("\1", "\\001");
> > > +		assertQuote("~", "\\176");
> > > +		assertQuote("\u00ff", "\\303\\277"); // \u00ff in UTF-8
> > > +	}
> >
> > What do we do with non-UTF8 names? I think we should
> > follow the logic we use when parsing commits and paths
> > in other places.
> 
> Then we're totally f'd.
> 
> Git has no specific encoding on file names.  If we get a standard
> Java Unicode string and get asked to quote it, characters with
> code points above 127 need to be escaped as octal escape codes
> according to the Git style.  Further, the Git style only permits
> octal escapes that result in a value <= 255, aka an unsigned char.
> 
> The name needs to be encoded into an 8-bit encoding, and UTF-8 is
> the only encoding that will represent every valid Unicode character.
> Elsewhere we sort of take the attitude that when writing data *out*
> we produce UTF-8, even if we read in ISO-whatever.  Here I'm doing
> the same thing.
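
To make the quoting rule above concrete, here is a minimal Java sketch under stated assumptions: the name is encoded as UTF-8, `"` and `\` are backslash-escaped, and every byte outside printable ASCII (0x20 to 0x7E) becomes a three-digit octal escape, which by construction never exceeds \377 (255). `QuoteSketch` and `quote` are hypothetical names, not the patch's QuotedString API. The sketch emits octal even where Git would use a named escape such as \n, and it leaves "~" alone, whereas the quoted test expects \176.

```java
import java.nio.charset.StandardCharsets;

public class QuoteSketch {
    /**
     * Hypothetical helper, not the patch's QuotedString API: encodes the
     * name as UTF-8 and writes every byte outside printable ASCII as a
     * three-digit octal escape, so each escape stays within 0..255.
     */
    static String quote(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder("\"");
        for (byte b : bytes) {
            int c = b & 0xff; // treat the byte as an unsigned char
            if (c == '"' || c == '\\') {
                out.append('\\').append((char) c);
            } else if (c < 0x20 || c >= 0x7f) {
                // \ooo: three octal digits, most significant first
                out.append('\\')
                   .append((char) ('0' + ((c >> 6) & 7)))
                   .append((char) ('0' + ((c >> 3) & 7)))
                   .append((char) ('0' + (c & 7)));
            } else {
                out.append((char) c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("\u00ff")); // prints "\303\277"
        System.out.println(quote("\u0001")); // prints "\001"
    }
}
```

Because the octal escapes are produced byte by byte from the UTF-8 encoding, a code point above 127 always turns into two to four escapes, each in the unsigned-char range.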

So this should pass, right?

	public void testDeQuote_Latin1() {
		assertDequote("\u00c5ngstr\u00f6m", "\\305ngstr\\366m"); // Latin1
	}

	public void testDeQuote_UTF8() {
		assertDequote("\u00c5ngstr\u00f6m", "\\303\\205ngstr\\303\\266m");
	}

And possibly these actually unquoted names, which can be produced when
core.quotepath is false:

	public void testDeQuote_Rawlatin() {
		assertDequote("\u00c5ngstr\u00f6m", "\305ngstr\366m");
	}

	public void testDeQuote_RawUTF8() {
		assertDequote("\u00c5ngstr\u00f6m", "\303\205ngstr\303\266m");
	}
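
One way all four dequote tests above could pass, assuming the fallback we use when parsing commits (try strict UTF-8, otherwise treat the bytes as ISO-8859-1), is sketched below. `DequoteSketch`, `dequote` and `decode` are hypothetical names, not the patch's `C.dequote`, and only \n and \t are handled among the named escapes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DequoteSketch {
    private static boolean isOctal(byte c) {
        return c >= '0' && c <= '7';
    }

    /**
     * Hypothetical dequoter, not the patch's C.dequote: strips the
     * surrounding quotes, turns escapes into raw bytes, then decodes.
     */
    static String dequote(byte[] in, int start, int end) {
        if (end - start >= 2 && in[start] == '"' && in[end - 1] == '"') {
            start++;
            end--;
        }
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (int i = start; i < end; i++) {
            byte b = in[i];
            if (b == '\\' && i + 3 < end && in[i + 1] <= '3'
                    && isOctal(in[i + 1]) && isOctal(in[i + 2]) && isOctal(in[i + 3])) {
                // \ooo with a first digit of 0-3 always fits in one byte.
                raw.write(((in[i + 1] - '0') << 6) | ((in[i + 2] - '0') << 3) | (in[i + 3] - '0'));
                i += 3;
            } else if (b == '\\' && i + 1 < end) {
                byte e = in[++i];
                switch (e) {
                case 'n': raw.write('\n'); break;
                case 't': raw.write('\t'); break;
                default: raw.write(e); // covers \\ and \" (other named escapes omitted)
                }
            } else {
                // Unescaped bytes, including raw 8-bit bytes seen when
                // core.quotepath is false, are copied through unchanged.
                raw.write(b);
            }
        }
        return decode(raw.toByteArray());
    }

    /** Try strict UTF-8 first; if the bytes are not valid UTF-8, fall back to ISO-8859-1. */
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "\"\\305ngstr\\366m\"".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "\"\\303\\205ngstr\\303\\266m\"".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(dequote(latin1, 0, latin1.length)
                .equals(dequote(utf8, 0, utf8.length))); // prints true
    }
}
```

The fallback is a heuristic: a Latin-1 name whose bytes happen to form valid UTF-8 will be decoded as UTF-8, so this cannot recover the intended characters for every possible byte sequence.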

You also reversed the arguments to assertQuote. I think we should follow the
"expected"-first convention here too. The case above works neither way.
Using Constant.encode in the test is somewhat dangerous, as it does too
many conversions, so you no longer know what you're testing. Changing
assertDequote like this lets us feed arbitrary byte sequences to the test
method as strings (which we cannot do if we assume UTF-8 encoding).
ISO-8859-1 (Latin-1) maps every char from U+0000 to U+00FF to exactly one
byte, so any byte sequence can be entered conveniently.

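	// Needs: import java.io.UnsupportedEncodingException;
	// C: presumably the C-style QuotedString under test, defined
	// elsewhere in the test class.
	private static void assertDequote(final String exp, final String in) {
		final byte[] b;
		try {
			// ISO-8859-1 turns each char U+0000..U+00FF into exactly one
			// byte, so "in" can spell out any raw byte sequence.
			b = ('"' + in + '"').getBytes("ISO-8859-1");
		} catch (UnsupportedEncodingException e) {
			// Every JVM must support ISO-8859-1, so this should not happen.
			throw new RuntimeException(e);
		}
		final String r = C.dequote(b, 0, b.length);
		assertEquals(exp, r);
	}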
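
Why ISO-8859-1 is the right choice for feeding raw bytes into the test: it maps every byte value 0..255 to the char with the same code point and back, while UTF-8 rejects many byte sequences. A small check, with a hypothetical class name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    /** Decode all 256 byte values with cs, re-encode, and compare. */
    static boolean roundTripsAllBytes(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String s = new String(all, cs);
        return Arrays.equals(all, s.getBytes(cs));
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps byte n to char U+00nn and back, losslessly.
        System.out.println(roundTripsAllBytes(StandardCharsets.ISO_8859_1)); // prints true
        // UTF-8 replaces invalid input such as a lone 0x80 with U+FFFD.
        System.out.println(roundTripsAllBytes(StandardCharsets.UTF_8)); // prints false
    }
}
```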

-- robin