Re: [JGIT PATCH 4/6] Add QuotedString class to handle C-style quoting rules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Robin Rosenberg <robin.rosenberg@xxxxxxxxxx> wrote:
> onsdag 10 december 2008 23:05:49 skrev Shawn O. Pearce:
> > Git patch files can contain file names which are quoted using the
> > C language quoting rules.  In order to correctly create or parse
> 
> Should we maybe call this Git-style since we really do not care
> about C (which version btw?).

Yea, I think you are right.  I'll change the name.
 
> Making two interfaces is better. We may share the implementation initially,
> but parsing file names in Git patches and parsing C strings are different
> operations.

Well, if we ever supported other C-style string names we could just
swap the implementation reference.  I'll try to make it clearer
this format, although like a C-style string, is really meant for
Git path names within patches.
 
> > +	public void testQuote_OctalAll() {
> > +		assertQuote("\1", "\\001");
> > +		assertQuote("~", "\\176");
> > +		assertQuote("\u00ff", "\\303\\277"); // \u00ff in UTF-8
> > +	}
>
> What do we do with non-UTF8 names? I think we should
> follow the logic we use when parsing commits and paths
> in other places.

Then we're totally f'd.

Git has no specific encoding on file names.  If we get a standard
Java Unicode string and get asked to quote it characters with
code points above 127 need to be escaped as an octal escape code
according to the Git style.  Further the Git style only permits
octal escapes that result in a value <= 255, aka an unsigned char.

The name needs to be encoded into an 8-bit encoding, and UTF-8 is
the only encoding that will represent every valid Unicode character.
Elsewhere we sort of take the attitude that when writing data *out*
we produce UTF-8, even if we read in ISO-whatever.  Here I'm doing
the same thing.

> > +	public void testDequote_UnknownEscapeQ() {
> > +		assertDequote("\\q", "\\q");
> > +	}
>
> Would Git generate this style in a name? 

No, but a mangled patch might have it.  Rather than throw an
exception I try to parse the string as faithfully as I can, so we
can continue on and mine the stream for more information.  We may
be able to work around the breakage (there's a lot of redundant
data about path names in the git patch format).

As far as I can tell C Git handles this style of escape the same way:

  $ cat F
  diff --git "a/\q" "b/\q"
  --- "a/\q"
  +++ "b/\q"
  $ git apply --stat F
   "\\q\"" |    0
   1 files changed, 0 insertions(+), 0 deletions(-)

Actually, I think there's a bug, its reading the closing quote as
part of the file name.  But anyway, my point is that C git handles
an unknown escape sequence as though it were a literal.
 
> > +			quote['\b'] = 'b';
> \e = esc

Oddly enough, C Git doesn't output \e.  It uses \033.  It doesn't
recognize \e either, so we can't produce it even if we wanted to
honor it on input.  I can add it to our input table (but keep it
out of our output table), but then we'd treat \e as \033 while C
Git would treat \e as a literal.  Bad.

So no, I won't add \e here.

> > +			return decode(Constants.CHARSET, r, 0, rPtr);
> 
> Importing methods really obscures things. Please qualify with class name
> of RawparseUtils here instead. 

Hmmph.  I wonder how you'll feel then about the patch parser code.
I import 4 methods from RawParseUtils because they are so commonly
invoked that I got tired of reading RawParseUtils.foo.

There's only two calls here so I changed it to be qualified.

-- 
Shawn.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux