Re: [RFC/PATCH v2 0/4] A new library for plumbing output

On Thu, 15 April 2010, Jeff King wrote:
> On Wed, Apr 14, 2010 at 02:34:01PM -0700, Junio C Hamano wrote:
> > Jakub Narebski <jnareb@xxxxxxxxx> writes:
> > 
> > > Well, this whole idea started with the fact, that "git status --short"
> > > was hard (or impossible) to parse unambiguously by scripts[1], and even
> > > "git status --porcelain -z"[2] is not that easy to parse[3].
> > 
> > And you apparently seem to agree with that claim, but I don't.  I think
> > Jeff (who did the --porcelain stuff; by the way, why did we lose him from
> > Cc list?) has already said that he is open to an update.
> 
> I haven't seen any evidence that status --porcelain (or its -z form) is
> impossible to parse unambiguously. I don't even think it's that hard,
> but it certainly could be easier. But more importantly, from looking at
> the output it's not necessarily _obvious_ how to parse it correctly
> (e.g., whitespace as value and as field separator, syntax of "-z"
> depends on semantics of field contents).

Well, IMVHO the output of "git status --short" / "git status --porcelain"
(without '-z') is very hard to parse.  Even assuming that filenames are
quoted whenever there is ambiguity (which also means that a filename
must be quoted whenever it is ambiguous whether it is quoted), the
separator between source and destination filename in the case of rename
detection is " -> " (if I understand it correctly), and none of ' '
(SPC), '-' or '>' is replaced by an escape sequence.  This means that
one needs to detect where a quoted filename begins and where it ends:
either by parsing character by character, taking quoting and escaping
into account (e.g. '\\', '\"' etc.), or by using a 'balanced quote'
regexp like the one from Text::Balanced, e.g.:  (?:\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\")
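
To illustrate the 'balanced quote' approach, here is a minimal Python
sketch (not anything git ships).  It assumes the status columns have
already been stripped, and that any filename which could be confused
with the " -> " separator is C-style quoted.  The names QUOTED, RENAME
and split_rename are made up for illustration.

```python
import re

# A double-quoted C-style string: the 'balanced quote' pattern from above.
QUOTED = r'"(?:[^\\"]*(?:\\.[^\\"]*)*)"'

# Source and destination are each either a quoted string or a bare name.
# Assumption: a bare name never contains " -> " (it would be quoted).
RENAME = re.compile(
    r'(?P<src>' + QUOTED + r'|.+?) -> (?P<dst>' + QUOTED + r'|.+)\Z',
    re.S,
)

def split_rename(paths):
    """Split 'src -> dst' into (src, dst); return (paths,) if no rename."""
    m = RENAME.match(paths)
    if m is None:
        return (paths,)
    # Quoted names are returned still quoted; unquoting is a separate step.
    return (m.group('src'), m.group('dst'))
```

Even this small case needs a quote-aware pattern; a plain split on
" -> " would cut a quoted filename in the middle.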

What was the reason behind choosing " -> " as the separator between the
pair[1] of filenames in a rename, instead of using the default
"git diff --stat" format, i.e. 'arch/{i386 => x86}/Makefile', for
"git status --short", which is meant for the end user, and for
"git status --porcelain" the same format as the raw diff format, i.e.
with TAB as the separator between filenames and a filename quoted if it
contains TAB (TAB is then replaced by '\t' and does not appear in the
filename, therefore you can split on TAB)?
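
For comparison, a minimal Python sketch of splitting one line of raw
diff output on TAB.  It assumes the line follows the raw diff layout
(colon-prefixed metadata, then TAB-separated paths) and returns quoted
paths still quoted.  The name split_raw_line is made up for illustration.

```python
def split_raw_line(line):
    """Split a raw diff line into (metadata fields, list of paths)."""
    # A literal TAB never occurs inside a path (it would be written '\t'
    # inside a quoted name), so splitting on TAB is safe.
    meta, *paths = line.rstrip("\n").split("\t")
    # Metadata: modes, object names and status, separated by single spaces.
    return meta.split(" "), paths
```

The number of paths falls out of the split itself, with no quote-aware
scanning needed.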

IMVHO "git status --porcelain -z" format is not easy to parse either.
(The same can be said for "git diff --raw -z" output format.)  You
can't just split on record separator; you have to take into account
status to check if there are two filenames or one.
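
A minimal Python sketch of what that means in practice.  It assumes
each entry is "XY<SPC>path" terminated by NUL, and that an 'R' or 'C'
in either status column means one more NUL-terminated path follows.
It does not interpret which of the two paths is source and which is
destination.  The name parse_status_z is made up for illustration.

```python
def parse_status_z(data):
    """Parse NUL-separated status output into (status, path, extra) tuples."""
    fields = data.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()                      # drop the empty tail after last NUL
    entries = []
    i = 0
    while i < len(fields):
        record = fields[i]
        status, path = record[:2], record[3:]
        i += 1
        extra = None
        # The status decides whether the next field is a second filename
        # or the start of the next entry.
        if b"R" in status or b"C" in status:
            if i >= len(fields):
                raise ValueError("missing second path for %r" % record)
            extra = fields[i]
            i += 1
        entries.append((status, path, extra))
    return entries
```

The parser has to know the meaning of the status letters before it can
even find the record boundaries.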

[1] A question: we have working area version, index version, and HEAD
    version of file.  Isn't it possible for *each* of them to have 
    different filename?  What about the case of rename/rename merge
    conflict?
> 
> The approach I proposed was to leave it be and document it a bit better.
> Adding some format that is close but subtly different is just going to
> lead to more confusion.

Well, the proposed '-Z' output format, in the OFS="\0", ORS="\0\0"
variant, would be very easy to parse.  If I understand it correctly,
it is also one of the available formats in outputification^W in this
series.
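
A minimal Python sketch of such a parser, assuming fields are separated
by a single NUL, records end with a double NUL, and no field is empty
(an empty field would make the two separators indistinguishable).  The
field layout in the usage below is only an example, not the proposed
one.  The name parse_records is made up for illustration.

```python
def parse_records(data):
    """Split on the record separator first, then on the field separator."""
    records = data.split(b"\0\0")
    if records and records[-1] == b"":
        records.pop()                     # drop the empty tail after last ORS
    return [record.split(b"\0") for record in records]
```

For example, parse_records(b"R \0old\0new\0\0M \0a.txt\0\0") gives
[[b"R ", b"old", b"new"], [b"M ", b"a.txt"]], without looking at the
status at all.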

> 
> But since Julian was willing to do the JSON work, I think that is a much
> nicer approach. It's not subtly different; it's very different and way
> easier to read and parse. And I'm really happy with the way he has
> structured the code to handle multiple output formats. It keeps the code
> much cleaner, and it should silence any "but YAML is better than JSON is
> better than XML" debates.

I really like this outputification ;-) too.

Although, if possible, I'd like to have it wrapped in utility macros,
like parseopt, so that one does not need to write output_str / output_int
etc.... but currently it is a very, very vague sketch of an idea rather
than a realized concept.

> 
> Even with Julian's patches, we should still better document the regular
> and "-z" forms. Eric promised to send some patches this week; I'm hoping
> he is still interested in doing so after seeing a better solution arise.
> :)

-- 
Jakub Narebski
Poland