On Thu, 15 April 2010, Jeff King wrote:
> On Wed, Apr 14, 2010 at 02:34:01PM -0700, Junio C Hamano wrote:
> > Jakub Narebski <jnareb@xxxxxxxxx> writes:
> >
> > > Well, this whole idea started with the fact, that "git status --short"
> > > was hard (or impossible) to parse unambigously by scripts[1], and even
> > > "git status --porcelain -z"[2] is not that easy to parse[3].
> >
> > And you apparently seem to agree with that claim, but I don't.  I think
> > Jeff (who did the --porcelain stuff; by the way, why did we lose him from
> > Cc list?) has already said that he is open to an update.
>
> I haven't seen any evidence that status --porcelain (or its -z form) is
> impossible to parse unambiguously. I don't even think it's that hard,
> but it certainly could be easier. But more importantly, from looking at
> the output it's not necessarily _obvious_ how to parse it correctly
> (e.g., whitespace as value and as field separator, syntax of "-z"
> depends on semantics of field contents).

Well, IMVHO the output of "git status --short" / "git status --porcelain"
(without '-z') is very hard to parse.  Even assuming that filenames are
quoted in the case of ambiguity (which also means that a filename must be
quoted whenever it would otherwise be ambiguous whether it is quoted),
the separator between source and destination filename in the case of
rename detection is " -> " (if I understand it correctly), and none of
' ' (SPC), '-' or '>' is replaced by an escape sequence.  This means that
one needs to detect where a quoted filename begins and where it ends.
That requires either parsing character by character, taking into account
quoting and escaping (e.g. '\\', '\"' etc.), or using a 'balanced quote'
regexp like the one from Text::Balanced, e.g.:

  (?:\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\")

What was the reason behind choosing " -> " as the separator between the
pair[1] of filenames in a rename, instead of using the default
"git diff --stat" format, i.e.
'arch/{i386 => x86}/Makefile' for "git status --short", which is meant
for the end user, and for "git status --porcelain" the same format as the
raw diff format, i.e. with TAB as separator between filenames, and the
filename quoted if it contains TAB (then TAB is replaced by '\t' and does
not appear in the filename, therefore you can split on TAB)?

IMVHO the "git status --porcelain -z" format is not easy to parse either.
(The same can be said for the "git diff --raw -z" output format.)  You
can't just split on the record separator; you have to take the status
into account to check whether there are two filenames or one.

[1] A question: we have the working area version, the index version, and
    the HEAD version of a file.  Isn't it possible for *each* of them to
    have a different filename?  What about the case of a rename/rename
    merge conflict?

>
> The approach I proposed was to leave it be and document it a bit better.
> Adding some format that is close but subtly different is just going to
> lead to more confusion.

Well, the proposed '-Z' output format, in the OFS="\0", ORS="\0\0"
variant, would be very easy to parse.  If I understand it correctly, it
is also one of the available formats in outputification^W in this series.

>
> But since Julian was willing to do the JSON work, I think that is a much
> nicer approach. It's not subtly different; it's very different and way
> easier to read and parse. And I'm really happy with the way he has
> structured the code to handle multiple output formats. It keeps the code
> much cleaner, and it should silence any "but YAML is better than JSON is
> better than XML" debates.

I really like this outputification ;-) too.  Although if possible I'd
like to have it wrapped in utility macros, like parseopt, so one does not
need to write output_str / output_int etc.... but currently it is a very,
very vague sketch of an idea, rather than a realized concept.

>
> Even with Julian's patches, we should still better document the regular
> and "-z" forms.
> Eric promised to send some patches this week; I'm hoping
> he is still interested in doing so after seeing a better solution arise.
> :)

-- 
Jakub Narebski
Poland
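
P.S. To make the point about "-z" concrete, here is a rough Python sketch
of a parser.  It assumes the layout is "XY", a space, the path and a NUL,
and that an entry whose status contains 'R' or 'C' is followed by one more
NUL-terminated field holding the original path; that is my reading of the
format, not something the output itself tells you.  The name
parse_status_z is made up for this illustration.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, bytes, Optional[bytes]]


def parse_status_z(data: bytes) -> List[Entry]:
    """Parse `git status --porcelain -z` output into (XY, path, orig_path).

    Assumed layout (hypothetical helper, not part of git):
      "XY" SP path NUL            for ordinary entries
      "XY" SP path NUL orig NUL   when X or Y is 'R' (rename) or 'C' (copy)
    """
    fields = data.split(b"\0")
    # The output is NUL-terminated, so the last split element is empty.
    if fields and fields[-1] == b"":
        fields.pop()

    entries: List[Entry] = []
    i = 0
    while i < len(fields):
        record = fields[i]
        xy = record[:2].decode("ascii")  # two status characters
        path = record[3:]                # skip the single separating space
        orig: Optional[bytes] = None
        # Splitting on NUL is not enough: only the status tells us
        # whether the next field is a second filename or a new record.
        if "R" in xy or "C" in xy:
            i += 1
            if i >= len(fields):
                raise ValueError("rename/copy entry without original path")
            orig = fields[i]
        entries.append((xy, path, orig))
        i += 1
    return entries
```

Note that the loop has to peek at the status before it knows how many
fields belong to a record, which is exactly the "syntax depends on
semantics" problem; with distinct field and record separators a plain
two-level split would do.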