Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 09 Nov 2014 10:24:20 -0800

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Christian Couder <chriscool@xxxxxxxxxxxxx> writes:
>
>> Yeah, it won't be as efficient as using only one strbuf and only byte
>> oriented functions, and it looks much less manly too :-) But over time in
>> Git we have developed a number of less efficient but quite clean
>> abstractions like strbuf, argv_array, sha1_array and so on, that we
>> are quite happy with.
>
> Actually, all these examples you gave are fairly efficient and clean
> abstractions.  I find it insulting to pretend that the "one line per
> strbuf" is in the same league.  It isn't.
>
> And it is not about manliness.

By the way, I do not mean to say that all of strbuf (which is a
rather full API) is uniformly efficient and cleanly abstracted.

For example, I see strbuf_split() is a handy but a lousy invitation
to stupid programmers to do stupid things and needs to be used
carefully (rather, you need to carefully consider if a possible
solution that uses it is really a good solution).  If you start with
a single strbuf (e.g. perhaps read from a file or a blob in bulk in
an I/O efficient way), pass it around and then have to manipulate
each and every line in it to massage the information on each line
into other useful form (e.g. perhaps each line is a [+]src:dst
refspec element and you are parsing it into an array of "struct
refspec"), it may be tempting to write a parser from a single strbuf
into a single refspec, use strbuf_split() to break the single strbuf
input into multiple and feed each of them to your parser.

But it does not have to be the only way to create such a caller.
Unless the final "other useful form" is an array of strbuf, and you
have to remember that the whole point of using strbuf is because it
is easy to edit its contents efficiently in place, that needs to
further be modified in various ways, the splitting of N lines into N
strbuf, only to parse each line one by one, is a huge waste.  A
right approach may involve introducing a foreach_strbuf_line
iterator function that takes a callback, or a macro of the same name
that puts a control structure.

Going back to the trailer processing, if you are likely to be
editing each and every line in the input before producing the final
result, an array of strbuf would be a perfectly sensible data
structure to use.  I just didn't get the impression that that is
what is being done.  You would be presented a single text with
multiple lines (whose natural representation would be a <ptr,len>
that fits in a single strbuf), you would be asked to inspect and
manipulate one single section in the middle (namely, the runs of
lines that begin with some "keyword: "), and return the
concatenation of the part before that one section (intact), the
section you have manipulated (i.e. the trailer) and possibly the
remainder (intact).  Outside that single section you will be
touching, I do not see a good reason for each of their lines to be
first stored in a strbuf (while moving bunch of pointers in the
array using ARRAY_GROW() and friends), only to be later concatenated
back into a single string.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html