On 10/04/2016 10:25 AM, Junio C Hamano wrote:
> So I would say it is perfectly OK if your update works only for
> cases we can clearly define the semantics for.  For example, we can
> even start with something simple like:
>
>  * A RFC822-header like line, together with any number of whitespace
>    indented lines that immediately follow it, will be taken as a
>    single logical trailer element (with embedded LF in it if it uses
>    the "line folding").  For the purpose of "replace", the entire
>    single logical trailer element is replaced.
>
>  * A line that begins with "(cherry picked from" and "[" becomes a
>    single logical trailer element.  No continuation of anything
>    fancy.
>
>  * A line with any other shape is a garbage line in a trailer
>    block.  It is kept in its place, but because it does not even
>    have <token> part, it will not participate in locating with
>    "trailer.where", "trailer.ifexists", etc.
Sounds reasonable to me. Would the "[" be a bit of overspecification,
though, since Git doesn't produce it? Also, identifying it as a garbage
line probably wouldn't change any behavior - in the Linux kernel
examples, it is used to show what happened in between sign-offs, so
there will always be one "Signed-off-by:" at the top. (But I do not
feel strongly about this.)
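To make the three cases concrete, here is a minimal sketch of the per-line classification. It is not the trailer.c code: classify_line() and the line_kind enum are names made up for illustration, and it assumes ":" is the only separator and that a token consists of alphanumerics and "-".

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
enum line_kind {
	LINE_TRAILER,      /* "<token>: <value>" */
	LINE_CONTINUATION, /* whitespace-indented; folds into the previous trailer */
	LINE_SPECIAL,      /* "(cherry picked from ..." or "[...]" */
	LINE_GARBAGE       /* anything else; kept in place, has no <token> */
};

/*
 * Classify one line of a candidate trailer block. 'len' excludes the
 * trailing LF. A LINE_CONTINUATION result is only meaningful if the
 * caller saw a trailer (or another continuation) on the previous line;
 * otherwise the caller should treat it as garbage.
 */
static enum line_kind classify_line(const char *line, size_t len)
{
	static const char cherry[] = "(cherry picked from";
	size_t i;

	if (len && (line[0] == ' ' || line[0] == '\t'))
		return LINE_CONTINUATION;
	if (len && line[0] == '[')
		return LINE_SPECIAL;
	if (len >= strlen(cherry) && !strncmp(line, cherry, strlen(cherry)))
		return LINE_SPECIAL;

	/* Assumed token shape: one or more alnum/'-' bytes, then ':' */
	for (i = 0; i < len; i++) {
		if (line[i] == ':')
			return i > 0 ? LINE_TRAILER : LINE_GARBAGE;
		if (!isalnum((unsigned char)line[i]) && line[i] != '-')
			break;
	}
	return LINE_GARBAGE;
}
```

Dropping the "[" case would only mean deleting one test here, so the choice can be revisited cheaply.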
> A block of lines that appear as the last paragraph in a commit
> message is a trailer block if and only if certain number or
> percentage of lines are non-garbage lines according to the above
> definition.
I think the number should be 1 - that seems like the easiest to explain.
But I'm OK with other suggestions.
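A rough sketch of the detection rule with the threshold as a parameter, so that "1" is just one possible argument. The helper names (has_token, last_paragraph, is_trailer_block) are invented here, ":" is assumed to be the only separator, and a "blank line" is assumed to be exactly two consecutive LFs (lines containing only whitespace are not treated as paragraph breaks).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: does the line start with "<token>:" (alnum/'-' token)? */
static int has_token(const char *line, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (line[i] == ':')
			return i > 0;
		if (!isalnum((unsigned char)line[i]) && line[i] != '-')
			return 0;
	}
	return 0;
}

/* Return a pointer to the first byte after the last blank line in msg. */
static const char *last_paragraph(const char *msg)
{
	size_t len = strlen(msg);
	const char *p, *start = msg;

	/* Ignore trailing newlines so they do not start a new paragraph. */
	while (len && msg[len - 1] == '\n')
		len--;
	for (p = msg; p + 1 < msg + len; p++)
		if (p[0] == '\n' && p[1] == '\n')
			start = p + 2;
	return start;
}

/* True if the last paragraph has at least min_trailers "<token>:" lines. */
static int is_trailer_block(const char *msg, int min_trailers)
{
	const char *line = last_paragraph(msg);
	int found = 0;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (has_token(line, len))
			found++;
		line += len;
		if (*line == '\n')
			line++;
	}
	return found >= min_trailers;
}
```

Note that this sketch does not special-case a message consisting of a single paragraph, so a subject line such as "fix: typo" would count; real code would need to decide what to do there.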
> I wonder if we can share a new helper function to do the detection
> (and classification) of a trailer block and parsing the logical
> lines out of a commit log message.  The function signature could be
> as simple as taking a single <const char *> (or a strbuf) that holds
> a commit log message, and splitting it out into something like:
>
>     struct {
>         const char *whole;
>         const char *end_of_message_proper;
>         struct {
>             const char *token;
>             const char *contents;
>         } *trailer;
>         int alloc_trailers, nr_trailers;
>     };
>
> where
>
>  - whole points at the first byte of the input, i.e. the beginning
>    of the commit message buffer.
>
>  - end-of-message-proper points at the first byte of the trailer
>    block into the buffer at "whole".
>
>  - token is a canonical header name for easy comparison for
>    interpret-trailers (you can use NULL for garbage lines, and made
>    up token like "[bracket]" and "(cherrypick)" that would not clash
>    with real tokens like "Signed-off-by").
>
>  - contents is the bytes on the logical line, including the header
>    part
>
> E.g. an element in trailer[] array may say
>
>     {
>         .token = "Signed-off-by",
>         .contents = "Signed-Off-By: Some Body <some@xxxxxxx>\n",
>     }
I get the impression from the rest of your e-mail that no strings are
meant to be copied - is that true? (That sounds like a good idea to me.)
In which case this might be better:
    struct {
        const char *first_trailer; /* = end_of_message_proper */
        struct {
            const char *start;
            const char *value;
            const char *end;
        } *trailers;
        int trailers_nr, trailers_alloc;
    };
start = value for "[", "(cherry picked from" and garbage lines. We also
need end because there is no \0 there (we didn't copy any strings).
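As a sketch of how the no-copy variant could be filled in: every pointer below points into the caller's buffer and nothing is duplicated. The struct tags and the functions find_value(), add_item() and parse_block() are hypothetical, line folding is not handled, and ":" is assumed to be the separator.

```c
#include <stdlib.h>
#include <string.h>

struct trailer_item {
	const char *start; /* first byte of the line */
	const char *value; /* first byte of the value; == start if no token */
	const char *end;   /* one past the last byte, i.e. the LF or NUL */
};

struct trailer_block {
	const char *first_trailer;
	struct trailer_item *trailers;
	int trailers_nr, trailers_alloc;
};

/* Return where the value starts, or 'line' if there is no "<token>:". */
static const char *find_value(const char *line, const char *end)
{
	const char *p;

	for (p = line; p < end; p++) {
		if (*p == ':' && p > line) {
			p++;
			while (p < end && *p == ' ')
				p++;
			return p;
		}
		/* "[", "(cherry picked from" and garbage lines have no token */
		if (*p == ' ' || *p == '\t' || *p == '[' || *p == '(' || *p == ':')
			break;
	}
	return line;
}

static void add_item(struct trailer_block *b, const char *start,
		     const char *value, const char *end)
{
	if (b->trailers_nr == b->trailers_alloc) {
		b->trailers_alloc = b->trailers_alloc ? 2 * b->trailers_alloc : 4;
		b->trailers = realloc(b->trailers,
				      b->trailers_alloc * sizeof(*b->trailers));
		if (!b->trailers)
			abort();
	}
	b->trailers[b->trailers_nr].start = start;
	b->trailers[b->trailers_nr].value = value;
	b->trailers[b->trailers_nr].end = end;
	b->trailers_nr++;
}

/* Split the block starting at first_trailer into one item per line. */
static void parse_block(struct trailer_block *b, const char *first_trailer)
{
	const char *line = first_trailer;

	memset(b, 0, sizeof(*b));
	b->first_trailer = first_trailer;
	while (*line) {
		const char *eol = strchr(line, '\n');
		const char *end = eol ? eol : line + strlen(line);

		add_item(b, line, find_value(line, end), end);
		line = eol ? eol + 1 : end;
	}
}
```

The caller owns b->trailers and frees it; the items stay valid only as long as the original message buffer does.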
The existing code (in trailer.c) uses a linked list to store trailers,
but an array (as written in your e-mail) is probably better for us since
clients would want to access the last element (as also written in your
e-mail).
> With something like that, you can manipulate the "insert at ...",
> "replace", etc. in the trailer[] array and then produce an updated
> commit message fairly easily (i.e. copy out the bytes beginning at
> "whole" up to "end_of_message_proper", then iterate over trailer[]
> array and show their contents field).  The codepaths in the core
> part only need to know
>
>  - how to check the last item in trailer[] array to see if it ends
>    with the same sign-off as they are trying to add.
>
>  - how to append one new element to the trailer[] array.
>
>  - reproduce an updated commit log message after the above.
I don't think we need trailer block struct -> commit message conversion
- when adding a new trailer or replacing an existing trailer, the client
code can just remember the index and then modify its behavior
accordingly when iterating through all trailers. But this conversion can
be easily added if/when we need it.
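For the sign-off case, what a core codepath needs could look roughly like the following. This is only a sketch: struct item, last_item_is() and append_signoff() are invented names, the comparison is an exact byte comparison (real code may want to be more lenient), and the message is assumed to end with a LF.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one logical trailer line, pointing into the message. */
struct item {
	const char *start;
	const char *end; /* one past the last byte, excluding the LF */
};

/* Is the last item byte-for-byte equal to 'signoff' (no trailing LF)? */
static int last_item_is(const struct item *items, int nr, const char *signoff)
{
	size_t len = strlen(signoff);

	if (!nr)
		return 0;
	return (size_t)(items[nr - 1].end - items[nr - 1].start) == len &&
	       !memcmp(items[nr - 1].start, signoff, len);
}

/*
 * Return a newly allocated copy of msg, with 'signoff' and a LF
 * appended unless the last trailer already is that sign-off.
 * The caller frees the result.
 */
static char *append_signoff(const char *msg, const struct item *items,
			    int nr, const char *signoff)
{
	size_t mlen = strlen(msg), slen = strlen(signoff);
	char *out = malloc(mlen + slen + 2);

	if (!out)
		abort();
	memcpy(out, msg, mlen);
	if (!last_item_is(items, nr, signoff)) {
		memcpy(out + mlen, signoff, slen);
		mlen += slen;
		out[mlen++] = '\n';
	}
	out[mlen] = '\0';
	return out;
}
```

Here the client writes the result directly instead of first updating the array and then converting it back, which is the approach argued for above.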
> Hmm?
Overall, this seems like a good idea - I'll go ahead and do this if
there are no other objections.
It just occurred to me that there could be some corner cases when the
trailer separator is configured to not include ":" - I'll make sure to
include tests that check those corner cases.
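To illustrate the kind of corner case I mean, here is a small sketch of token detection with a configurable separator set. token_length() is a made-up helper, not an existing function, and the separator string is simply passed in by the caller rather than read from configuration.

```c
#include <string.h>

/*
 * Hypothetical helper: return the length of the token before the first
 * separator on this line, or 0 if the line has no separator (or starts
 * with one). 'separators' is the set of accepted separator bytes.
 */
static size_t token_length(const char *line, const char *separators)
{
	size_t i;

	for (i = 0; line[i] && line[i] != '\n'; i++)
		if (strchr(separators, line[i]))
			return i;
	return 0;
}
```

With separators set to "#", a line like "Bug #12" has a token while "Signed-off-by: A" does not, so the non-garbage count for a block (and hence whether it is a trailer block at all) depends on the configured set.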