Re: [RFC/PATCH 0/2] place cherry pick line below commit title

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Wed, 5 Oct 2016 12:38:46 -0700

On 10/04/2016 10:25 AM, Junio C Hamano wrote:
> So I would say it is perfectly OK if your update works only for
> cases we can clearly define the semantics for.  For example, we can
> even start with something simple like:

>  * An RFC822-header-like line, together with any number of whitespace
>    indented lines that immediately follow it, will be taken as a
>    single logical trailer element (with embedded LF in it if it uses
>    the "line folding").  For the purpose of "replace", the entire
>    single logical trailer element is replaced.

>  * A line that begins with "(cherry picked from" or "[" becomes a
>    single logical trailer element.  No continuation of anything
>    fancy.

>  * A line with any other shape is a garbage line in a trailer
>    block.  It is kept in its place, but because it does not even
>    have a <token> part, it will not participate in locating with
>    "trailer.where", "trailer.ifexists", etc.

Sounds reasonable to me. Would the "[" be a bit of overspecification, 
though, since Git doesn't produce it? Also, identifying it as a garbage 
line probably wouldn't change any behavior - in the Linux kernel 
examples, it is used to show what happened in between sign-offs, so 
there will always be one "Signed-off-by:" at the top.  (But I do not 
feel strongly about this.)
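
To check that I am reading the three rules the same way, here is a rough 
sketch of a per-line classifier. Everything in it is hypothetical: the 
names (line_kind, classify_line) are made up for this mail, the token 
syntax (alphanumerics and "-") is my assumption, and ":" is hard-coded 
as the separator. Whether an indented line really continues a trailer 
depends on the line before it, which is left to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of one physical line in a trailer block. */
enum line_kind {
	LINE_GARBAGE,      /* no <token>; kept in place, never matched */
	LINE_TRAILER,      /* RFC822-header-like "Token: value" */
	LINE_CONTINUATION, /* starts with whitespace; may fold into a trailer */
	LINE_CHERRY_PICK,  /* begins with "(cherry picked from" */
	LINE_BRACKET       /* begins with "[" */
};

/* "line" need not be NUL-terminated; "len" excludes the trailing LF. */
static enum line_kind classify_line(const char *line, size_t len)
{
	static const char cherry[] = "(cherry picked from";
	size_t i = 0;

	assert(line != NULL);
	if (!len)
		return LINE_GARBAGE;
	if (line[0] == ' ' || line[0] == '\t')
		return LINE_CONTINUATION;
	if (len >= sizeof(cherry) - 1 &&
	    !strncmp(line, cherry, sizeof(cherry) - 1))
		return LINE_CHERRY_PICK;
	if (line[0] == '[')
		return LINE_BRACKET;

	/* Assumed token syntax: one or more of [A-Za-z0-9-], then ':'. */
	while (i < len && (isalnum((unsigned char)line[i]) || line[i] == '-'))
		i++;
	if (i > 0 && i < len && line[i] == ':')
		return LINE_TRAILER;
	return LINE_GARBAGE;
}
```

Dropping the "[" case would only mean deleting LINE_BRACKET and its 
check; such lines would then fall through to LINE_GARBAGE.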

> A block of lines that appear as the last paragraph in a commit
> message is a trailer block if and only if a certain number or
> percentage of lines are non-garbage lines according to the above
> definition.

I think the number should be 1 - that seems like the easiest to explain. 
But I'm OK with other suggestions.
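
With a threshold of 1, detection could look roughly like the sketch 
below. The names (is_recognized, find_trailer_block) are hypothetical, 
is_recognized is a deliberately tiny stand-in for the real non-garbage 
test, only a completely empty line counts as a paragraph break, and I am 
assuming that a message consisting of nothing but a title paragraph 
never has a trailer block.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "this line is not garbage". */
static int is_recognized(const char *line, size_t len)
{
	static const char cherry[] = "(cherry picked from";
	size_t i = 0;

	if (len >= sizeof(cherry) - 1 &&
	    !strncmp(line, cherry, sizeof(cherry) - 1))
		return 1;
	while (i < len && (isalnum((unsigned char)line[i]) || line[i] == '-'))
		i++;
	return i > 0 && i < len && line[i] == ':';
}

/*
 * Return a pointer to the first byte of the trailer block inside "msg",
 * or NULL if the last paragraph has no recognized line at all.
 */
static const char *find_trailer_block(const char *msg)
{
	size_t end, start;
	const char *p, *limit;

	assert(msg != NULL);
	end = strlen(msg);
	while (end > 0 && msg[end - 1] == '\n')
		end--;                  /* ignore trailing newlines */

	/* Walk back to the byte just after the last empty line. */
	start = end;
	while (start >= 2 &&
	       !(msg[start - 1] == '\n' && msg[start - 2] == '\n'))
		start--;
	if (start < 2)
		return NULL;            /* no empty line: title paragraph only */

	limit = msg + end;
	for (p = msg + start; p < limit; ) {
		const char *eol = memchr(p, '\n', (size_t)(limit - p));
		size_t len = eol ? (size_t)(eol - p) : (size_t)(limit - p);

		if (is_recognized(p, len))
			return msg + start; /* threshold of 1: one hit suffices */
		p = eol ? eol + 1 : limit;
	}
	return NULL;
}
```

A higher threshold or a percentage would only change the loop from 
"return on first hit" to counting hits and lines.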

> I wonder if we can share a new helper function to do the detection
> (and classification) of a trailer block and parsing the logical
> lines out of a commit log message.  The function signature could be
> as simple as taking a single <const char *> (or a strbuf) that holds
> a commit log message, and splitting it out into something like:

>     struct {
>         const char *whole;
>         const char *end_of_message_proper;
>         struct {
>             const char *token;
>             const char *contents;
>         } *trailer;
>         int alloc_trailers, nr_trailers;
>     };

> where

>  - whole points at the first byte of the input, i.e. the beginning
>    of the commit message buffer.

>  - end_of_message_proper points at the first byte of the trailer
>    block in the buffer at "whole".

>  - token is a canonical header name for easy comparison for
>    interpret-trailers (you can use NULL for garbage lines, and
>    made-up tokens like "[bracket]" and "(cherrypick)" that would not
>    clash with real tokens like "Signed-off-by").

>  - contents is the bytes on the logical line, including the header
>    part.

> E.g. an element in trailer[] array may say

>     {
>         .token = "Signed-off-by",
>         .contents = "Signed-Off-By: Some Body <some@xxxxxxx>\n",
>     }

I get the impression from the rest of your e-mail that no strings are 
meant to be copied - is that true? (That sounds like a good idea to me.) 
In which case this might be better:

  struct {
    const char *first_trailer; /* = end_of_message_proper */
    struct {
      const char *start;
      const char *value;
      const char *end;
    } *trailers;
    int trailers_nr, trailers_alloc;
  };

For "[", "(cherry picked from" and garbage lines, start == value. We also 
need end because no strings are copied, so there is no \0 terminating 
each element.
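
As a rough illustration of the no-copy idea, a parser filling in such an 
array might look like the sketch below. The names trailer_item, 
trailer_block, find_value and parse_trailer_block are all hypothetical, 
the token syntax and the ":" separator are assumptions, and the input is 
taken to be just the trailer block. Indented lines are folded only into 
header-like elements; end points one past the final LF of the logical 
line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct trailer_item {
	const char *start; /* first byte of the logical line */
	const char *value; /* after "<token>:" and blanks; == start if no token */
	const char *end;   /* one past the last byte; no NUL is stored there */
};

struct trailer_block {
	const char *first_trailer;
	struct trailer_item *trailers;
	int trailers_nr, trailers_alloc;
};

/* Return where the value begins, or "start" if the line has no token. */
static const char *find_value(const char *start, const char *eol)
{
	const char *p = start;

	while (p < eol && (isalnum((unsigned char)*p) || *p == '-'))
		p++;
	if (p == start || p == eol || *p != ':')
		return start;
	p++;
	while (p < eol && (*p == ' ' || *p == '\t'))
		p++;
	return p;
}

/* All pointers stored in "out" point into "block"; nothing is copied. */
static void parse_trailer_block(const char *block, struct trailer_block *out)
{
	const char *p, *limit;

	assert(block && out);
	p = block;
	limit = block + strlen(block);
	out->first_trailer = block;
	out->trailers = NULL;
	out->trailers_nr = out->trailers_alloc = 0;

	while (p < limit) {
		const char *eol = memchr(p, '\n', (size_t)(limit - p));
		const char *end = eol ? eol + 1 : limit;
		const char *value = find_value(p, eol ? eol : limit);
		struct trailer_item *item;

		/* Fold indented continuation lines into header-like items. */
		while (value != p && end < limit &&
		       (*end == ' ' || *end == '\t')) {
			eol = memchr(end, '\n', (size_t)(limit - end));
			end = eol ? eol + 1 : limit;
		}

		if (out->trailers_nr == out->trailers_alloc) {
			int n = out->trailers_alloc ? 2 * out->trailers_alloc : 4;
			struct trailer_item *t =
				realloc(out->trailers, (size_t)n * sizeof(*t));
			if (!t)
				abort();
			out->trailers = t;
			out->trailers_alloc = n;
		}
		item = &out->trailers[out->trailers_nr++];
		item->start = p;
		item->value = value;
		item->end = end;
		p = end;
	}
}
```

The caller owns out->trailers and releases it with free(); the message 
buffer itself must outlive the struct.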

The existing code (in trailer.c) uses a linked list to store trailers, 
but an array (as written in your e-mail) is probably better for us since 
clients would want to access the last element (as also written in your 
e-mail).

> With something like that, you can manipulate the "insert at ...",
> "replace", etc. in the trailer[] array and then produce an updated
> commit message fairly easily (i.e. copy out the bytes beginning at
> "whole" up to "end_of_message_proper", then iterate over trailer[]
> array and show their contents field).  The codepaths in the core
> part only need to know

>  - how to check the last item in trailer[] array to see if it ends
>    with the same sign-off as they are trying to add.

>  - how to append one new element to the trailer[] array.

>  - how to reproduce an updated commit log message after the above.

I don't think we need trailer block struct -> commit message conversion 
- when adding a new trailer or replacing an existing trailer, the client 
code can just remember the index and then modify its behavior 
accordingly when iterating through all trailers. But this conversion can 
be easily added if/when we need it.
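
For the first of those needs, the check against the last element could 
be as small as the sketch below. It assumes the start/end layout from my 
struct above; the name last_trailer_is is hypothetical, and an exact 
byte-for-byte comparison is my assumption about what "the same sign-off" 
should mean. Since the element is not NUL-terminated, the comparison 
goes by length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct trailer_item {
	const char *start; /* first byte of the logical line */
	const char *value; /* == start for lines without a token */
	const char *end;   /* one past the last byte; no NUL there */
};

/*
 * Return 1 if the last element of items[] is byte-for-byte equal to
 * "line" (given without a trailing LF), 0 otherwise.
 */
static int last_trailer_is(const struct trailer_item *items, int nr,
			   const char *line)
{
	const struct trailer_item *last;
	size_t len, want;

	assert(line != NULL);
	if (nr <= 0)
		return 0;
	last = &items[nr - 1];
	len = (size_t)(last->end - last->start);
	if (len > 0 && last->start[len - 1] == '\n')
		len--;              /* ignore the element's final LF */
	want = strlen(line);
	return len == want && !memcmp(last->start, line, len);
}
```

A caller adding a sign-off would skip the append when this returns 1.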

> Hmm?

Overall, this seems like a good idea - I'll go ahead and do this if 
there are no other objections.

It just occurred to me that there could be some corner cases when the 
trailer separator is configured to not include ":" - I'll make sure to 
include tests that check those corner cases.
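
To show the kind of corner case I mean: if the separator set (as 
configured through trailer.separators) is passed in as a string, a naive 
search for the first separator character might look like the sketch 
below. The function name find_separator_pos is hypothetical and this is 
not meant to describe what trailer.c does today.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the offset of the first byte of "line" that is one of the
 * characters in "separators", or -1 if there is none.
 */
static int find_separator_pos(const char *line, const char *separators)
{
	size_t pos;

	assert(line && separators);
	pos = strcspn(line, separators);
	return line[pos] ? (int)pos : -1;
}
```

With "#" as a separator, "Bug #42" yields offset 4 as intended, but a 
prose line such as "See issue #12" also yields a hit (offset 10), so 
the part before the separator still has to be validated as a token 
before the line counts as non-garbage.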