On Tue, 27 Aug 2013, Junio C Hamano wrote:

> Nicolas Pitre <nico@xxxxxxxxxxx> writes:
>
> > Let's create another dictionary table to hold the author and committer
> > entries. We use the same table format used for tree entries where the
> > 16 bit data prefix is conveniently used to store the timezone value.
> >
> > In order to copy straight from a commit object buffer, dict_add_entry()
> > is modified to get the string length as the provided string pointer is
> > not always null terminated.
> >
> > Signed-off-by: Nicolas Pitre <nico@xxxxxxxxxxx>
> > ---
> > @@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct dict_table *t)
> >  	rehash_entries(t);
> >  }
> >
> > +static struct dict_table *commit_name_table;
> >  static struct dict_table *tree_path_table;
> >
> > +/*
> > + * Parse the author/committer line from a canonical commit object.
> > + * The 'from' argument points right after the "author " or "committer "
> > + * string. The time zone is parsed and stored in *tz_val. The returned
> > + * pointer is right after the end of the email address which is also just
> > + * before the time value, or NULL if a parsing error is encountered.
> > + */
> > +static char *get_nameend_and_tz(char *from, int *tz_val)
> > +{
> > +	char *end, *tz;
> > +
> > +	tz = strchr(from, '\n');
> > +	/* let's assume the smallest possible string to be "x <x> 0 +0000\n" */
> > +	if (!tz || tz - from < 13)
> > +		return NULL;
> > +	tz -= 4;
> > +	end = tz - 4;
> > +	while (end - from > 5 && *end != ' ')
> > +		end--;
> > +	if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ')
> > +		return NULL;
> > +	*tz_val = (tz[0] - '0') * 1000 +
> > +		  (tz[1] - '0') * 100 +
> > +		  (tz[2] - '0') * 10 +
> > +		  (tz[3] - '0');
> > +	switch (tz[-1]) {
> > +	default:	return NULL;
> > +	case '+':	break;
> > +	case '-':	*tz_val = -*tz_val;
> > +	}
> > +	return end;
> > +}
>
> This may want to share code with ident.c::split_ident_line(), as we
> have been trying to reduce the number of ident-line parsers.

Hmmm....

The problem I have with split_ident_line() right now is that it is too
liberal with whitespace. Here I must be sure I can deconstruct a commit
object and still regenerate it byte for byte so that it matches its SHA1
signature. So there _must_ always be exactly one space between the email
closing bracket and the time stamp, exactly one space between the time
stamp and the time zone value, and no space after the time zone.

Is there a reason why split_ident_line() is not stricter in that regard?


Nicolas
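
To make the strictness requirement concrete, here is a minimal sketch of a
check that accepts only the canonical spacing described above. It is not the
code from the patch and not what split_ident_line() does; the name
ident_tail_is_strict() is hypothetical, and the sketch assumes the ident line
is passed without its trailing newline and with a NUL terminator.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return 1 if the text
 * following the last '>' in 'line' is exactly
 * " <decimal digits> <sign><4 digits>" with nothing after it,
 * and 0 otherwise.  'line' must be NUL terminated and must not
 * include the trailing newline.
 */
static int ident_tail_is_strict(const char *line)
{
	const char *p = strrchr(line, '>');
	size_t ndigits;
	int i;

	/* exactly one space between the closing bracket and the time stamp */
	if (!p || p[1] != ' ')
		return 0;
	p += 2;

	/* the time stamp needs at least one decimal digit */
	ndigits = strspn(p, "0123456789");
	if (!ndigits)
		return 0;
	p += ndigits;

	/* exactly one space, then the time zone sign */
	if (p[0] != ' ' || (p[1] != '+' && p[1] != '-'))
		return 0;
	p += 2;

	/* four time zone digits; a premature NUL fails the digit test */
	for (i = 0; i < 4; i++)
		if (p[i] < '0' || p[i] > '9')
			return 0;

	/* nothing may follow the time zone */
	return p[4] == '\0';
}
```

With such a check, a line like "A U Thor <author@example.com> 1377561600 +0200"
passes, while the same line with a doubled space or a trailing space is
rejected, so a commit carrying it could be left unconverted rather than
regenerated with a different SHA1. The sketch only looks at spacing and digit
shape; it does not validate the time zone range.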