Re: [PATCH 00/22] Refactor to accept NUL in commit messages

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> · Sun, 23 Oct 2011 12:24:25 +1100

2011/10/23 Junio C Hamano <gitster@xxxxxxxxx>:
> I do not think we want to go this route.
>
> There are two possible approaches to attack this.
>
>  - If we want to show everything after a potential and rare NUL in the log
>   message most of the time, then "struct commit" should just store
>   <ptr,len> pair. This grows "struct commit" with one extra ulong.
>
>  - If we want to give us a way to notice and show these "funnily, this
>   commit log message has a NUL in it" case as an exception in only
>   selected codepaths, then "struct commit" should just gain "flags"
>   4-byte int field between "indegree" and "date", and
>   parse_commit_buffer() should set one bit in the flags when the log
>   message has NUL in it. And teach only these selected codepaths to find
>   the length from the object name with sha1_object_info() as needed. This
>   grows "struct commit" with one 4-byte int, with runtime overhead only
>   where it matters.
>
> The approach taken by the patch wastes two malloc() blocks with their own
> allocation overhead, and unused "alloc" field in the strbuf that does not
> have to be there.

We could allocate just one block with length as the first field:

struct commit_buffer {
        unsigned long len;
        char buf[FLEX_ARRAY];
};

The downside is commit_buffer field type in struct commit changes,
which impacts many codepaths. If we agree to allow NUL in commit
objects, then all codepaths should be aware of that fact, otherwise
funny things may happen because string processing in this function
stops early due to NUL, but others run fine..

Jeff's low-level normalization approach sounds much simpler, but
probably trickier because we need to identify where to normalize and
denormalize. At least with type change, the compiler spots all the
places for me.

I would not worry about runtime processing overhead. The string end
check is basically converted from "if (*msg)" to "if (msg < msg_end)".
There will one more pointer (msg_end) in stack for each call. Unless
we do deep recursion, we should be fine. Memory overhead is still
something to profile.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html