Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line

Jeff King <peff@xxxxxxxx> · Thu, 17 Aug 2017 01:55:16 -0400

On Wed, Aug 16, 2017 at 10:59:02PM +0000, brian m. carlson wrote:

> On Wed, Aug 16, 2017 at 02:24:27PM +0200, Patryk Obara wrote:
> > On Tue, Aug 15, 2017 at 7:02 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
> > >>         const int entry_size = GIT_SHA1_HEXSZ + 1;
> > >
> > > outside the scope of this patch:
> > > Is GIT_SHA1_HEXSZ or GIT_MAX_HEXSZ the right call here?
> > 
> > I think neither one. In my opinion, this code should not be so closely
> > coupled to hash parsing code - it should be tasked with parsing
> > whitespace separated list of commit ids without relying on specific
> > commit id length or format.
> 
> What I had intended, although maybe I have not explained this well, was
> that we would have one binary that set up hash functionality as part of
> early setup.  GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ would turn into
> something like current_hash->rawsz and current_hash->hexsz at that
> point.  The reason I introduced the GIT_MAX constants was to allocate
> memory suitable for whatever hash we picked.
> 
> However, this is only what I had considered for design, and others might
> have different views going forward.  I have, however, based my patches
> on that assumption, and responded to others' comments with those
> statements.

What you wrote here matches my understanding of the general plan. IOW,
we'd expect to "waste" 12 bytes when dealing with a 160-bit sha1 in a
Git binary that's aware of 256-bit hashes. But that seems like a small
price to pay to be able to continue using automatic allocations, versus
rewriting each site to call xmalloc(current_hash->rawsz).

I'd expect most of the GIT_MAX constants to eventually go away in favor
of "struct object_id", but that will still be using the same "big enough
to hold any hash" size under the hood.

> I agree that ideally we should make as much of the code as possible
> ignorant of the hash size, because that will generally result in more
> robust, less brittle code.  I've noticed in this series the use of
> parse_oid_hex, and I agree that's one tool we can use to accomplish that
> goal.

Agreed. Most code should be dealing with the abstract concept of a hash
and shouldn't have to care about the size. I really like parse_oid_hex()
for that reason (and I think parsing is the main place we've found that
needs to care).

-Peff