Re: [PATCH 0/5] tree-walk object_id refactor

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Fri, 11 Jan 2019 00:17:50 +0000

On Thu, Jan 10, 2019 at 01:40:31AM -0500, Jeff King wrote:
> On Thu, Jan 10, 2019 at 04:25:46AM +0000, brian m. carlson wrote:
> 
> > There are a small number of places in our codebase where we cast a
> > buffer of unsigned char to a struct object_id pointer. When we have
> > GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places
> > (the buffer for tree objects) can lead to us copying too much data when
> > using SHA-1 as the hash, since there are only 20 bytes to read.
> > 
> > This was not expected to be a problem before future code was introduced,
> > but due to a combination of series the issue became noticeable.
> > 
> > This series introduces a refactor to avoid referencing the struct
> > object_id directly from a buffer and instead storing an additional
> > struct object_id (and an int) in struct name_entry and referring to
> > that.
> 
> I think this is really the only safe and sane solution. We resisted it
> because of the cost of the extra copies (especially the
> update_tree_entry() one). But I don't know that anybody actually
> measured it. Do you have any performance numbers before/after this
> series?

Unfortunately, I don't. I'm not really sure in what situations we hit
this code path a lot, so I'm not sure what exactly we should performance
test. If you have suggestions, I can set up some perf tests.

I will say that I resisted writing this series for a long time, since I
knew it was going to be a difficult slog to get everything working (and
it was). If there had been a way to avoid it, I would have done it.
However, writing this series led to about ten tests in my SHA-256 work
suddenly passing where they hadn't before.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
Attachment:
signature.asc

Description: PGP signature