Re: [PATCH] parse_object: try internal cache before reading object db

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 05, 2012 at 01:55:22PM -0800, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > The worst potential problem I could come up with is if you somehow had
> > an object whose "parsed" flag was set, but somehow didn't have its other
> > fields set (like type).
> > ...
> > So I think it is safe short of somebody doing some horrible manual
> > munging of a "struct object".
> 
> Yeah, I was worried about codepaths like commit-pretty-printing might be
> mucking with the contents of commit->buffer, perhaps reencoding the text
> and then calling parse_object() to get the unmodified original back, or
> something silly like that. But the lookup_object() call at the beginning
> of the parse_object() already prevents us from doing such a thing, so we
> should be OK, I would think.

Er, without my patch there is no such lookup_object, is there?

What saves you is that the parse_*_buffer functions all do nothing when
the object.parsed flag is set, and the code I added makes sure that
object.parsed is set in the object that lookup_object returns.

So yeah, anytime you tweak the contents of commit->buffer but don't
unset the "parsed" flag, you are asking for trouble.

Here's another possible code path where the behavior is changed:

  1. You set the global save_commit_buffer to 0.

  2. You call parse_commit(commit) on an unparsed commit object, which
     does not save the commit buffer, but does set
     commit->object.parsed.

  3. You call parse_object(commit->object.sha1).

     a. Without my patch, we read the file contents again, do _not_
        re-parse them (because we look up the existing object and notice
        that its "parsed" flag is set), but we _do_ assign the buffer to
        commit->buffer.

     b. With my patch, we see that there is an existing object that is
        already parsed, and return early. commit->buffer remains NULL.

I would argue that this doesn't matter, since "parse_commit" uses the
exact same optimization (it returns early without setting commit->buffer
if the parsed flag is set). So any program turning off
save_commit_buffer has to be ready to deal with a NULL commit->buffer in
the first place. The only exception would be a program that then tries
to fill in the commit->buffer field by manually running parse_object on
an already-parsed, buffer-less commit object. I don't think we do that.

You can verify that commit->buffer is the only place where these issues
can happen by following the logic in parse_object_buffer.

Sorry to belabor the discussion, but this is such a core piece of code,
I want to make sure the optimization isn't hurting anybody (I don't
think it is, and certainly the tests are all happy, but I think talking
through the cases is a good thing).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]