Re: [PATCH v2] sha1-file: remove OBJECT_INFO_SKIP_CACHED

Jeff King <peff@xxxxxxxx> · Tue, 7 Jan 2020 06:22:13 -0500

On Mon, Jan 06, 2020 at 03:47:53PM -0800, Jonathan Nieder wrote:

> >> + * Callers are responsible for calling write_object_file to record the
> >> + * object in persistent storage before writing any other new objects
> >> + * that reference it.
> >> + */
> >>  int pretend_object_file(void *, unsigned long, enum object_type,
> >>  			struct object_id *oid);
> >
> > I think this is an improvement over the status quo, but it's still a
> > potential trap for code which happens to run in the same process (see my
> > other email in the thread).
> >
> > Should the message perhaps be even more scary?
> 
> A pet peeve of mine is warning volume escalation: if it becomes common
> for us to say
> 
>  * Warning: callers are reponsible for [...]
> 
> then new warnings trying to stand out might say
> 
>  * WARNING: callers are responsible for [...]
> 
> and then after we are desensitized to that, we may switch to
> 
>  * WARNING WARNING WARNING, not the usual blah-blah: callers are
> 
> and so on.  The main way I have found to counteract that is to make
> the "dangerous curve" markers context-specific enough that people
> don't learn to ignore them.  After all, sometimes a concurrency
> warning is important to me, at other times warnings about clarity may
> be what attract my interest, and so on.

I meant less about the number of capital letters, and more that we
should be saying "this interface is dangerous; don't use it". Because
it's not just "callers are responsible". It's "this can cause subtle
and hard-to-debug issues because it's writing to global state".

My preferred solution would actually be to rip it out entirely, but we'd
need some solution for git-blame, the sole caller. Possibly it could
insert the value straight into the diff_filespec. But according to the
thread that I linked earlier, I poked at that last year but it didn't
look trivial.

> I don't have a good suggestion here.  Perhaps "Callers are responsible
> for" is too slow and something more terse would help?
> 
>  /*
>   * Adds an object to the in-memory object store, without writing it
>   * to disk.
>   *
>   * Use with caution!  Unless you call write_object_file to record the
>   * in-memory object to persistent storage, any other new objects that
>   * reference it will point to a missing (in memory only) object,
>   * resulting in a corrupt repository.
>   */

Yeah, that's more what I had in mind.

> It would be even better if we have some automated way to catch this
> kind of issue.  Should tests run "git fsck" after each test?  Should
> write_object_file have a paranoid mode that checks integrity?
> 
> I don't know an efficient way to do that.  Ultimately I am comfortable
> counting on reviewers to be aware of this kind of pitfall.  While
> nonlocal invariants are always hard to maintain, this pitfall is
> inherent in the semantics of the function, so I am not too worried
> that reviewers will overlook it.

Yeah, given the scope of the problem (we have a single caller, and this
mechanism is over a decade old) I'm fine with review as the enforcement
mechanism, too.

> A less error-prone interface would tie the result of
> pretend_object_file to a short-lived overlay on the_repository without
> affecting global state.  We could even enforce read-only access in
> that overlay.  I don't think the "struct repository" interface and
> callers are ready for that yet, though.

I agree that would be better, though it's still kind-of global (in that
the repository object is effectively a global for most processes).

-Peff