Re: [PATCH v1 2/4] ref: add regular ref content check for files backend

Patrick Steinhardt <ps@xxxxxx> · Thu, 22 Aug 2024 10:46:31 +0200

On Tue, Aug 20, 2024 at 09:49:23AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@xxxxxxxxx> writes:
> 
> > We implicitly reply on "git-fsck(1)" to check the consistency of regular
> 
> "reply" -> "rely", I think.
> 
> > refs. However, when parsing the regular refs for files backend, we allow
> > the ref content to end with no newline or contain some garbages. We
> > should warn the user about above situations.
> 
> Hmph, should we?  
> 
> If the content is short (e.g., in SHA-1 repository it only has 39
> hexdigit) even if that may be sufficient to uniquely name the
> object, we should warn about it, of course.  A file that has
> 64-hexdigit with a terminating LF at the end may be a valid file to
> be in $GIT_DIR/refs/ hierarchy in a SHA-256 repository, but such a
> file in a SHA-1 repository should also be subject to a warning, as
> it could be a sign that somebody screwed up object format
> conversion.
> 
> But a file that has only 40-hexdigit without a terminating LF at the
> end?  Or a file that has 40-hexdigit followed by a CRLF instead of
> LF?  Or a file that has the identical content as a valid ref on its
> first line, but has extra stuff on its second and subsequent lines?
> 
> What does the name-to-object-name-mapping layer (aka "get_oid" API)
> do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
> they are treated as valid ref in the "normal" code path, it needs a
> strong justification to tighten the rules retroactively, much
> stronger than "Our current code, and any of our older versions,
> would have written such a file as a loose ref with our code."
> 
> "What are we protecting us from with this tightening?" is the
> question we should be asking ourselves, when evaluating each of
> these new rules that fsck used not to care about.

I'd say filesystem corruption, buggy implementations and compatibility
with other implementations of Git. The format for refs does not allow
for any other information than either an object ID for plain refs, and
the referee for symbolic refs. The fact that we do accept that is a mere
implementation detail because we reuse the same function to parse refs
that we also use for pseudorefs. And these _can_ have additional data.

So any reference that contains additional data is not a proper ref and
thus should be warned about from my point of view. No Git tooling should
write them, so if something does it's a red flag to me.

Patrick