Re: Why does fast-import need to check the validity of idents? + Other ident adventures

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 03 Feb 2021 11:20:27 -0800

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
writes:

> But I was wondering about fast-import.c in particular. I think Elijah's
> patch here is obviously a good incremental improvement. But stepping
> back a bit: who cares about sort-of-fsck validation in fast-import.c
> anyway?

Those who want to verify the procedure they used to produce the
import data from the original, and notice problems with it before it
is too late?

I.e. the data gets imported into Git, victory is declared, the old
SCM gets turned off, and only then is the resulting imported
repository found not to pass fsck.

> Shouldn't it just pretty much be importing data as-is, and then we could
> document "if you don't trust it, run fsck afterwards"?

If it is a small import, the distinction does not matter, but for a
huge import the procedure to produce the data is likely to be
mechanical, so systematic errors would show up after processing only
a small, early portion of the data stream.  Catching them there stops
fast-import before it wastes time importing a lot of garbage that
would have to be discarded after running such an fsck anyway.  So in
that sense, I suspect that there is value in the early validation.
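To make the early-validation argument concrete, here is a minimal
sketch of a fail-fast check, assuming a simplified ident grammar of
"Name <email> <seconds> <+|-HHMM>" as it appears after the keyword on
fast-import's author/committer/tagger lines.  The functions
ident_looks_valid() and first_bad_ident() are hypothetical, not
fast-import's actual code, and they check far less than fsck does.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified check of the part of an ident line after
 * the keyword, e.g. "A U Thor <author@example.com> 1612380027 -0800".
 * This is NOT git's real validation; it only illustrates the idea.
 * Returns 1 if the ident looks well-formed, 0 otherwise.
 */
int ident_looks_valid(const char *s)
{
    const char *lt = strchr(s, '<');
    const char *gt;
    const char *p;

    if (!lt)
        return 0;                       /* no email at all */
    gt = strchr(lt + 1, '>');
    if (!gt || gt == lt + 1)
        return 0;                       /* unterminated or empty email */
    if (memchr(s, '>', (size_t)(lt - s)))
        return 0;                       /* stray '>' in the name */
    for (p = lt + 1; p < gt; p++)
        if (*p == '<')
            return 0;                   /* nested '<' in the email */

    /* Expect: SP <digits> SP [+-] four digits, then end of string. */
    p = gt + 1;
    if (*p++ != ' ' || !isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p++ != ' ' || (*p != '+' && *p != '-'))
        return 0;
    p++;                                /* skip the sign */
    for (int i = 0; i < 4; i++, p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return *p == '\0';
}

/*
 * Hypothetical fail-fast driver: walk the stream and stop at the first
 * ident line that is malformed, instead of importing everything and
 * only discovering the problem with fsck afterwards.
 * Returns the 1-based line number of the first bad ident, or 0 if none.
 */
size_t first_bad_ident(const char *const *lines, size_t n)
{
    static const char *const keys[] = { "author ", "committer ", "tagger " };

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < sizeof(keys) / sizeof(keys[0]); k++) {
            size_t len = strlen(keys[k]);
            if (!strncmp(lines[i], keys[k], len) &&
                !ident_looks_valid(lines[i] + len))
                return i + 1;
        }
    }
    return 0;
}
```

With a stream whose fourth line reads "committer Broken Tool
author@example.com 1612380027", first_bad_ident() returns 4, so a
systematic bug in a mechanical exporter is flagged at the first commit
it touches rather than after the whole import has finished.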

> Or, if it's a use-case people actually care about, then I might see
> about unifying some of these parser functions as part of a series I'm
> preparing.

I think allowing people to loosen particular checks for fast-import
(or elsewhere for that matter) is a good idea, and you can do so
more easily once the existing checking is migrated to your new
scheme that shares code with the fsck machinery.
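As a rough illustration of what "loosening particular checks" could
look like once the checks share the fsck machinery, here is a minimal
table-driven sketch, assuming per-check severities in the style of
fsck's error/warn/ignore levels for fsck.<msg-id>.  The message IDs
borrow fsck's camelCase names (missingEmail, badEmail, badDate,
badTimezone), but set_severity(), report_problem() and the table
itself are hypothetical; no such fast-import option is implied to
exist today.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical severities mirroring fsck's error/warn/ignore levels. */
enum severity { SEV_IGNORE, SEV_WARN, SEV_ERROR };

struct check {
    const char *id;          /* camelCase message ID */
    enum severity severity;  /* current level, starts at the default */
};

/* Hypothetical subset; fsck itself knows many more IDs. */
struct check checks[] = {
    { "missingEmail", SEV_ERROR },
    { "badEmail",     SEV_ERROR },
    { "badDate",      SEV_ERROR },
    { "badTimezone",  SEV_ERROR },
};
#define NR_CHECKS (sizeof(checks) / sizeof(checks[0]))

/*
 * Apply one "msgId=level" override, e.g. "badTimezone=warn".
 * Returns 0 on success, -1 on a malformed spec, unknown ID or level.
 */
int set_severity(const char *spec)
{
    const char *eq = strchr(spec, '=');

    if (!eq)
        return -1;
    for (size_t i = 0; i < NR_CHECKS; i++) {
        size_t len = strlen(checks[i].id);
        if ((size_t)(eq - spec) != len || strncmp(spec, checks[i].id, len))
            continue;
        if (!strcmp(eq + 1, "ignore"))
            checks[i].severity = SEV_IGNORE;
        else if (!strcmp(eq + 1, "warn"))
            checks[i].severity = SEV_WARN;
        else if (!strcmp(eq + 1, "error"))
            checks[i].severity = SEV_ERROR;
        else
            return -1;
        return 0;
    }
    return -1;
}

/*
 * Called by a (hypothetical) shared ident parser when a check trips.
 * Returns 1 if the import should abort, 0 if it may continue.
 */
int report_problem(const char *id, const char *detail)
{
    for (size_t i = 0; i < NR_CHECKS; i++) {
        if (strcmp(checks[i].id, id))
            continue;
        switch (checks[i].severity) {
        case SEV_IGNORE:
            return 0;
        case SEV_WARN:
            fprintf(stderr, "warning: %s: %s\n", id, detail);
            return 0;
        case SEV_ERROR:
            fprintf(stderr, "error: %s: %s\n", id, detail);
            return 1;
        }
    }
    return 1;  /* unknown IDs are treated as fatal, to stay on the safe side */
}
```

For example, after set_severity("badTimezone=warn"), a badTimezone
report only prints a warning and returns 0, while badEmail still
aborts.  Keeping strict defaults and making each relaxation an
explicit, per-check opt-in preserves the early-validation benefit for
everyone who does not ask for an exception.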