[Originally sent 5 days ago, but seems to have been a victim of the vger.kernel.org problems at the time, re-sending] On Sat, May 30 2020, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@xxxxxxxxx> Full snipped E-Mail in the archive: https://lore.kernel.org/git/pull.795.v3.git.git.1590870357549.gitgitgadget@xxxxxxxxx/ > There are multiple repositories in the wild with random, invalid > timezones. Most notably is a commit from rails.git with a timezone of > "+051800"[1]. A few searches will find other repos with that same > invalid timezone as well. Further, Peff reports that GitHub relaxed > their fsck checks in August 2011 to accept any timezone value[2], and > there have been multiple reports to filter-repo about fast-import > crashing while trying to import their existing repositories since they > had timezone values such as "-7349423" and "-43455309"[3]. I've been looking at some of our duplicate logic here after my mktag series where we now use fsck validation. It had a hardcoded "1400" offset value, which I see fast-import.c still has. Then in mailmap.c we have parse_name_and_email(), then there's split_ident_line() in ident.c, and of course fsck_ident(). record_person_from_buf() in fmt-merge-msg.c, copy_name() and copy_email() in ref-filter.c. Maybe handle_from() in mailinfo.c also counts. Anyway, aside from the last these are all parsers for "author/committer" lines in commits one way or another. But I was wondering about fast-import.c in particular. I think Elijah's patch here is obviously good an incremental improvement. But stepping back a bit: who cares about sort-of-fsck validation in fast-import.c anyway? Shouldn't it just pretty much be importing data as-is, and then we could document "if you don't trust it, run fsck afterwards"? Or, if it's a use-case people actually care about, then I might see about unifying some of these parser functions as part of a series I'm preparing.