Re: git-fast-import

On Tue, 2007-02-06 at 13:53 -0500, Nicolas Pitre wrote:
> On Tue, 6 Feb 2007, Linus Torvalds wrote:
> 
> > I'm not so worried about the git date parsing routines (which are fairly 
> > solid) as about the fact that absolutely *tons* of people get rfc2822 
> > wrong.
> > 
> > They allow pretty much any half-way valid date, exactly because people 
> > don't do rfc2822 right anyway (and because they are also meant to work 
> > even if you write the date by hand, like "12:34 2005-06-07").
> > 
> > Sure, you can still mess up the program that actually generates the data 
> > for gfi, and have bugs like that *there*, but at least they'd have to 
> > think a bit about it.
> 
> Well, exactly because GIT already has fairly solid date parsing 
> routines, and the fact that we needed solid date parsing routines in the 
> first place, exactly because people don't do rfc2822 right anyway, 
> should be a hell of a big clue why we should parse date information for 
> the gfi frontend.  Because the date is for sure most likely in a screwed 
> up format already and it is counter productive to have to deal with that 
> in a duplicated piece of code.  And the bare reality is that people will 
> just not care to parse it right themselves. 

Nevertheless, they _should_. The principle is simple -- wherever there
is ambiguity, you should seek to resolve that as _close_ to the point of
origin as possible. Your 'best guess' gets worse and worse the further
you go from the source of the data.

If you're exporting from a legacy repository in one part of the world,
then transferring the raw data to a machine elsewhere to be imported
into git, you _really_ want to be making your guesses about timezones
and character sets in the _export_ stage; not the subsequent import.

So there's a lot to be said for nailing down gfi's intermediate format
and removing _all_ the ambiguity from it -- using git format dates
(which I did that way precisely to avoid ambiguity), and using UTF-8
(or some other character set that is _specified_ rather than assumed).
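
To make that concrete, here is a rough sketch of what an export-side
frontend might do. It assumes "git format dates" means git's raw form,
"<seconds since epoch> <+/-hhmm>", and that the exporter already knows
the UTC offset and character set of the legacy repository. The helper
names git_raw_date and to_utf8 are made up for illustration; they are
not part of gfi or git.

```python
from datetime import datetime, timedelta, timezone


def git_raw_date(local: datetime, utc_offset_minutes: int) -> str:
    """Format a naive local timestamp as '<epoch seconds> <+/-hhmm>'.

    The offset is supplied by the exporter, which is the party that
    actually knows where the legacy repository lived.
    """
    tz = timezone(timedelta(minutes=utc_offset_minutes))
    aware = local.replace(tzinfo=tz)  # attach the known offset, no guessing
    seconds = int(aware.timestamp())
    sign = "+" if utc_offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(utc_offset_minutes), 60)
    return f"{seconds} {sign}{hours:02d}{minutes:02d}"


def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode text from an explicitly named charset to UTF-8.

    Raises UnicodeDecodeError if the bytes do not match the stated
    charset, so a wrong guess fails at export time rather than later.
    """
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    # 13:53 local time at UTC-5, as in the quoted message header.
    print(git_raw_date(datetime(2007, 2, 6, 13, 53), -300))
    print(to_utf8(b"caf\xe9", "latin-1"))
```

The point is only that both decisions (offset and charset) are made in
the exporter, where the information is available, and the stream handed
to the importer carries no guesswork.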

-- 
dwmw2
