Re: [PATCH] archive-zip: load userdiff config

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 03, 2017 at 06:24:39PM +0100, René Scharfe wrote:

> Am 02.01.2017 um 23:25 schrieb Jeff King:
> > Since 4aff646d17 (archive-zip: mark text files in archives,
> > 2015-03-05), the zip archiver will look at the userdiff
> > driver to decide whether a file is text or binary. This
> > usually doesn't need to look any further than the attributes
> > themselves (e.g., "-diff", etc). But if the user defines a
> > custom driver like "diff=foo", we need to look at
> > "diff.foo.binary" in the config. Prior to this patch, we
> > didn't actually load it.
> 
> Ah, didn't think of that, obviously.
> 
> Would it make sense for userdiff_find_by_path() to die if userdiff_config()
> hasn't been called, yet?

Yeah, perhaps. That makes it impossible for a program to intentionally
ignore the config. But it looks like even plumbing diff commands load
userdiff (which makes sense; they control its behavior through things
like ALLOW_TEXTCONV). So it's probably fine to have it everywhere.

Other options include:

  1. Just loading it always as part of git_default_config.

  2. Lazy-loading it on the first call. This seems elegant, though it
     does open up hidden cache-invalidation issues. E.g., somebody asks
     for userdiff_find_by_path(), we load the values, then they
     setup_git_repository(), and we would need to reload. That's
     far-fetched for userdiff, but it makes lazy-loading as a general
     pattern a bit of a potential maintenance trap.

     We could also introduce some infrastructure to deal with that
     (e.g., if callers could ask the config machinery "have you been
     invalidated"). That would help here and in other places (e.g., I
     considered this when dealing with get_shared_repository()).

> > I also happened to notice that zipfiles are created using the local
> > timezone (because they have no notion of the timezone, so we have to
> > pick _something_). That's probably the least-terrible option, but it was
> > certainly surprising to me when I tried to bit-for-bit reproduce a
> > zipfile from GitHub on my local machine.
> 
> That reminds me of an old request to allow users better control over the
> meta-data written into archives.  Being able to specify a time zone offset
> could be a start.

I did it with:

  TZ=PST8PDT git archive ...

which let me get a bit-for-bit match with what GitHub generates. The
real problem was just knowing that I needed to do that. OTOH, we're
considering having GitHub generate all archives in UTC for sanity's
sake, and it would be nice to do that by setting zip.timezone instead of
hacking $TZ for each invocation.

> > +static int archive_zip_config(const char *var, const char *value, void *data)
> > +{
> > +	return userdiff_config(var, value);
> > +}
> > +
> >  static int write_zip_archive(const struct archiver *ar,
> >  			     struct archiver_args *args)
> >  {
> >  	int err;
> > 
> > +	git_config(archive_zip_config, NULL);
> > +
> 
> I briefly thought about moving this call to archive.c with the rest of the
> config-related stuff, but I agree it's better kept here.

That was my first thought, but there are already two config calls:
write_archive() loads default config, but then archive-tar loads
tar-specific config. Since only zip cares about userdiff, I patterned it
after the latter. But arguably everybody _could_ end up calling into
userdiff. If we take that philosophy, though, I'd be more inclined to
push it into git_default_config(). That covers archive writers _and_ any
other programs which might happen to call into the diff code.

> Looks good, thanks!

Thanks for reviewing.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]