Re: [PATCH 0/4] External 'filter' attributes and drivers

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Sun, 22 Apr 2007 01:20:08 -0400

Junio C Hamano <junkio@xxxxxxx> wrote:
> I know this is controversial, but here is a small four patch
> series to let you insert arbitrary external filter in checkin
> and checkout codepath.

This series is some pretty nice work.

But I really don't think we want filters.  Actually, I'm very
against them, and I'm actually very against the CRLF work that
has already been added.  Since the CRLF ship has sailed I won't
try to call it back to port.  But I don't want to see the filter
stuff raise anchor...

I've only really seen a few arguments for the filters:

1) Better compress structured content (e.g. ODF) by storing the
   ZIP as a tree, allowing normal deltification within packfiles
   to apply to the contained files.

2) Use a custom diff function on special files (e.g. ODF) as they
   are otherwise unreadable with the internal xdiff based engine.

3) Mutate content prior to extracting from the tree, e.g. printing.

Let me try to address these points.

#1: There are a limited number of content formats that we could
reasonably filter into the repository such that the standard
deltification routines will have good space/performance benefits.
Most of them today are ZIP archives (e.g. ODF, JAR).

Why don't we just teach the packfile format how to better compress
these types of streams?  Let read_sha1_file() and pack-objects do all
of the heavy translation work, just as they do today for text files.
Explode them into a "tree-like" thing that allows deltification
against any other content (even cross ZIP streams) just like we
do with trees, but always expose them to the working directory
level of the system as blobs.

This way we never get into the mess that David Lang pointed out
where we have many optimizations that reuse working tree files when
stat data matches; nor do we have to worry about major structural
differences between the working tree (1 file) and the repository
format (exploded ZIP as 10,000 files).

#2: We already support using any diff tool you want: set the
GIT_EXTERNAL_DIFF environment variable before running a program that
generates a diff.  As Junio pointed out on #git tonight, that could
be any shell script that decides how to produce the diff based on
its own logic.  Though we could also use the new attribute stuff
to select diff programs, much like we do now for merge conflict
resolution in merge-recursive.

#3: This has already been discussed at length on the list.
Letting the build system perform this sort of work is better than
making the VCS do it; especially when you want the VCS to do its
sole job well (track the state of the working directory) and the
build system do its sole job well (produce files suitable for use
outside of the repository).

So despite the fact that I tried to make 4/4 shorter, I really
don't think we should be doing this...

-- 
Shawn.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html