Junio C Hamano <junkio@xxxxxxx> wrote: > I know this is controversial, but here is a small four patch > series to let you insert arbitrary external filter in checkin > and checkout codepath. This series is some pretty nice work. But I really don't think we want filters. Actually, I'm very against them, and I'm actually very against the CRLF work that has already been added. Since the CRLF ship has sailed I won't try to call it back to port. But I don't want to see the filter stuff raise anchor... I've only really seen a few arguments for the filters: 1) Better compress structured content (e.g. ODF) by storing the ZIP as a tree, allowing normal deltification within packfiles to apply to the contained files. 2) Use a custom diff function on special files (e.g. ODF) as they are otherwise unreadable with the internal xdiff based engine. 3) Mutate content prior to extracting from the tree, e.g. printing. Let me try to address these points. #1: There are a limited number of content formats that we could reasonably filter into the repository such that the standard deltification routines will have good space/performance benefits. Most of them today are ZIP archives (e.g. ODF, JAR). Why don't we just teach the packfile format how to better compress these types of streams? Let read_sha1_file() and pack-objects do all of the heavy translation work, just as they do today for text files. Explode them into a "tree-like" thing that allows deltification against any other content (even cross ZIP streams) just like we do with trees, but always expose them to the working directory level of the system as blobs. This way we never get into the mess that David Lang pointed out where we have many optimizations that reuse working tree files when stat data matches; nor do we have to worry about major structural differences between the working tree (1 file) and the repository format (exploded ZIP as 10,000 files). #2: We already support using any diff tool you want: set the GIT_EXTERNAL_DIFF environment variable before running a program that generates a diff. As Junio pointed out on #git tonight, that could be any shell script that decides how to produce the diff based on its own logic. Though we could also use the new attribute stuff to select diff programs, much like we do now for merge conflict resolution in merge-recursive. #3: This has already been discussed at length on the list. Letting the build system perform this sort of work is better than making the VCS do it; especially when you want the VCS to do its sole job well (track the state of the working directory) and the build system do its sole job well (produce files suitable for use outside of the repository). So despite the fact that I tried to make 4/4 shorter, I really don't think we should be doing this... -- Shawn. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html