I'm using smudge/clean filters in git-annex now, and it has not been an
entirely smooth fit between the interface and what git-annex wants to do.

The clean filter has to consume the whole file content on stdin; not
reading it all will make git think the clean filter failed. But git-annex
often doesn't need to read the whole content of a work-tree file in order
to clean it.

The smudge filter has to output the whole file content to stdout. But
git-annex often has the file's content on disk already, and could just
move it into place in the working tree. This would save CPU and IO and
often disk space too. But the smudge interface doesn't let git-annex use
the efficient approach.

So I propose extending the filter driver with two more optional commands.
Call them raw-clean and raw-smudge for now.

raw-clean would be like clean, but rather than being fed the whole
content of a large file on stdin, it would be passed the filename, and
can access the file itself. Like the clean filter, it outputs the cleaned
version on stdout.

raw-smudge would be like smudge, but rather than needing to output the
whole content of a large file on stdout, it would be passed a filename,
and can create that file itself.

To keep this backwards compatible, and to handle the cases where the
object being filtered is not a file on disk, the smudge and clean filters
would be required to be configured too, in order for raw-clean and
raw-smudge to be used.

It seems fairly easy to implement raw-clean. In sha1_file.c, index_path
would use raw-clean when available, while index_fd etc. keep on using the
clean filter. I have not investigated what would be needed to implement
raw-smudge yet.

--
see shy jo
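
To make the difference between the existing and the proposed interfaces
concrete, here is a rough sketch in Python. It is not how git-annex or git
implement anything. The key format, the stat-based cache, and every function
name below are made up for illustration. The only points it tries to show
are that a clean filter must drain its input, that a raw-clean style filter
handed a filename can sometimes skip the read entirely, and that a
raw-smudge style filter can move existing content into place instead of
streaming it.

```python
import hashlib
import os
import shutil


def hash_stream(stream):
    """Read a binary stream to EOF and return a made-up key for its content."""
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
        size += len(chunk)
    # Hypothetical key format, for illustration only.
    return "SHA256-s%d--%s" % (size, digest.hexdigest())


def clean(stdin, stdout):
    """Existing interface: the whole content arrives on stdin.

    Everything has to be read, even when the result is already known.
    """
    stdout.write((hash_stream(stdin) + "\n").encode())


def raw_clean(path, stdout, key_cache):
    """Proposed interface (sketch): the filter is given the filename.

    key_cache is a hypothetical dict mapping (inode, size, mtime) to a key.
    When the file looks unchanged, the content is never read.
    """
    st = os.stat(path)
    ident = (st.st_ino, st.st_size, st.st_mtime_ns)
    key = key_cache.get(ident)
    if key is None:
        with open(path, "rb") as f:
            key = hash_stream(f)
        key_cache[ident] = key
    stdout.write((key + "\n").encode())


def raw_smudge(stored_path, worktree_path):
    """Proposed interface (sketch): create the work-tree file directly.

    Assumes the content already exists on disk at stored_path, so it is
    moved into place rather than written out through a pipe.
    """
    shutil.move(stored_path, worktree_path)
```

In this sketch the second raw_clean call on an unchanged file costs one
stat, where clean would read the whole file again.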