On Wed, Apr 23, 2008 at 01:46:20PM +0200, Johannes Sixt wrote:

> > I assume you are wanting to do something like:
> >
> >   git filter-branch --blob-filter '
> >     case "$1" in
> >     *.jpg) cat ;;
> >     *) tr a-z A-Z ;;
> >     esac
> >   '
> >
> > Obviously it is unlikely to get the same blob sha1 as "foo.jpg" and
> > "foo.txt", but it just feels a little wrong.
>
> Yes, that's how I intended it to work. What's wrong here? The fact that a
> user might name a JPEG foo.txt instead of foo.jpg? Or that the same blob
> might appear with entirely different names, including different suffixes?
> Well, tough luck. Use an index filter. But without any sort of hint what
> the blob is about, your original --blob-filter is useless except for the
> most simplistic repositories.

Yes, the script produces incorrect results if you have the same blob
with different names. IOW, if I accidentally add a JPEG as 'foo', and
then later rename it to 'foo.jpg', it will munge the blob the first time
it sees it, and then use the munged value for 'foo.jpg', since we never
even run the case statement.

Yes, this is not terribly likely, but it does seem like an awful (and
hard to diagnose!) bug to have hiding in the script. The correct fix is
either:

  - the blob cache needs to take into account sha1 _and_ path

  - the cache lookup needs to be _inside_ the path filter. In that case
    you would either have to support it in the script (e.g.,
    --blob-ignore jpg), or you could make the caching an optional part
    of the blob filter (the way you can call 'map' explicitly from your
    filters).

-Peff
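
To make the cache-key problem concrete, here is a minimal standalone sketch.
It is not git-filter-branch code: the function names (blob_filter,
filter_cached), the temporary cache directory, and the use of cksum as a
stand-in for the blob sha1 are all assumptions made for illustration. It
runs the same case statement through a cache keyed on content alone and
through one keyed on content plus path.

```shell
#!/bin/sh
# Hypothetical sketch, not taken from git-filter-branch.
# cksum stands in for the blob sha1; the cache is a temp directory.

cache=$(mktemp -d)
trap 'rm -rf "$cache"' EXIT

# The filter from the thread: pass *.jpg through, upper-case the rest.
blob_filter() {
	case "$1" in
	*.jpg) cat ;;
	*) tr a-z A-Z ;;
	esac
}

# filter_cached MODE PATH CONTENT
# MODE "sha1" keys the cache on content only;
# MODE "sha1+path" keys it on content and path.
filter_cached() {
	mode=$1 path=$2 content=$3
	id=$(printf '%s' "$content" | cksum | tr -cd 0-9)
	case "$mode" in
	sha1) key=$id ;;
	*) key=$id-$(printf '%s' "$path" | cksum | tr -cd 0-9) ;;
	esac
	# Only run the filter on a cache miss.
	if ! test -f "$cache/$mode-$key"; then
		printf '%s' "$content" | blob_filter "$path" >"$cache/$mode-$key"
	fi
	cat "$cache/$mode-$key"
}

# Same content seen first as 'foo', then as 'foo.jpg'.
echo "sha1 only: $(filter_cached sha1 foo jpegdata) $(filter_cached sha1 foo.jpg jpegdata)"
echo "sha1+path: $(filter_cached sha1+path foo jpegdata) $(filter_cached sha1+path foo.jpg jpegdata)"
```

With the content-only key, the second lookup hits the cache entry written
for 'foo' and returns the munged data without ever reaching the *.jpg arm.
With the combined key, 'foo.jpg' is a separate entry and passes through
untouched, at the cost of filtering the same content once per distinct path.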