On Fri, Jun 19, 2015 at 06:10:10AM -0400, Jeff King wrote:
> On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:
>
> > filter-objects is a command to scan all objects in the object database
> > for the repository and print the ids of those which match the given
> > criteria.
> >
> > The current supported criteria are object type and the minimum size of
> > the object.
> >
> > The guiding use case is to scan repositories quickly for large objects
> > which may cause performance issues for users. The list of objects can
> > then be used to guide some future remediating action.
>
> I've had to perform this exact same task. You can already do the
> "filtering" part pretty easily and efficiently with cat-file and a perl
> script, like:
>
>   magically_generate_all_objects |
>   git cat-file --batch-check='%(objectsize) %(objectname)' |
>   perl -alne 'print $F[1] if $F[0] > 1234'
>
> That's not as friendly as your filter-objects, but it's a lot more
> flexible (since you can ask cat-file for all sorts of information).
>
> Obviously I've glossed over the "how to get a list of objects" part.
> If you truly want all objects (not just reachable ones), or if "rev-list
> --objects" is too slow [...]

So, yes, performance is definitely an issue and I could have called this
command "git magically-generate-all-object-for-scripts" but then, as it
was so easy to provide exactly the filtering that I was looking for in
the C code, I thought I would do that as well and then "filter-objects"
("filter-all-objects"?) seemed like a better name.

On the systems I've checked, a parameterless filter-objects is about an
order of magnitude faster than rev-list --all --objects, although I
understand they do different things.

I am also thinking about another piece that answers the question: "which
commits introduce any of (or the first of) this list of objects?". This
can be done by parsing a diff --raw for commits but I think it should be
possible to do this faster, too.

Charles.
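
For illustration, the filtering step of the quoted cat-file pipeline,
extended with the object-type criterion mentioned above, could be
sketched in Python. This is only a sketch and not the filter-objects
implementation: it assumes its input was produced by
git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname)',
and the function name filter_objects and its parameters are hypothetical.
Note that it treats min_size as inclusive, whereas the perl one-liner
uses a strict "greater than".

```python
from typing import Iterable, Iterator, Optional


def filter_objects(lines: Iterable[str], min_size: int = 0,
                   obj_type: Optional[str] = None) -> Iterator[str]:
    """Yield object ids from '<size> <type> <id>' lines matching the criteria.

    Assumed input format (not taken from the patch itself): the output of
    git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname)'.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            # Skip blank lines and "<id> missing" reports from cat-file.
            continue
        size, kind, name = fields
        if not size.isdigit():
            continue
        if int(size) < min_size:
            continue
        if obj_type is not None and kind != obj_type:
            continue
        yield name
```

Fed from sys.stdin at the end of the pipeline, this plays the role of
the perl step; it does nothing about the cost of generating the object
list in the first place, which is the part filter-objects is meant to
speed up.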