Re: [PATCH 3/3] Add filter-objects command

On Fri, Jun 19, 2015 at 06:10:10AM -0400, Jeff King wrote:
> On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:
> 
> > filter-objects is a command to scan all objects in the object database
> > for the repository and print the ids of those which match the given
> > criteria.
> > 
> > The current supported criteria are object type and the minimum size of
> > the object.
> > 
> > The guiding use case is to scan repositories quickly for large objects
> > which may cause performance issues for users. The list of objects can
> > then be used to guide some future remediating action.
> 
> I've had to perform this exact same task. You can already do the
> "filtering" part pretty easily and efficiently with cat-file and a perl
> script, like:
> 
>   magically_generate_all_objects |
>   git cat-file --batch-check='%(objectsize) %(objectname)' |
>   perl -alne 'print $F[1] if $F[0] > 1234'
> 
> That's not as friendly as your filter-objects, but it's a lot more
> flexible (since you can ask cat-file for all sorts of information).
> 
> Obviously I've glossed over the "how to get a list of objects" part.
> If you truly want all objects (not just reachable ones), or if "rev-list
> --objects" is too slow [...]

So, yes, performance is definitely an issue. I could have called this
command "git magically-generate-all-objects-for-scripts", but it was so
easy to provide exactly the filtering I was looking for in the C code
that I did that as well, and then "filter-objects" (or
"filter-all-objects"?) seemed like a better name.
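
For illustration, the filtering step on its own can be sketched in a few
lines of Python. This is not the C code from the patch: the function
name, its parameters, and the assumed "type size name" input lines (the
shape that git cat-file --batch-check='%(objecttype) %(objectsize)
%(objectname)' prints) are all made up for the example, and "minimum
size" is taken to mean greater than or equal to.

```python
def filter_objects(lines, min_size=0, obj_type=None):
    """Yield object ids from 'type size name' lines matching the criteria.

    Hypothetical helper for illustration only; it is not part of git.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            # Skip blank or unexpected lines, e.g. "<name> missing".
            continue
        otype, size, name = parts
        if obj_type is not None and otype != obj_type:
            continue
        if int(size) >= min_size:
            yield name


if __name__ == "__main__":
    sample = [
        "blob 2048 aaaa",
        "tree 100 bbbb",
        "blob 10 cccc",
    ]
    # Print ids of blobs that are at least 1000 bytes.
    for name in filter_objects(sample, min_size=1000, obj_type="blob"):
        print(name)
```

The expensive part is still producing the input lines; the sketch only
covers the cheap part that could equally live in a script.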

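On the systems I've checked, a parameterless filter-objects is about an
order of magnitude faster than "rev-list --all --objects", although I
understand they do different things: filter-objects scans every object
in the object database, while rev-list only walks objects reachable
from the refs.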

I am also thinking about another piece that answers the question: "which
commits introduce any of (or the first of) this list of objects?". This
can be done by parsing "diff --raw" output for each commit, but I think
it should be possible to do this faster, too.
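
A rough sketch of the slow, parsing-based approach, in Python. It
assumes the input is the text produced by something like
"git log --reverse --raw --no-abbrev --format='commit %H'", so that each
commit appears oldest first, followed by its raw diff lines. The
function name and its arguments are hypothetical. Merge commits produce
no raw lines in that output by default, and only blobs that show up in
a diff are seen, so this is an approximation rather than a full answer.

```python
def introducing_commits(log_lines, wanted):
    """Map each wanted object id to the first commit whose raw diff adds it.

    Hypothetical helper for illustration; 'first' means first in the
    order the log lines are supplied.
    """
    wanted = set(wanted)
    found = {}
    commit = None
    for line in log_lines:
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith(":") and commit is not None:
            # Raw format: ":oldmode newmode oldsha newsha status<TAB>path"
            meta = line.split("\t", 1)[0].split()
            if len(meta) < 5:
                continue
            new_sha = meta[3]
            if new_sha in wanted and new_sha not in found:
                found[new_sha] = commit
    return found


if __name__ == "__main__":
    log = [
        "commit c1",
        "",
        ":000000 100644 0000 aaaa A\tbig.bin",
        "commit c2",
        "",
        ":100644 100644 aaaa bbbb M\tbig.bin",
    ]
    print(introducing_commits(log, ["aaaa", "bbbb"]))
```

This reads every raw diff line of every commit, which is exactly the
cost a dedicated command might be able to avoid.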

Charles.