Re: [PATCH 3/3] Add filter-objects command

On Fri, Jun 19, 2015 at 11:33:24AM +0100, Charles Bailey wrote:

> > Obviously I've glossed over the "how to get a list of objects" part.
> > If you truly want all objects (not just reachable ones), or if "rev-list
> > --objects" is too slow [...]
> 
> So, yes, performance is definitely an issue and I could have called this
> command "git magically-generate-all-object-for-scripts" but then, as it
> was so easy to provide exactly the filtering that I was looking for in
> the C code, I thought I would do that as well and then "filter-objects"
> ("filter-all-objects"?) seemed like a better name.

Right, my point was only that it works for _your_ particular filter, but
it would be nice to have something more general. And we already have
"cat-file --batch-check". IOW, I think I would prefer the "magical" form
because it's a better scripting building block. As you note,
"filter-objects" without any filters is exactly that. Your 10 extra
lines of C code are not exactly bloat, but I just wonder if other people
will find it all that useful.
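
To illustrate the building-block idea, here is a minimal sketch that does
the filtering in the shell on top of "cat-file --batch-check". It is not
taken from the patch under discussion: the function name large_blobs and
the size threshold are made up for the example, and it uses
"rev-list --all --objects" as the object source, so it only sees
reachable objects and pays the traversal cost discussed below.

```shell
# Hypothetical helper: list reachable blobs of at least MIN_SIZE bytes.
# Output format: "<size> <object name> <path>", one blob per line.
large_blobs() {
    min_size=$1
    # rev-list prints "<object name> <path>"; with a custom format,
    # batch-check treats everything after the first whitespace as %(rest).
    git rev-list --all --objects |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk -v min="$min_size" '
        $1 == "blob" && $2 + 0 >= min + 0 {
            sub(/^blob /, "")   # drop the type column, keep the rest as-is
            print
        }'
}
```

Something like "large_blobs 1000000" would then list every reachable blob
of a megabyte or more. Any other filter is just a different awk (or grep,
or perl) expression at the end of the same pipeline.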

> It's about an order of magnitude faster on the systems I've checked to
> do a parameterless filter-objects than rev-list --all --objects,
> although I understand they do different things.

Right, it's the object-opening and hash lookups that kill you in
"rev-list", because it's actually walking the graph.

> I am also thinking about another piece that answers the question: "which
> commits introduce any of (or the first of) this list of objects?". This
> can be done by parsing a diff --raw for commits but I think it should
> be possible to do this faster, too.

If you care about "introduce", I think you have to traverse and do the
diffs. If you only care about "contains" (for example, because you want
to know which path the blob is found at), you can find trees which
mention it, then trees which mention that tree, and so on. I think that
ends up slower in practice, though.
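
For the "contains" question, a brute-force sketch using only existing
commands might look like the following. It goes top-down (list each
commit's full tree and look for the blob) rather than bottom-up through
the trees, so it is slow on a large history, and it is only an
illustration. The function name commits_containing is made up for the
example.

```shell
# Hypothetical helper: print "<commit> <path>" for every commit reachable
# from any ref whose tree contains the blob named by the full object
# name in $1.
commits_containing() {
    blob=$1
    git rev-list --all |
    while read -r commit; do
        # ls-tree -r prints "<mode> <type> <object>TAB<path>" per entry.
        git ls-tree -r "$commit" |
        awk -v c="$commit" -v b="$blob" '
            $3 == b {
                sub(/^[^\t]*\t/, "")   # strip everything up to the TAB
                print c, $0
            }'
    done
}
```

This answers "contains", not "introduces": a blob that sits unchanged at
the same path shows up once for every commit that still has it.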

I have patches that implement a "rev-list --find=$sha1", which sets a
bit on $sha1 and then traverses with --objects until we find it (or
them; you can specify multiple). It's pretty straightforward, but it
does cost as much as "git rev-list --objects" in the worst case. Let me
know if you're interested and I can clean it up and post it.

-Peff