Re: git log with ordering option and --first-parent is unnecessarily slow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 08, 2016 at 08:55:54PM -0700, Josiah Boning wrote:

> As measured on linux.git, adding --date-order to a log command can
> result in a significant slowdown (~25x here; I've seen ~100x on other
> repositories):
> 
>     $ time git log --first-parent --max-count=51 master > /dev/null
>     real    0m0.024s
>     user    0m0.006s
>     sys    0m0.016s
>     $ time git log --date-order --first-parent --max-count=51 master > /dev/null
>     real    0m0.652s
>     user    0m0.570s
>     sys    0m0.074s

Try timing "git log --first-parent master >/dev/null". It should be
about the same as the latter, which gives a hint about what is going on.

It's the "--max-count" which is interesting here. It is applied _after_
the sorting. So in the non-sorted case, git can stream out commits and
stop after printing 51. In the sorted case, we have to access every
commit to get its date (well, every commit on the first-parent path),
then sort them, and then return the first 51.

> In combination with --first-parent, --date-order (or any other
> ordering option) should be a no-op, since --first-parent selects a
> linear history. So it seems like there's a significant performance win
> available by ignoring ordering options when --first-parent is present.
> Would this change be desirable? If so, I'll see about submitting a
> patch.

I do agree that --date-order on a linear parent walk cannot change the
ordering (which guarantees child-before-parent), and is a noop.  But
note that not all first-parent invocations are strictly linear. For
example:

  git log --first-parent --date-order master next

which starts from two tips. We'd still want to order commits from the
two lists according to --date-order.

I suppose we could catch the single-tip, first-parent case and ignore
any ordering options which imply child-before-parent (which is currently
all of them). But I did not think too hard if there are any other corner
cases. This sounds like a case of "doctor, it hurts when I do this". Why
do you want to add --date-order in such a case, when it is a noop?

> More generally, it seems like it might be possible to identify when
> the selected commits are linear and avoid the cost of sorting.

It's not the cost of sorting. It's the cost of accessing the commits (if
you profile, you should see most time spent in zlib). So figuring out
that the case is linear will require roughly the same expense.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]