[PATCH 0/5] drop "struct name_path" and path_name()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The graph traversal code in list-objects.c uses "struct name_path" to
build a linked list of path components, which it then feeds to the
callbacks. This is meant to be efficient, because we keep pointers into
the actual tree data for each name. However, there are two things that
work against this:

  1. In some cases, we keep in parallel a strbuf with the running
     pathname, so that we can feed it to tree_entry_interesting().

  2. The ultimate fate of this linked list is often to get concatenated
     into a single buffer anyway, via path_name().

So it's really not buying us much efficiency over just using
strbuf_addstr() and strbuf_setlen() in the first place. And it's extra
code that is slightly tricky to get right, especially with respect to
size_t and integer overflow (compare path_name() and
show_object_with_path() before and after).

This series drops the whole thing in favor of using a strbuf. Because I
wanted to make sure we weren't regressing performance, I measured two
cases before and after (and of course verified that they produce
identical output):

  1. "git rev-list --objects --all", which prints the name of each tree
     and blob we find directly from the linked list (without ever
     constructing the whole string), and does not use
     tree_entry_interesting (and so does not otherwise need to keep the
     running strbuf). So we'd see any negative effects of the strategy
     here.

  2. "git prune --dry-run", which walks the complete graph but does
     not do anything useful with the pathnames. So it would not
     otherwise need to assemble or look at the path components at all.

Both of them showed no measurable difference in their best-of-five times
when run on git.git. I didn't measure peak memory usage. For the reasons
explained in patch 3, it will actually be slightly _better_ for a normal
repo like git.git. But you could construct a pathological case where it
is worse (e.g., if you had a tree with a 500MB path-name, the old code
would need 500MB to run rev-list, and the new code will need 2*500MB
during the callback). I think the cleanup is worth it.

  [1/5]: http-push: stop using name_path
  [2/5]: show_object_with_name: simplify by using path_name()
  [3/5]: list-objects: convert name_path to a strbuf
  [4/5]: list-objects: drop name_path entirely
  [5/5]: list-objects: pass full pathname to callbacks

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]