Re: diffcore-rename performance mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> There seems to be a serious performance problem in diffcore-rename.
> There is infrastructure to cache the "cnt_data" member of each filespec,
> but it never gets used because we immediately free the filespec data
> after use. Oops.
>
> With this patch:
> ...
> My 20-minute diff becomes a 2-minute diff. The downside is that the
> memory usage is much increased (for obvious reasons, it should increase
> by the dataset size, since we are keeping pointers to the data around --
> in my case, around 1G extra).

Yes, these early freeing of filespec_data() were introducd later
specifically to address the memory usage issue.

> However, keeping around _just_ the
> cnt_data caused only about 100M of extra memory consumption (and gave
> the same performance boost).

That would be an interesting and relatively low-hanging optimization.

> The spanhash data structure is a bit confusing. At first, it looked like
> we were doing a linear search for a matching hash, but it's not quite,
> since we seem to start at some magic spot based on the hashval we're
> looking up.

I think it was just a hash table with linear overflow (if your
spot is occupied by somebody else, you look for the next
available vacant spot -- works only if you do not ever delete
items from the table) but sorry, I do not recall the rationale
for picking that data structure.  I vaguely recall I did some
measurement between that and the usual "an array that is indexed
with a hash value that holds heads of linked lists" and pointer
chasing appeared quite cache-unfriendly to the point that it
actually degraded performance, but did not try very hard to
optimize it.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux