Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 27, 2011 at 5:26 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Dan McGee <dpmcgee@xxxxxxxxx> writes:
>
>> Two optimizations take place here- we can start our objects array
>> iteration from a known point where we left off before we started trying
>> to find our tags,
>
> This I would understand (but I am somewhat curious how much last_untagged
> would advance relative to nr_objects for this half of the optimization to
> be worth it), but...
First off, sorry I wasn't able to respond to you until now, I've been
a bit swamped with other projects. I saw this made it into the 1.7.7.3
maint release today, but wanted to at least try to respond to the
points you raised (if you didn't investigate them by yourself
already).

If I remember right, the last_untagged optimization was pretty minor-
around 10% advancement of the pointer, never that much more. However,
it was a very easy change to make so I figured it was worth the slight
additional code (~5 lines changed for that part on its own).

>
>> and we don't need to do the deep dives required by
>> add_family_to_write_order() if the object has already been marked as
>> filled.
>
> I am not sure if this produces the identical result that was benchmarked
> in the original series.
I was not either when I wrote the patch, and I had hoped to confirm
the results you showed in the message of 1b4bb16b9ec. However, I was
unable to figure out how you generated those numbers so I wasn't able
to do so (and had planned to get back to you to find out how you made
those tables). Were you able to verify the ordering did not regress?

> For example, if you have a tagged object that is not a commit (say a
> blob), you would have written that blob in the second phase (write tagged
> objects together), so the family of blobs that share same delta parent as
> that blob will not be written in this "Finally all the rest" in the right
> place in the original list, no?
True, I think. They would not be written in the same place, but is
that necessarily the right place? The delta parent of an
already-filled tag object would eventually come up in the array as not
filled, and then we would do a full deep dive at that point, so the
majority of the objects would still be in close proximity.

Note that either way, we still have a gap between the original tagged
object and its "family"- potentially all the other tagged tips, tags,
commits, and trees before the blobs are finally hit and laid out in
family groups. So there is at least one potentially big seek that
can't be avoided.

> I do not think this change would forget to fill an object that needs to be
> filled, but it would affect the resulting ordering of the list, so...
This was the purpose of the added "if (wo_end != nr_objects) die()"
line; it confirms we have traversed and hit every object we originally
found.

-Dan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]