On Fri, 13 Sep 2013, Nicolas Pitre wrote: > On Fri, 13 Sep 2013, Duy Nguyen wrote: > > > On Fri, Sep 13, 2013 at 8:27 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote: > > > However I can see that, as you say, the same base object is repeatedly > > > referenced. This means that we need to parse it over and over again > > > just to find the right offset where the needed entries start. We > > > probably end up skipping over the first entries in a tree object > > > multiple times. And that would be true even when the core code learns > > > to walk pv4 trees directly. > > > > > > So here's the beginning of a tree offset cache to mitigate this problem. > > > It is incomplete as the cache function is not implemented, etc. But > > > that should give you the idea. > > > > Thanks. I'll have a closer look and maybe complete your patch. > > I've committed an almost final patch and pushed it out. It is disabled > right now due to some bug I've not found yet. But you can have a look. Found the bug. The cache is currently updated by the caller. The caller may ask for a copy of 2 entries from a base object, but that base object may itself copy those objects from somewhere else in a larger chunk. Let's consider this example: tree A ------ 0 (0) copy 2 entries from tree B starting at entry 0 1 (2) copy 1 entry from tree B starting at entry 3 tree B ------ 0 (0) copy 6 entries from tree C starting at entry 0 1 (6) entry "foo.txt" 2 (7) entry "bar.txt" Right now, the code calls decode_entries() to decode 2 entries from tree B but those entries are part of a copy from tree C. When that call returns, the cache is updated as if tree B entry #2 would start at offset 1 but this is wrong because offset 0 in tree B covers 6 entries and therefore offset 1 is for entry #6. So this needs a rethink. I won't be able to work on this for a few days. Nicolas -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html