Re: [PATCH] fast-import: export correctly marks larger than 2^20-1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

> (+cc David, who I think mentioned wishing for something like this)
>
> Raja R Harinath wrote:
>
>> Subject: fast-import: export correctly marks larger than 2^20-1
>
> Thank you!  That would be a very good thing.

I needed this so that I could maintain two 'marks' counters, one for
commits counting up from 1, and one for files counting down from some
large value, where the file marks could be re-used.

  http://gitorious.org/~harinath/svn2git/rrh-svn2git/commit/ffc5270a6fa106fecad1a6a9f1520ca8f075c6b7

However, the 1M limit on marks is a bit too tight.  For instance, the
mono SVN repository has 160K commits, with one project alone using 60K
commits.  And one of the commits touched nearly 24K files.  The number
of marks used is uncomfortably close: within one order of magnitude of
the limit.  I'd like 10 more bits of breathing space, please :-)

>> dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
>> i.e., when the sparse array has more than two levels.  The bug was
>> that the 'base' counter was being shifted by 20 bits at level 3, and
>> then again by 10 bits at level 2, rather than a total shift of 20 bits
>> in this argument to the recursive call:
>
> I haven’t read or grokked that code you are changing, so I can’t
> comment on the substance of your patch.  In case no one with such
> knowledge turns up, could you give a quick summary of what the
> existing code does and why?

This is not particular quick or clear, but here goes.

The marks are stored in a sparse array data structure.  The sparse array
is represented as a 1024-tree, with object storage only at the leafs,
and every path from root to leaf being the same length.  So, each node
can, and does, have the notion of a level, which is represented by a
number of bits to shift.

When you try to lookup at a non-leaf, the lookup index can be shifted
right the given number of bits, and masked to 10 bits [1], to get the
sub-tree to continue lookup in.  IOW, all the indices in a subtree have
a common prefix.

In dump_marks_helper, the 'base' argument contains that common prefix to
the current tree.  In the recursive case, the common prefix of the
sub-tree is computed by taking this 'base', and adding to it the
contribution from this node, i.e., k << shift [2].  The bug was that
'base' was being shifted too, even though it already had the correct
value from its caller.

- Hari

[1] The code actually does something different, but equivalent.  It
    masks off the top bits immediately before recursing.  However, it's
    easier to explain dump_marks_helper if we think of the index
    surviving to the leaf.

[2] Now that I think of it, the other fix, that I called more elegant,
    is harder to explain.  So, let's not go with that.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]