Re: Mozilla .git tree

Junio C Hamano <junkio@xxxxxxx> wrote:
>  Step 3.  Work on integrating partial mmap() support with Shawn.
>           This is more or less orthogonal to 4GB ceiling (people
>           would hit mmap() limit even with a 1.5GB pack), but I
>           suspect it would be necessary to be able to tell where
>           the end of each pack entry is cheaply to implement
>           this.

I was just getting ready to move my partial mmap support over from
fast-import.

I did the implementation a little differently in fast-import than
what I think I'll do in core Git, though.  In fast-import I store a
hashtable in memory of all objects in the pack, but I chose not to
store the ending offset (or compressed length) and instead just
guess about where the object ends.  I did that to save 4 bytes of
memory per object. :-)

It's necessary to know where the object ends to ensure that your
current mapping (or any remapping you are about to do) covers the
entire object before you start inflating.  Otherwise you might
have to remap the pack in the middle of the inflate operation.
(Of course you might need to do this anyway if the compressed object
is larger than your default mapping unit.)
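
As a rough sketch of that coverage check, assuming the ending offset
is known: the struct and function names below are hypothetical and
are not taken from Git or fast-import, and obj_end is assumed to be
one past the last compressed byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the currently mapped region of a pack. */
struct pack_window {
	size_t base;	/* pack offset of the first mapped byte */
	size_t len;	/* number of bytes mapped */
};

/*
 * Return 1 if the compressed object occupying [obj_start, obj_end)
 * lies entirely inside the window, 0 if a remap would be needed
 * before handing the data to inflate().
 */
int window_covers(const struct pack_window *w,
		  size_t obj_start, size_t obj_end)
{
	assert(obj_start <= obj_end);
	return obj_start >= w->base && obj_end <= w->base + w->len;
}
```

An object longer than w->len can never pass this check, which is
the "larger than your default mapping unit" case.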

What I did in fast-import was give inflate whatever was left in
the current mapping; then if I got a Z_OK or Z_BUF_ERROR back from
inflate I moved the mapping to the next 128 MiB chunk and reset my
z_stream's next_in/avail_in accordingly, then called inflate again.

No, I didn't performance test it to see how frequently I'm mapping
a pack multiple times to get one object.  But I'm going to stick my
neck out and say that most objects probably don't have a compressed
length exceeding 128 MiB so we're talking one remap that we would
have had to do anyway if the object spanned over the end of the
current mapping.  If the object's starting offset was completely
outside of the current mapping then I rounded the offset down to
the page size (from getpagesize) and remapped; therefore we also
probably only do one remap on objects needing it.
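
The rounding step can be sketched like this.  The helper name is
hypothetical; the page size is passed in so the function stays
testable, and a caller would presumably supply getpagesize() or
sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Round a pack offset down to a multiple of the page size, giving
 * an offset that is suitable as the offset argument to mmap().
 */
off_t window_base(off_t obj_offset, long page_size)
{
	assert(obj_offset >= 0 && page_size > 0);
	return obj_offset - (obj_offset % page_size);
}
```

The object then begins (obj_offset - base) bytes into the new
mapping, so that difference is where next_in would start.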


But having the length or ending offset in the index will help with
copying the object during a repack as well as prevent us from needing
to guess during accesses.  So good news indeed that you are adding
it to the index.

-- 
Shawn.
