fast-import's gfi_unpack_entry causes too many munmap/mmap cycles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I think I've mentioned this issue in the past, and fixed one case of
munmap/mmap cycle in fast-import. I've found more cases where this can
happen, and while on Linux, it's not a problem, on OSX, it leads to
catastrophic performance, when the import is massive.

One part of the problem is that gfi_unpack_entry just assumes that as
long as the pack was touched since last access, it _has_ to close all
windows and remap the pack. Afaict, when the requested entry is in the
range of what was previously mapped, there is no reason to do this, and
the following patch would help:

diff --git a/fast-import.c b/fast-import.c
index 75f7748..91fa2a1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1294,7 +1294,8 @@ static void *gfi_unpack_entry(
 {
        enum object_type type;
        struct packed_git *p = all_packs[oe->pack_id];
-       if (p == pack_data && p->pack_size < (pack_size + 20)) {
+       if (p == pack_data && oe->idx.offset >= (p->pack_size - 20) &&
+           p->pack_size < (pack_size + 20)) {
                /* The object is stored in the packfile we are writing to
                 * and we have modified it since the last time we scanned
                 * back to read a previously written object.  If an old

Now, the problem is that this doesn't help with use cases like the
following:
- create commit A.
- get its tree with `ls :mark ""`

In that case, since the tree was (very likely) created by committing,
the entry is located beyond the open window for the pack.

There is some kind of work around for that particular use case: get the
tree *during* the commit, with `ls ""` after all the
filemodify/filedelete entries, and before the newline that finishes the
commit. Unfortunately, in that case, store_tree does its job twice (once
for the ls, and once when committing for real), and that leads to
significant overhead.

And even if I am okay with that overhead, I still hit the problem again
when using that tree later with `M 040000 $sha1 ` in the next commit,
because that does a load_tree() for the tree sha1, after a commit having
occurred, which touched the pack, bringing me back to square one (the
sad part being that it does so while the tree_content for the tree in
question is already in memory).

Now, while each individual case could be improved to avoid
gfi_unpack_entry, it seems to me it would be better to make
gfi_unpack_entry better somehow. I wonder if there's not a better way
than having it close and reopen windows all the time for recent objects.
Can't the pack be pre-allocated and mapped by window-size fragments, for
example? (Would that maybe not work properly on all platforms?)

Mike

PS: And if I could find a work around that does work with existing
releases of git, that would be great. The root of all my pain is that I
have tree deltas between consecutive commits, when the commits are not
necessarily parent/child. e.g. for a commit history like

A --- B
 \
  \-- C

I have:
- diff between null-tree and A
- diff between A and B
- diff between B and C

Which means for C, I *need* to create a commit without a `from`, and
initialize its tree with `M 040000 sha-1-of-B's-tree `.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]