Re: [PATCH] git-p4: improve performance with large files

Quoting Sam Hocevar <sam@xxxxxxx>:

On Thu, Mar 05, 2009, thestar@xxxxxxxxxxxxxxxx wrote:

>   The current git-p4 way of concatenating strings performs in O(n^2)
>and is therefore terribly slow with large files because of unnecessary
>memory copies. The following patch makes the operation O(n).

The reason it uses simple concatenation is to cut down on memory usage
 - it is a tradeoff.

I think the modification you have made below is reasonable; however, be
aware that memory usage could double, which would substantially reduce
the size of the changesets that git-p4 would be able to import /at all/,
rather than merely making them slow.

   Uhm, no. The memory usage could be an additional X, where X is the
size of the biggest file in the commit. Remember that commit() stores
the complete commit data in memory before sending it to fast-import.
Also, on my machine the extra memory is already used because at some
point, "text += foo" calls realloc() anyway and often duplicates the
memory used by text.
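
   To illustrate the complexity argument above, here is a hypothetical
standalone sketch (not git-p4 code) of the two strategies. Repeated
"text += chunk" may copy everything accumulated so far on each step,
giving O(n^2) work overall, while appending references to a list and
joining once copies each byte only once, giving O(n):

```python
def concat_naive(chunks):
    # Each += may reallocate and copy the whole accumulated string,
    # so total work can grow quadratically with the file size.
    text = ''
    for c in chunks:
        text += c
    return text

def concat_join(chunks):
    # Collect references only, then copy each byte exactly once.
    data = []
    for c in chunks:
        data.append(c)
    return ''.join(data)
```

Both return the same string; only the amount of copying differs.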

You are correct - sorry, I got confused: I was under the mistaken
impression that that form (foo += bar) could be optimized, but it
isn't. That's what I get for not paying enough attention to the
language used... :(

   The ideal solution is to use a generator and refactor the commit
handling as a stream. I am working on that, but it involves deeper
changes, and since I am not sure it will be accepted, I'm providing the
attached compromise patch first. At least it solves the appalling speed
issue. I tuned it so that it never uses more than 32 MiB of extra memory.
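
   The bounded-memory compromise described above can be sketched as
follows (a hypothetical standalone version, not the patch itself; the
chunk list and flush threshold mirror the patch below). With ~4 KiB
chunks from p4, flushing the pending list every 8192 entries caps the
extra buffer at roughly 32 MiB:

```python
def concat_bounded(chunks, flush_every=8192):
    # Accumulate chunk references, but periodically fold them into the
    # result string so the pending list never grows past flush_every
    # entries (~32 MiB at 4 KiB per chunk with the default threshold).
    text = ''
    data = []
    for c in chunks:
        data.append(c)
        if len(data) > flush_every:
            text += ''.join(data)
            data = []
    return text + ''.join(data)
```

Each flush is one O(buffer) join, so the whole pass stays close to O(n)
while bounding peak extra memory.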

That is definitely the ideal solution - I would hope it gets accepted if you manage to cut down on memory usage, as memory really is a limitation for certain repositories. (In fact, this is why I no longer use git-p4.)


Signed-off-by: Sam Hocevar <sam@xxxxxxx>
---
 contrib/fast-import/git-p4 |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..151ae1c 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -984,11 +984,19 @@ class P4Sync(Command):
         while j < len(filedata):
             stat = filedata[j]
             j += 1
+            data = []
             text = ''
             while j < len(filedata) and filedata[j]['code'] in ('text', 'unicode', 'binary'):
-                text += filedata[j]['data']
+                data.append(filedata[j]['data'])
                 del filedata[j]['data']
+                # p4 sends 4k chunks, make sure we don't use more than 32 MiB
+                # of additional memory while rebuilding the file data.
+                if len(data) > 8192:
+                    text += ''.join(data)
+                    data = []
                 j += 1
+            text += ''.join(data)
+            del data

             if not stat.has_key('depotFile'):
                 sys.stderr.write("p4 print fails with: %s\n" % repr(stat))

--
Sam.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
