Quoting Sam Hocevar <sam@xxxxxxx>:
On Thu, Mar 05, 2009, thestar@xxxxxxxxxxxxxxxx wrote:
> The current git-p4 way of concatenating strings performs in O(n^2)
> and is therefore terribly slow with large files because of unnecessary
> memory copies. The following patch makes the operation O(n).
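For readers unfamiliar with the issue: repeated string concatenation can
copy everything accumulated so far on each step, whereas collecting chunk
references in a list and joining once copies each byte only once. A minimal
sketch of the two strategies (function names are mine, not git-p4's):

```python
def concat_naive(chunks):
    # Worst case O(n^2): each concatenation may copy the whole
    # accumulated string again (as in the old git-p4 loop).
    text = ''
    for c in chunks:
        text = text + c
    return text

def concat_join(chunks):
    # O(n): keep references only, build the result in a single pass.
    return ''.join(chunks)

chunks = ['x' * 4096] * 100  # p4 delivers file data in ~4 KiB chunks
assert concat_naive(chunks) == concat_join(chunks)
```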
The reason it uses simple concatenation is to cut down on memory
usage - it is a tradeoff.
I think the modification you have made below is reasonable; however, be
aware that memory usage could double, which would substantially reduce
the size of the changesets that git-p4 would be able to import /at all/,
rather than merely import slowly.
Uhm, no. The memory usage could be an additional X, where X is the
size of the biggest file in the commit. Remember that commit() stores
the complete commit data in memory before sending it to fast-import.
Also, on my machine the extra memory is already used because at some
point, "text += foo" calls realloc() anyway and often duplicates the
memory used by text.
You are correct - sorry, I got confused: I mistakenly thought that
form (foo += bar) could be optimized, but it isn't. That's what I get
for not paying enough attention to the language used... :(
The ideal solution is to use a generator and refactor the commit
handling as a stream. I am working on that, but it involves deeper
changes, and since I am not sure it will be accepted, I'm providing the
attached compromise patch first. At least it solves the appalling speed
issue. I tuned it so that it never uses more than 32 MiB of extra memory.
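A rough sketch of what that stream refactoring could look like - the
helper name and the minimal filedata layout below are hypothetical, not
actual git-p4 code (one complication left out here is that fast-import's
"data" command wants a byte count up front, so true streaming needs
either a known size or the delimited data format):

```python
import io

def iter_chunks(filedata, start):
    # Hypothetical generator: hand out p4 data chunks one at a time
    # instead of accumulating the whole file in memory first.
    j = start
    while j < len(filedata) and \
            filedata[j]['code'] in ('text', 'unicode', 'binary'):
        yield filedata[j]['data']
        j += 1

# A streaming consumer then holds only one chunk (~4 KiB) at a time:
filedata = [{'code': 'text', 'data': 'a' * 4096},
            {'code': 'text', 'data': 'b' * 4096},
            {'code': 'stat'}]
out = io.StringIO()
for chunk in iter_chunks(filedata, 0):
    out.write(chunk)
assert out.getvalue() == 'a' * 4096 + 'b' * 4096
```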
That is definitely the ideal solution - I would hope that it gets
accepted if you manage to cut down on memory usage, as this really is a
limitation for certain repositories. (In fact, it is why I no longer
use git-p4.)
Signed-off-by: Sam Hocevar <sam@xxxxxxx>
---
contrib/fast-import/git-p4 | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..151ae1c 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -984,11 +984,19 @@ class P4Sync(Command):
         while j < len(filedata):
             stat = filedata[j]
             j += 1
+            data = []
             text = ''
             while j < len(filedata) and filedata[j]['code'] in ('text', 'unicode', 'binary'):
-                text += filedata[j]['data']
+                data.append(filedata[j]['data'])
                 del filedata[j]['data']
+                # p4 sends 4k chunks, make sure we don't use more than 32 MiB
+                # of additional memory while rebuilding the file data.
+                if len(data) > 8192:
+                    text += ''.join(data)
+                    data = []
                 j += 1
+            text += ''.join(data)
+            del data
             if not stat.has_key('depotFile'):
                 sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
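For clarity, here is the patch's bounded-buffer idea extracted as a
standalone sketch (the function name and parameter are mine; the 8192
threshold mirrors the patch: 8192 chunks of ~4 KiB is about 32 MiB):

```python
def rebuild_file(chunks, flush_after=8192):
    # Buffer chunk references and join them in batches, so the pending
    # list never holds more than ~32 MiB on top of the growing result.
    data = []
    text = ''
    for c in chunks:
        data.append(c)
        if len(data) > flush_after:
            text += ''.join(data)
            data = []
    text += ''.join(data)
    return text

assert rebuild_file(['ab', 'cd', 'ef'], flush_after=2) == 'abcdef'
```

Note that each flush still copies the text built so far, so this is not
truly O(n); it just makes the copies rare enough to be cheap in practice.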
--
Sam.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html