Re: [PATCH] Speedup recursive by flushing index only once for all entries

On Thu, 11 Jan 2007, Alex Riesen wrote:
> On 1/11/07, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> > >
> > > Yep. Tried the monster merge on it: 1m15sec on that small laptop.
> > 
> > Is that supposed to be good? That still sounds really slow to me. What
> > kind of nasty project are you doing? Is this the 44k file project, and
> > under cygwin? Or is it that bad even under Linux?
> 
> It is that "bad" on a 384MB Linux laptop with a 1.2GHz Celeron.
> Yes, it is that 44k-files project. The previous code finishes
> that merge on that laptop in about 20 minutes, so it's definitely
> an improvement. My cygwin machine has a lot more memory (2GB),
> so I can't really compare them here.

Ok. Junio, I'd suggest putting it into 1.5.0, then - it's a fairly simple 
thing, after all, and if it's the difference between 20 minutes and just 
over one minute, it clearly matters.

With 384MB of memory and 44 thousand files, I bet the problem is simply 
that the working set doesn't fit entirely in RAM. The page cache probably 
holds *most* of it, but with inodes and directories spread out on disk 
(and I assume there are more files in the actual working tree), writing 
out a 6MB index file (or whatever it is) and then reading it back several 
times ends up generating IO, simply because 6MB is a noticeable chunk of 
memory in that situation.

(It also generates a ton of tree objects early, so the effect at run-time 
is probably much more than 6MB).

That said, I think we actually have another problem entirely:

Look at "write_cache()", Junio: isn't it leaking memory like mad?

Shouldn't we have something like this?

It's entirely possible that the _real_ problem with the "flush the index 
all the time" approach was that it just triggered this bug: tons and tons 
of leaked memory, causing git-merge-recursive to grow explosively (~6MB 
per cache flush, and a _lot_ of cache flushes), which on a 384MB machine 
quickly exhausts memory and causes totally unnecessary swapping.

Of course, it's also entirely possible that I'm a complete retard, and 
just didn't see where the data buffer is still used or freed.

"Linus - complete retard or hero in shining armor? You decide!"

		Linus

---
diff --git a/read-cache.c b/read-cache.c
index 8ecd826..c54a611 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1010,7 +1010,7 @@ int write_cache(int newfd, struct cache_entry **cache, int entries)
 		if (data &&
 		    !write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sz) &&
 		    !ce_write(&c, newfd, data, sz))
-			;
+			free(data);
 		else {
 			free(data);
 			return -1;
