Re: git-svnimport failed and now git-repack hates me

Linus Torvalds <torvalds@xxxxxxxx> · Thu, 4 Jan 2007 13:12:18 -0800 (PST)

On Thu, 4 Jan 2007, Chris Lee wrote:
> 
> Seems like *something* was definitely lost there. The 'used' number
> didn't go down at all when I started doing other things; it went up as
> the new programs started

The 'used' number basically _never_ goes down as long as there is memory 
free. The kernel simply doesn't have any reason to free any of its caches, 
even if those caches end up not being very useful.

What happened is almost certainly that with your big unpacked repository, 
the kernel ended up using a lot of memory on filename caching. In other 
words, I'd have expected that if you were to do 

	cat /proc/slabinfo

you'd have seen a _lot_ of memory being used for dentries ("dentry_cache") 
and inodes ("ext3_inode_cache" assuming you're an ext3 user).
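
To make that concrete, here is a minimal Python sketch that adds up the
approximate memory held by each slab cache. It assumes the 2.x slabinfo
layout (name, active_objs, num_objs, objsize, ...) and approximates a
cache's size as num_objs * objsize, which ignores per-slab overhead. The
function names slab_usage and top_caches are made up for illustration.
Reading /proc/slabinfo usually requires root, and newer kernels may list
the dentry cache under the shorter name "dentry".

```python
def slab_usage(slabinfo_text):
    """Return {cache_name: approx_bytes} from /proc/slabinfo-style text.

    Approximation (an assumption of this sketch): bytes = num_objs * objsize.
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version banner and the column-header comment line.
        if not line or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            num_objs = int(fields[2])   # total allocated objects
            objsize = int(fields[3])    # bytes per object
        except ValueError:
            continue
        usage[fields[0]] = num_objs * objsize
    return usage


def top_caches(usage, n=5):
    """Return the n largest caches as (name, approx_bytes), biggest first."""
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

On a live system this could be fed with
top_caches(slab_usage(open("/proc/slabinfo").read())), run as root.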

The kernel can easily drop those caches on demand, but "free" isn't quite 
smart enough to know about them as being caches, so they will just show up 
as "used".
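
A rough way to see how much of that "used" figure is really droppable
cache is to split it up using /proc/meminfo. The sketch below is only an
estimate under stated assumptions: it treats Buffers, Cached and
SReclaimable as reclaimable (SReclaimable is missing on older kernels, and
Cached includes some memory that cannot actually be dropped), and the
names parse_meminfo and used_breakdown are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def used_breakdown(info):
    """Split the naive 'used' number into reclaimable cache and the rest."""
    naive_used = info["MemTotal"] - info["MemFree"]
    # Fields that may be absent on a given kernel are counted as zero.
    reclaimable = (info.get("Buffers", 0)
                   + info.get("Cached", 0)
                   + info.get("SReclaimable", 0))
    return {
        "naive_used_kb": naive_used,
        "reclaimable_kb": reclaimable,
        "hard_used_kb": naive_used - reclaimable,
    }
```

With a big unpacked repository, most of the difference between the naive
and the "hard" number would be expected to sit in the reclaimable slab.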

That said, since you didn't want them, dropping them by hand with sysctl 
certainly didn't hurt. Manual control can often be better than automatic 
heuristics.
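
For reference, the sysctl in question is vm.drop_caches, exposed as
/proc/sys/vm/drop_caches: writing 1 drops the page cache, 2 drops
reclaimable slab objects such as dentries and inodes, and 3 drops both.
The following is a minimal sketch of doing that by hand from Python. The
drop_caches helper and its path and sync parameters are illustrative
assumptions, and it needs root to write to the real file.

```python
import os

DROP_PAGECACHE = 1   # page cache only
DROP_SLAB = 2        # reclaimable slab objects (dentries, inodes)
DROP_ALL = 3         # both of the above


def drop_caches(mode=DROP_SLAB, path="/proc/sys/vm/drop_caches", sync=True):
    """Ask the kernel to drop clean caches by writing mode to path."""
    if mode not in (DROP_PAGECACHE, DROP_SLAB, DROP_ALL):
        raise ValueError("mode must be 1, 2 or 3")
    if sync:
        # Dirty objects cannot be dropped, so flush them to disk first.
        os.sync()
    with open(path, "w") as f:
        f.write("%d\n" % mode)
```

Calling drop_caches() with the defaults targets exactly the dentry and
inode caches discussed above and leaves the page cache alone.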

So the reason why repacking is so useful is that it gets rid of all these 
millions of individual files. They all take up space on the disk, but they 
also do end up having a lot of caches associated with them.
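
To get a feel for how many individual files are involved, the loose
objects can simply be counted. This is a small sketch that assumes the
usual layout of loose objects under objects/ in two-hex-digit fan-out
directories. count_loose_objects is a hypothetical helper, and
"git count-objects -v" reports similar numbers without any scripting.

```python
import os
import re

_FANOUT = re.compile(r"^[0-9a-f]{2}$")


def count_loose_objects(git_dir):
    """Return (file_count, total_bytes) of loose objects under git_dir."""
    objects = os.path.join(git_dir, "objects")
    count = 0
    size = 0
    for entry in os.listdir(objects):
        if not _FANOUT.match(entry):
            continue  # skips "pack", "info" and anything else
        fanout_dir = os.path.join(objects, entry)
        if not os.path.isdir(fanout_dir):
            continue
        for name in os.listdir(fanout_dir):
            count += 1
            size += os.path.getsize(os.path.join(fanout_dir, name))
    return count, size
```

Running it on the repository's .git directory before and after a repack
that removes the loose objects (for example "git repack -a -d") should
show the count falling to near zero.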

Btw, despite your 4GB of RAM, you might still be better off with a swap
file. It gives the kernel a certain amount of freedom in choosing how to
allocate memory. Perhaps more importantly, even when the kernel doesn't
actively use it, it means that IF the kernel runs out of totally free
memory (because it has decided to keep a lot of stuff in the dentry
cache), the kernel still has choices, and a certain "buffer" for making
the right decision.

What often happens is that the memory management heuristics don't make the
"perfect" choice (partly because it's theoretically impossible anyway, but
largely because it's a damn hard problem to even get all that *close* to
perfect). Having a swap partition or even a swap file allows the kernel to
make some mistakes without hitting a hard wall of "oh, I can't do anything
at all about this particular page".

So that buffer zone can be helpful in avoiding bad situations, but it can
actually also end up improving performance. That doesn't sound like the
case in this particular situation, but in some other loads there really
are a lot of dirty pages that aren't all that useful, and the memory
really could be better used for other things if those largely unused dirty
pages could just be written to disk.
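
Whether a machine has any swap configured at all can be read from
/proc/swaps. Below is a minimal sketch of parsing it, assuming the usual
five-column layout (Filename, Type, Size, Used, Priority, with sizes in
kB). parse_swaps and has_swap are illustrative names, not an existing API.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of swap-area dicts."""
    areas = []
    for line in text.splitlines()[1:]:   # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        areas.append({
            "name": fields[0],
            "type": fields[1],           # "partition" or "file"
            "size_kb": int(fields[2]),
            "used_kb": int(fields[3]),
            "priority": int(fields[4]),
        })
    return areas


def has_swap(text):
    """True if at least one swap area with non-zero size is active."""
    return any(area["size_kb"] > 0 for area in parse_swaps(text))
```

On a live system, has_swap(open("/proc/swaps").read()) returning False
would describe the "no buffer zone" situation discussed above.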

			Linus