Re: git-diff opens too many files?

On Mon, 20 Nov 2006, Nguyen Thai Ngoc Duy wrote:
>
> I got this error in a quite big (in files) repository:
> error: open("vnexpress.net/Suc-khoe/2001/04/3B9AF976"): Too many open
> files in system

Ok, "too many open files in system" is ENFILE - you haven't run out of 
file descriptors in _one_ process, but you've exceeded the total number of 
file descriptors in the whole system.

That's not because we forget to close() something, but because we're 
keeping the kernel's file handles busy in another way.
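
A minimal C sketch of the effect (not git's code, just the kernel-visible 
behavior): after mmap() you can close() the descriptor, but the mapping 
itself still pins the underlying "struct file" until munmap().

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		struct stat st;
		void *buf;
		int fd;

		if (argc < 2)
			return 1;
		fd = open(argv[1], O_RDONLY);
		if (fd < 0 || fstat(fd, &st) < 0)
			return 1;

		buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		close(fd);	/* the descriptor is gone, but ... */

		/*
		 * ... the mapping still holds a reference to the file, so
		 * the count in /proc/sys/fs/file-nr stays elevated until we
		 * munmap().  Do this thousands of times and you hit ENFILE.
		 */
		if (buf != MAP_FAILED)
			munmap(buf, st.st_size);
		return 0;
	}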

> fatal: cannot hash vnexpress.net/Suc-khoe/2001/04/3B9AF976

Hmm. We keep files mmap'ed in "git diff" for possibly too long. What 
happens is that we mmap a file that we want to diff when we start the 
whole thing, and keep it mapped over the whole diff session, because we're 
potentially going to need to compare it against other files (ie rename 
detection etc). And then we unmap it only at the end (in "diff_flush()" -> 
"diff_free_filepair()" -> "diff_free_filespec_data()").

And that's normally great, and means that we don't need to worry about the 
file data (we map it once, and can keep it in memory), but yeah, if you 
have thousands of files changed, you'll have thousands of mappings. And 
each one will have a pointer to a "struct file" inside the kernel. 

What OS/distro is this? Normally you shouldn't have that low a limit on 
the number of open files, but we do end up potentially opening thousands.

For example, under Linux, you can do this:

	# in one terminal window, do:
	while : ; do cat /proc/sys/fs/file-nr ; sleep 1; done

	# in another one:
	cd linux-repo
	git ls-files '*.c' | xargs touch
	git diff

and if your system looks anything like mine, the output will look something like:

	2464    0       349662
	2464    0       349662
	2464    0       349662
*	5920    0       349662
**	7616    0       349662
***	9024    0       349662
****	10944   0       349662
	2464    0       349662
	2464    0       349662


(the first column of file-nr is the number of allocated file handles - 
see how it grows by thousands while the diff runs, then drops back).

Anyway, there are two possible solutions:

 - simply make sure that you can have that many open files. 

   If it's a Linux system, just increase the value in
   /proc/sys/fs/file-max (e.g. by writing a bigger number to it as root),
   and you're done. Of course, if you're not the admin of the box, you may
   need to ask somebody else to do it for you.

 - we could try to make git not keep them mmap'ed for the whole time. 

Junio? This is your specialty, and I'm not sure how painful it would be to 
unmap and remap on demand (or to switch to some kind of "keep the last 
<n> mmaps active" scheme - sketched below - to avoid having thousands and 
thousands of mmaps active).
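
Something like this completely untested sketch (hypothetical names, not 
the actual diff code): keep a fixed-size ring of live mappings and evict 
the oldest one whenever a new file comes in:

	#include <sys/mman.h>

	#define MAX_ACTIVE_MAPS 32	/* the "<n>" above */

	struct active_map {
		void *buf;
		size_t size;
	};

	static struct active_map active_maps[MAX_ACTIVE_MAPS];
	static int next_victim;

	/* map a file, dropping the oldest mapping to stay under the cap */
	static void *map_with_cap(int fd, size_t size)
	{
		struct active_map *slot = &active_maps[next_victim];

		if (slot->buf)
			munmap(slot->buf, slot->size);

		slot->buf = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (slot->buf == MAP_FAILED) {
			slot->buf = NULL;
			return NULL;
		}
		slot->size = size;
		next_victim = (next_victim + 1) % MAX_ACTIVE_MAPS;
		return slot->buf;
	}

The painful part is exactly what eviction implies: nobody can hold on to 
a pointer into an evicted file, so the filespec code would have to go 
back and remap (or re-read) the data on demand.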

One simple thing that might be worth it is to simply _not_ use mmap() at 
all for small files. If a file is less than 1kB, it might be better to do 
a malloc() and a read() - partly because it avoids having tons of file 
descriptors, and partly because it's also more efficient from a virtual 
memory usage perspective (not that you're likely to ever hit that problem 
in practice).
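
Roughly like this (just a sketch - the threshold and the helper name are 
made up):

	#include <stdlib.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define SMALL_FILE_LIMIT 1024	/* below this, read instead of mmap */

	/* return the file contents; *is_mmap tells the caller how to free it */
	static void *read_or_map(int fd, size_t size, int *is_mmap)
	{
		if (size < SMALL_FILE_LIMIT) {
			char *buf = malloc(size ? size : 1);
			size_t got = 0;

			*is_mmap = 0;
			while (buf && got < size) {
				ssize_t n = read(fd, buf + got, size - got);
				if (n <= 0) {	/* error or truncated file */
					free(buf);
					return NULL;
				}
				got += (size_t)n;
			}
			return buf;	/* close(fd) now really releases the file */
		}
		*is_mmap = 1;
		return mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	}

With malloc+read, once the descriptor is closed there's nothing left 
pinning the "struct file", which is the whole point.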

Nguyen - that "use malloc+read" thing might be a quick workaround, but 
only if you have tons of _small_ files (and if you can't easily just 
increase file-max). 

		Linus