Hi,

On Fri, 15 May 2009, Linus Torvalds wrote:

> On Fri, 15 May 2009, Junio C Hamano wrote:
>
> > Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
> >
> > > if you need a chuckle, like me, you might appreciate this story: in
> > > one of my repositories, "git gc" dies with
> > >
> > > 	unable to open object pack directory: ...: Too many open files
> > >
> > > turns out that there are a whopping 1088 packs in that repository...
> >
> > Isn't it a more serious problem than a mere chuckle?  How would one
> > recover from such a situation (other than "mv .git/objects/pack/pack-* .;
> > for p in pack-*.pack; do git unpack-objects <$p; done")?
>
> Well, you can probably just increase the file limits and try again.
> Depending on setup, you may need root to do so, though.
>
> I also think you _should_ be able to avoid this by just limiting the
> pack size usage. IOW, with some packed_git_limit, something like
>
> 	[core]
> 		packedGitWindowSize = 16k
> 		packedGitLimit = 1M
>
> you should hopefully be able to repack (slowly) even with a low file
> descriptor limit, because of the total limit on the size.

I don't think so: the window size only limits how much gets mmap()ed at
a time, it says nothing about the number of open file descriptors,
right?

> That said, I do agree that ulimit doesn't always work on all systems
> (whether due to hard system limits or due to not having permission to
> raise the limits), and playing games with pack limits is non-obvious. We
> should really try to avoid getting into such a situation. But I think git
> by default avoids it by the auto-gc, no? So you have to disable that
> explicitly to get into this bad situation.

No, in this case, nothing was disabled.  auto-gc did not kick in,
probably due to funny Git usage in hg2git.

> One solution - which I think may be the right one regardless - is to not
> use "mmap()" for small packs or small SHA1 files.
>
> mmap is great for random-access multi-use scenarios (and to avoid some
> memory pressure by allowing sharing of pages), but for anything that is
> just a couple of pages in size, mmap() just adds big overhead with
> little upside.
>
> So if we use malloc+read for small things, we'd probably avoid this. Now,
> if you have a few thousand _large_ packs, you'd still be screwed, but the
> most likely reason for having a thousand packfiles is that you did daily
> "git pull"s, and have lots and lots of packs that are pretty small.
>
> Dscho? What are your pack-file statistics in this case?

Mostly around 50kB.

But using malloc()+read() just to avoid my use case does not sound
straight-forward to me; it is more of a work-around than a proper
solution.

For performance, though, I agree that malloc()+read() might be a
sensible thing in a lot of cases (see the sketch below).

Ciao,
Dscho
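
P.S.: Just so we are talking about the same thing -- here is a rough
sketch of the malloc()+read() idea, with a made-up helper name and a
made-up 32kB cutoff; this is not the actual pack-file code, of course:

	#include <fcntl.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	/* Hypothetical cutoff: files smaller than this skip mmap(). */
	#define SMALL_FILE_LIMIT (32 * 1024)

	/*
	 * Load a whole file.  Small files are slurped into a malloc()ed
	 * buffer so the descriptor can be closed right away; larger
	 * files are still mmap()ed.  *mapped tells the caller whether
	 * to munmap() or free() the result.
	 */
	static void *load_file(const char *path, size_t *size, int *mapped)
	{
		struct stat st;
		void *buf = NULL;
		int fd = open(path, O_RDONLY);

		if (fd < 0 || fstat(fd, &st) < 0)
			goto out;
		*size = st.st_size;

		if (st.st_size < SMALL_FILE_LIMIT) {
			char *p = malloc(st.st_size);
			off_t done = 0;

			if (!p)
				goto out;
			while (done < st.st_size) {
				ssize_t ret = read(fd, p + done,
						st.st_size - done);
				if (ret <= 0) {	/* error or early EOF */
					free(p);
					goto out;
				}
				done += ret;
			}
			buf = p;
			*mapped = 0;
		} else {
			buf = mmap(NULL, st.st_size, PROT_READ,
					MAP_PRIVATE, fd, 0);
			if (buf == MAP_FAILED)
				buf = NULL;
			else
				*mapped = 1;
		}
	out:
		if (fd >= 0)
			close(fd);	/* a mapping survives the close() */
		return buf;
	}

With the small-file path, a thousand 50kB packs would cost some heap,
but no lingering descriptors or mappings; the caller just free()s or
munmap()s depending on *mapped.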