Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)

On Thu, May 24, 2012 at 06:54:56PM -0600, Martin Fick wrote:

> > Yes, exclusively warm. And all of the refs were packed,
> > which makes the warm/cold difference less interesting
> > (it's one 30MB or so file).  I don't think there's much
> > point in thinking about the performance of 400K loose
> > refs (which would be absolutely horrific cold-cache on
> > most traditional filesystems). If you have that many,
> > you would want to keep the bulk of them packed.
> 
> Mostly true, except for one strange case still I think?
> 
> When cloning a gerrit repo, users do not get the changes, 
> since those are not under refs/heads but refs/changes.  So 
> later, if they choose to fetch refs/changes/*, all of those
> new incoming refs are loose.

Hmm. Yeah, clone will always write a packed-refs file, but I think "git
fetch" will always write loose refs, under the assumption that the
former is getting a lot more refs than the latter. But of course that
is only a heuristic. It would be nice if fetch could write straight
into packed-refs when it is receiving more than N refs.
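
In the meantime, something like this is a reasonable stopgap from the
user side (the refspec is just an illustration of a Gerrit-style setup,
not something fetch does for you):

  # fetch the change refs (they are written as loose refs), then
  # immediately fold them into packed-refs so the loose files don't linger
  git fetch origin '+refs/changes/*:refs/changes/*'
  git pack-refs --all --prune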

We'd have to give some thought to potential race conditions, though.
Usually pack-refs isn't modifying the ref, so it can just write out the
value to the packed-refs file, then delete the loose ref if nobody has
touched it since we wrote. But here we're combining it with a
modification, so I suspect there would be a race with another process
trying to modify the same ref at the same time.
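
To make that concrete, here is the sort of interleaving I'm worried
about (purely hypothetical; "A" is a fetch writing straight into
packed-refs, "B" is a concurrent update of the same ref):

  A: decides refs/changes/X should now be V1
  B: locks the loose refs/changes/X, writes V2, releases the lock
  A: rewrites packed-refs with X = V1, never having taken the
     per-ref lock, so it never notices B's update

With plain pack-refs the value A writes is the one that was already on
disk, so the worst case is harmless; here A is introducing a new value,
and depending on the ordering one of the two updates can be silently
shadowed or lost.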

> Yes, someone should pack those 
> refs right away, but I think it actually churns the hell out 
> of my disk and takes a significant amount of time during the 
> initial fetch.  I am not certain about this, and the 
> behavior may depend on the filesystem in use, but I think 
> some of this work might even happen asynchronously (journals 
> and all); it feels like my disk keeps churning for a while 
> even after the fetch is over.  I believe that this might 
> still be the worst case left with refs, and it can be pretty bad,

Yeah, I wouldn't be surprised if this thrashes your disk. Writing
hundreds of thousands of 40-byte files is one of the most awful loads
for many filesystems, since each file gets its own inode. I haven't
tried btrfs, but my impression is that it can magically pack the data
from many files into one node.
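
If you want to see how bad it is on your particular filesystem,
something like this gives a rough picture (paths assume a standard
non-bare layout), and packing afterwards reclaims most of it:

  find .git/refs -type f | wc -l   # how many loose ref files we ended up with
  du -sh .git/refs                 # mostly per-file filesystem overhead
  git pack-refs --all --prune      # fold them into the single packed-refs file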

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

