Re: question about: Facebook makes Mercurial faster than Git


 



On Mon, Mar 10, 2014 at 10:56:51AM -0700, David Lang wrote:
> On Mon, 10 Mar 2014, Ondřej Bílka wrote:
> 
> >On Mon, Mar 10, 2014 at 03:13:45AM -0700, David Lang wrote:
> >>On Mon, 10 Mar 2014, Dennis Luehring wrote:
> >>
> >>>according to these blog posts
> >>>
> >>>http://www.infoq.com/news/2014/01/facebook-scaling-hg
> >>>https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
> >>>
> >>>mercurial "can" be faster than git
> >>>
> >>>but I haven't found any reply from the git community about whether it is a real
> >>>problem, or whether there are ongoing changes (maybe for git 2.0) to compete better in this case
> >>
> >>As I understand this, the biggest part of what happened is that
> >>Facebook made a tweak to mercurial so that when it needs to know
> >>what files have changed in their massive tree, their version asks
> >>their special storage array, while git would have to look at it
> >>through the filesystem interface (by doing stat calls on the
> >>directories and files to see if anything has changed).
> >>
> >That is mostly a kernel problem. Long ago there was a proposed patch to
> >add a recursive mtime so you could check which subtrees changed. If
> >somebody resurrected that patch, it would give a similar boost.
> 
> btrfs could actually implement this efficiently, but for a lot of
> other filesystems this could be very expensive. The question is whether it
> could be enough of a win to make it a good choice for people who are
> doing a heavy git workload as opposed to more generic uses.
>
See the next paragraph for how to do that efficiently: a directory update needs
to be done only once between application runs. Also, there is no overhead when
the feature is not used (except if it makes the headers bigger).
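To make that concrete, here is a minimal userspace sketch (purely illustrative,
not from any actual patch). It assumes the kernel exposed the recursive mtime
as a hypothetical "user.rmtime" xattr holding a decimal timestamp; no such
interface exists today. The scan prunes any subtree whose recursive mtime is
older than the previous run, so only changed directories are walked:

/*
 * Sketch only: prune a status scan using a hypothetical recursive mtime
 * exposed as the xattr "user.rmtime" (decimal seconds). Not real kernel
 * or git code.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/xattr.h>

/* Read the hypothetical recursive mtime of a directory, or -1 if absent. */
static long long read_rmtime(const char *dir)
{
	char buf[32];
	ssize_t n = getxattr(dir, "user.rmtime", buf, sizeof(buf) - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return atoll(buf);
}

/* Walk the tree, skipping any subtree whose recursive mtime is not newer
 * than the timestamp recorded at the previous run. */
static void scan(const char *dir, long long last_run)
{
	long long rm = read_rmtime(dir);
	if (rm >= 0 && rm <= last_run)
		return;		/* nothing below here changed since last run */

	DIR *d = opendir(dir);
	if (!d)
		return;
	struct dirent *e;
	while ((e = readdir(d))) {
		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
			continue;
		char path[4096];
		snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
		if (e->d_type == DT_DIR)
			scan(path, last_run);
		else
			printf("stat needed: %s\n", path); /* fall back to per-file stat */
	}
	closedir(d);
}

int main(int argc, char **argv)
{
	/* last_run would normally come from the index written by the previous run */
	long long last_run = argc > 2 ? atoll(argv[2]) : 0;
	scan(argc > 1 ? argv[1] : ".", last_run);
	return 0;
}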
 
> there's also the issue of managed vs generated files: if you update
> the mtime all the way up the tree because a source file was compiled
> and a binary was created, that will quickly defeat the value of the
> recursive mtime.
>
You could do the marking on a per-file basis. I am not sure that is needed,
as larger projects use makefiles to avoid recompiling everything, so a file is
probably recompiled because a source file in the same directory changed. Also,
if your compile time is five minutes, a half-second status would not make
much difference.

 
> 
> >There are two issues that need to be handled. First, if you are concerned
> >about one mtime change causing a lot of updates, an application needs to mark
> >all directories it is interested in; when we do an update we unmark the
> >directory, and by that we update each directory at most once per
> >application run.
> >
> >The second problem is hard links, where probably the best course is to keep
> >a list of these and stat them separately.
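
For illustration, a rough sketch of the application side of that scheme. The
"user.rmtime_mark" xattr and the exact protocol are hypothetical; the point is
just to show the two pieces: re-marking the directories of interest once per
run (so the kernel only needs to propagate one update per directory per run),
and stat()ing hard links from a separate side list, since they would not be
caught through their directory parents:

/*
 * Sketch only: the application side of the mark/unmark scheme described
 * above. "user.rmtime_mark" is a hypothetical xattr; the idea is that the
 * kernel clears the mark the first time it propagates an mtime change, so
 * each directory is updated at most once per application run.
 */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/xattr.h>

/* Re-arm the hypothetical mark so the next run sees future changes. */
static int mark_directory(const char *dir)
{
	return setxattr(dir, "user.rmtime_mark", "1", 1, 0);
}

/* Hard-linked paths that a recursive mtime could miss: stat them directly. */
static void check_hardlinks(const char *const *paths, int n, long long last_run)
{
	struct stat st;
	for (int i = 0; i < n; i++) {
		if (stat(paths[i], &st) == 0 && st.st_mtime > last_run)
			printf("changed (hard link): %s\n", paths[i]);
	}
}

int main(void)
{
	const char *hardlinks[] = { "objects/aliased-file" }; /* example side list */

	/* 1. scan the tree, pruning on recursive mtime (see earlier sketch) */
	/* 2. stat hard links from the side list separately                  */
	check_hardlinks(hardlinks, 1, /* last_run */ 0);

	/* 3. re-mark the directories of interest for the next run           */
	mark_directory(".");
	return 0;
}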





