On Mon, 10 Mar 2014, Ondřej Bílka wrote:
On Mon, Mar 10, 2014 at 03:13:45AM -0700, David Lang wrote:
On Mon, 10 Mar 2014, Dennis Luehring wrote:
according to these blog posts
http://www.infoq.com/news/2014/01/facebook-scaling-hg
https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
mercurial "can" be faster then git
but i don't found any reply from the git community if it is a real problem
or if there a ongoing (maybe git 2.0) changes to compete better in this case
As I understand this, the biggest part of what happened is that
Facebook made a tweak to mercurial so that when it needs to know
what files have changed in their massive tree, their version asks
their special storage array, while git would have to find that out
through the filesystem interface (by doing stat calls on the
directories and files to see if anything has changed).
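For concreteness, a minimal sketch of that stat-based check (the
cache_entry struct here is a simplification, not git's actual index
entry, and the real index refresh is more involved):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <time.h>

    /* Simplified stand-in for an index entry, which caches the
       stat data recorded when the file was last checked out. */
    struct cache_entry {
        const char *path;
        time_t mtime;
        off_t size;
    };

    /* One lstat() per tracked file: this per-file walk is the
       cost that grows with the size of the tree. */
    static int is_changed(const struct cache_entry *ce)
    {
        struct stat st;
        if (lstat(ce->path, &st) < 0)
            return 1;               /* deleted or unreadable */
        return st.st_mtime != ce->mtime || st.st_size != ce->size;
    }

On a tree with millions of files, doing that for every entry on each
status operation is exactly the cost Facebook was trying to avoid.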
That is mostly a kernel problem. Long ago there was a proposed patch to
add a recursive mtime so you could check which subtrees changed. If
somebody resurrected that patch it would give a similar boost.
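To make that concrete, a sketch of how a scan could use it
(recursive_mtime() here is hypothetical, standing in for whatever
interface such a patch would actually expose):

    #include <time.h>

    /* Hypothetical: newest mtime of anything under 'dir', as the
       proposed kernel patch would maintain it. */
    extern time_t recursive_mtime(const char *dir);

    /* Prune whole subtrees that have not changed since the last
       scan; only recurse and stat files where something did. */
    static void scan(const char *dir, time_t last_scan)
    {
        if (recursive_mtime(dir) <= last_scan)
            return;          /* entire subtree unchanged, skip it */
        /* ... otherwise readdir() and recurse into children ... */
    }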
btrfs could actually implement this efficiently, but for a lot of other
filesystems this could be very expensive. The question is whether it would be
enough of a win to make it a good choice for people who are doing a heavy git
workload, as opposed to more generic uses.
There's also the issue of managed vs generated files: if you update the mtime
all the way up the tree because a source file was compiled and a binary was
created, that will quickly defeat the value of the recursive mtime.
David Lang
There are two issues that need to be handled. First, if you are concerned
about one mtime change causing a lot of updates, an application needs to mark
all directories it is interested in; when we do an update we unmark the
directory, and that way we update each directory at most once per
application run.
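Roughly like this on the kernel side (a hedged sketch of the scheme
described above, not the actual patch; the struct and mark flag are
stand-ins):

    #include <time.h>

    /* Hypothetical in-kernel directory record. */
    struct dir {
        struct dir *parent;
        time_t r_mtime;     /* recursive mtime */
        int marked;         /* set by the interested application */
    };

    /* Called when something under 'd' is modified.  Each ancestor
       is unmarked as it is updated, so a burst of writes costs at
       most one update per directory; the application re-marks the
       directories it cares about on its next scan. */
    static void propagate(struct dir *d, time_t now)
    {
        for (; d && d->marked; d = d->parent) {
            d->marked = 0;
            d->r_mtime = now;
        }
        /* first unmarked ancestor is already dirty: stop walking */
    }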
The second problem is hard links, where probably the best course is to keep a
list of these and stat them separately.
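For example (a sketch; how the hard-link list gets populated is left
out here):

    #include <stddef.h>
    #include <sys/stat.h>
    #include <time.h>

    /* Hypothetical list of known hard-linked paths, collected when
       the tree was indexed. */
    extern const char *hardlinks[];
    extern size_t n_hardlinks;

    /* A write through a link outside the watched tree never bumps
       any watched directory's recursive mtime, so these files have
       to be checked individually. */
    static void check_hardlinks(time_t last_scan)
    {
        for (size_t i = 0; i < n_hardlinks; i++) {
            struct stat st;
            if (lstat(hardlinks[i], &st) < 0 ||
                st.st_mtime > last_scan) {
                /* ... flag hardlinks[i] as changed ... */
            }
        }
    }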