Re: Watchman support for git

On Fri, 9 May 2014, David Turner wrote:

On Fri, 2014-05-09 at 11:08 -0700, David Lang wrote:
On Fri, 9 May 2014, David Turner wrote:

On Fri, 2014-05-09 at 00:08 -0700, David Lang wrote:
On Thu, 8 May 2014, Sebastian Schuberth wrote:

On 03.05.2014 05:40, Felipe Contreras wrote:

That's very interesting. Do you get similar improvements when doing
something similar in Mercurial (watchman vs. no watchman)?

I have not tried it.  My understanding is that this is why Facebook
wrote Watchman and added support for it to Mercurial, so I would assume
that the improvements are at least this good.

Yeah, my bet is that they are actually much better (because Mercurial
can't be as optimized as Git).

I'm interested in this number because if watchman in Git is improving it
by 30%, but in Mercurial it's improving it by 100% (made up number),
then it makes sense that you might want it more if you are using hg,
but not so much if you are using git.

Also, if similar repositories with Mercurial+watchman are actually
faster than Git+watchman, that means that there's room for improvement
in your implementation. This is not a big issue at this point of the
process, just something nice to know.

The article at [1] has some details; they claim "For our repository, enabling Watchman integration has made Mercurial's status command more than 5x faster than Git's status command".

[1] https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/

a lot of that speed comparison is going to depend on your storage system and the
size of your repository.

if you have a high-end enterprise storage system that tracks metadata very
differently from the file contents (I've seen some that have racks' worth of
SATA drives for contents and then 'small' arrays of a few dozen flash drives for
the metadata), and then you have very large repositories (Facebook has
everything in a single repo), then you have a perfect storm where something like
watchman that talks the proprietary protocol of the storage array can be FAR
faster than anything that needs to operate with the standard POSIX calls.

That can easily account for the difference between the facebook announcement and
the results presented for normal disks that show an improvement, but with even
stock git being faster than improved mercurial.

As I recall from Facebook's presentation[1] on this (as well as from the
discussion on the git mailing list[2]), Facebook's test repository is
much larger than any known git repository.  In particular, it is larger
than WebKit.

agreed, it's huge, it's the entire codebase history of every tool that they use
crammed together in one repo

These performance improvements are not for server-side
tasks, but for client-side (e.g. git/hg status).  Facebook also made
other improvements for the client-server communication, and for
log/blame, but these are not relevant to watchman.

well, in their situation they have shared storage that clients use for this huge
repo, so I don't think they have a clear client/server boundary the way you are
thinking. Even clients have this huge repo to deal with, and they can do so
efficiently by querying the storage device rather than trying to walk the tree
or monitor access directly.

That's not my understanding from Durham Goode's talk in January.  Yes,
operations involving history go to the server.  But the client also
maintains a copy of the working tree, and it is for this that watchman
is used.  Otherwise, why bother with watchman at all?  The server knows
when it changes files and could simply maintain its own index of what's
changed.  Watchman is built on inotify/fsevents -- it doesn't have
anything to do with any sort of storage device beyond a vanilla hard
drive.

When you have such a massive repo, your clients aren't storing the data on their local drives; they are accessing the data on network-attached storage (via NFS or through a fuse mount). So they can have their watchman send queries to the storage server to find out what has changed in this massive repo rather than having to walk the directory tree (or try to monitor it for changes on the client machine).
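
To make the trade-off under discussion concrete, here is a minimal sketch of the two ways a status-like check can find modified files: walking the whole tree, versus checking only the paths that some change monitor has reported. This is not git's, Mercurial's, or watchman's actual code; the function names and the idea of a plain list of "hinted" paths are assumptions made purely for illustration.

```python
import os

def snapshot(root):
    """Walk the whole tree and record (mtime_ns, size) for every file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state

def changed_by_walk(root, old):
    """Full scan: the cost grows with the number of files in the tree."""
    new = snapshot(root)
    return sorted(p for p in new.keys() | old.keys()
                  if new.get(p) != old.get(p))

def changed_by_hint(root, old, hinted_paths):
    """Stat only the paths a monitor reported (hypothetical input):
    the cost grows with the number of changes, not the size of the tree."""
    changed = []
    for rel in hinted_paths:
        try:
            st = os.lstat(os.path.join(root, rel))
            current = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            current = None  # the file was deleted
        if current != old.get(rel):
            changed.append(rel)
    return sorted(changed)
```

Both functions give the same answer when the hints are complete; the disagreement in this thread is about where those hints can come from (inotify/fsevents on the client, or a query to the storage server) and how much the full walk costs on a given storage setup.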

David Lang