Greetings,

I'm applying for a Google Summer of Code project for git mirror-sync, mentioned on the Ideas wiki page. I've already submitted my first draft to the GSoC website, but if any kind mailing list members have comments or questions, I'd be happy to take them. Thanks go out to Sam Vilain, who already helped me with my app :)

Cheers,
Andrew

==================================

Project Goals

What is the goal of your project?

The goal is to implement the MirrorSync protocol (http://code.google.com/p/gittorrent/wiki/MirrorSync and http://tinyurl.com/c7j3m7) as a continuation of previous Google Summer of Code projects on peer-to-peer distribution of Git repositories, which aim to increase download speeds and decentralize distribution. MirrorSync is a refinement of the GitTorrent protocol (http://gittorrent.utsl.gen.nz/rfc.html), which in turn was based on the popular BitTorrent protocol (http://www.bittorrent.org/beps/bep_0003.html). It offers a simplified version of GitTorrent that is more tailored to the design of Git and drops the unnecessary BitTorrent cruft.

How would you measure its success or failure?

The MirrorSync overview goes over the three parts of functionality that need to be implemented. Completing all three would be highly desirable, but each part is useful in its own right, since each adds mirror downloading functionality. The scope of the project seems such that all three could be completed in a single Summer of Code project, and that's what I'll aim for. A significant part of the summer's work will also be formalizing the protocol and talking with Git developers.

Part one is Mirror List. This lets a client hit a repository and get a list of mirrors that the client could try downloading from, along with the most recent update to the repository and the signing key for verifying updates.

Part two is Mirror Notify.
This is how a client can tell a repository that the client has a copy of the repository and is willing to act as a mirror.

Part three is Mirror Sync. This is where the actual exchange of repository contents occurs. The peers start by comparing each other's latest packed-refs files, and then start requesting desired packs from each other. This means new changes will quickly propagate through the network, and since the objects are fragmented reproducibly, downloads can be spread across many peers.

Describe your project in more detail.

The necessary steps for implementing all of MirrorSync have already been laid out in skeleton form by Sam Vilain (http://tinyurl.com/crdq9f). The three messages will be implemented in order: Mirror List first, then Mirror Notify, then Mirror Sync. These are the major milestones for progress, and more formal documentation and specifications for each part should emerge along the way.

My plan for all three parts is to write the bulk of the code in Python, since git-daemon can wrap the Python executables with execl_git_cmd(). This means there will be additions to git-daemon to accommodate this, but no big changes.

Mirror List requires modifications to git-tag to create the necessary "push objects". These are tags containing a packed-refs file in the comments section, signed so a mirror can verify the state of the repository. The server has to maintain a list of mirrors and a record of which signing key IDs are allowed to push back to which branches. The client has to use this key info to verify push objects, ensuring that each push object only changes the refs its key is allowed to change. Adding keys to the keyring can be done by prompting the user, much as SSH does. Finally, git-fetch will be modified to have a "--use-mirror" flag that selects a local mirror from the list to download from.

Mirror Notify has to handle maintaining the list of mirrors and responding appropriately.
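
As a rough illustration of the bookkeeping side of Mirror Notify, here is a minimal Python sketch. The class and method names (MirrorRegistry, notify, mirror_list), the in-memory storage, and the expiry window are all assumptions made for illustration; none of them come from the MirrorSync documents, and a real server would presumably persist the list and verify each mirror before accepting it.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MirrorRegistry:
    """Hypothetical in-memory list of mirrors kept by the main repository."""

    # Assumed expiry window in seconds; the proposal does not specify one.
    max_age: float = 7 * 24 * 3600
    # Maps mirror URL -> timestamp of its most recent notify request.
    _seen: dict = field(default_factory=dict)

    def notify(self, url, now=None):
        """Record (or refresh) a mirror that announced itself."""
        self._seen[url] = time.time() if now is None else now
        return "ok"

    def mirror_list(self, now=None):
        """Return mirrors that have not expired, most recently seen first."""
        now = time.time() if now is None else now
        live = [(ts, url) for url, ts in self._seen.items()
                if now - ts <= self.max_age]
        return [url for ts, url in sorted(live, reverse=True)]
```

A Mirror List response could then be built from mirror_list(), while each incoming Mirror Notify request would call notify() once the mirror has been checked.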
The main repository should attempt to verify notify requests by checking that the repo exists on the mirror. Writing the client commands for notifying the server should be pretty easy.

Mirror Sync is probably the hardest part to implement, since it's the most complicated and requires the most investigation into Git internals. It uses Mirror List to get a list of peers, and then each peer needs to start advertising the bundles it has and the bundles it wants. Then the actual downloading and uploading happens, and this continues until everyone is in sync.

The timeline for the three months would look something like this: one week of research and planning, three weeks for Mirror List, two weeks for Mirror Notify, five weeks for Mirror Sync, and one week for cleanups and documentation. This is a total of 12 weeks, the time allotted for Summer of Code.

Interfaces

What parts of Git will you need to call?

There won't be much code reuse, since the MirrorSync additions are fairly separate from the rest of the Git codebase. The tag creation facilities will be used in making "push objects", and the existing framework in git-daemon for handling fetch requests will be extended to handle the new MirrorSync requests.

What parts of Git might you need to change?

Two things will be changed: git-tag and git-daemon. git-tag needs to be changed to accommodate creating the "push objects" necessary for peer communication, and the required additions will hopefully be minor. git-daemon needs to be modified to call out to the new mirror-* functionality when it's hit with a request. Besides that, three new commands are being added (mirror-list, mirror-notify, mirror-sync), so the Makefile will need to be modified.

About You

Can you list some prior projects that you have worked on?

I haven't been involved in any open source projects much beyond bug reporting and triaging, for KDE and a few other projects.
I have a couple of very minor patches in the Git tree of Basket Note Pads, a KDE project written in C++ with Qt (http://basket.kde.org/). As with most open source projects, communication within Basket is primarily through email, and I've also been subscribed on and off to development mailing lists for KDE, Enlightenment, Gentoo, and now Git.

I've done plenty of coding on my own projects though, the most relevant being Python implementations of a BitTorrent tracker and a threaded XML feed grabber and parser. Both were done on my own as self-learning projects, using mainly the extensive Python standard library. Python is my go-to language for projects big and small, and I'm pretty comfortable in it.

Besides that, my professional background has been in websites, so I have exposure to all types of web development technologies and related programming languages. My resume (http://www.linkedin.com/in/aawang) is the best place to go for a rundown on that part of my skill set. I've also been producing copious amounts of Java code for school-related projects; right now we're doing development with Lego NXT robots that communicate over Bluetooth with a Wii Remote.

Do you have any prior Git experience? Have you started to get involved?

I don't have any prior Git experience except as a user. I asked Sam Vilain a bit about MirrorSync when I saw the Git Ideas page, and now I'm on the developer mailing list. I've also started browsing through the source code, but I don't have any patches to my name right now.