Goal:
---------
Improving parallelism in various git commands. As the idea page says, only git grep seems to be using threads, and there are a lot of other commands that could use them. From the codebase, it appears that grep uses pthreads and that an interface for using threads (for pack-objects) was started in thread-utils.

To achieve this goal, I will have to work on core git functionality. I think there are two ways to go ahead with this project:
1. Add pthread mutexes specific to each piece of functionality within its current implementation (much like what grep does), or
2. Write a pthread interface (like thread-utils), consolidate the main data structures that would need mutexes around them, and restructure some of the codebase.

I think 2 is the better option, but I would like to study the codebase a little longer to understand how much time it would involve.

As far as the timeline for GSoC is concerned, I would start off with two or three simple commands that can be easily modified. I would need some help from my mentor and this list to identify them. After I have this working, I can proceed to parallelize other git functionality.

Week 1-2: Getting familiar with the git codebase
Week 2-6: Adding basic functionality to thread-utils and working on git diff/log
Week 6-10: Identifying additional parts of git that can be parallelized
Week 10-12: Documentation and finalizing, or continuing with additional git commands if everything is going fine

Success Criteria:
----------------------
I would identify commands whose performance can be improved by using threads, and then measure the running time of each of those commands on several git repositories.

About Me:
-------------
I am a second-year CS PhD student at Georgia Tech. I have previously worked with OpenCV as a GSoC student, where I wrote a tracking algorithm for them. I mostly work with robots (like a huge-a** humanoid called Golem Krang, as you can see on my website).
My experience with pthreads and writing multi-threaded programs comes from developing IPC interfaces for robots, because robot control has to be real time and it can be unsafe to run robot control programs in a single process. I learnt multithreaded programming from my systems courses at Georgia Tech, such as Operating Systems and High Performance Computing.

I have used git for over two years now, as you can see from my GitHub account. So far I have only looked into the git codebase to tinker with gitweb while setting up gitolite on a server. But I am interested in working on git because, until now, the most complex open source project I have worked with was OpenCV. I realized they were more interested in developing algorithms, and I want to get my hands dirty with systems programming now.

You can follow the links below to find out more about me. I would like to hear from you whether you think my approach is correct or could be different.

Web: http://www.pushkar.name/
Github: https://github.com/pushkar (look at the cshm and cshm-net projects: shared-memory IPC using circular buffers)
Resume: http://pushkar.name/resume.pdf

Pushkar Kolhe