GSOC proposal: Improving parallelism

Goal:
---------
Improving parallelism in various git commands.

As the ideas page says, only git grep seems to use threads, and
many other commands could use them.

From the codebase, it looks like grep uses pthreads, and a thread
interface (for pack-objects) was started in thread-utils. To achieve
this goal, I will have to modify core git functionality. I see two
ways to go ahead with this project:

1. Add pthread mutexes specific to each piece of functionality
   within its current implementation (much like what grep does), or
2. Write a pthread interface (like thread-utils), consolidate the
   main data structures that need mutexes around them, and
   restructure some of the codebase.
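
To make option 2 more concrete, here is a minimal sketch of a shared
helper that hands out work items to a few pthreads through one
mutex-protected counter. All names in it (work_pool, run_parallel,
square_item, MAX_POOL_THREADS) are hypothetical and are not part of
git's thread-utils; it assumes the work items are independent of each
other, which real git data structures may not be.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not git's thread-utils API. */
typedef void (*work_fn)(size_t index, void *data);

struct work_pool {
	pthread_mutex_t lock;	/* protects 'next' */
	size_t next;		/* index of the next unclaimed item */
	size_t total;		/* number of items to process */
	work_fn fn;		/* called once per item */
	void *data;		/* passed through to fn */
};

/* Each worker claims one index at a time under the lock, then works on it
 * outside the lock so that items are processed in parallel. */
static void *pool_worker(void *arg)
{
	struct work_pool *pool = arg;

	for (;;) {
		size_t i;

		pthread_mutex_lock(&pool->lock);
		i = pool->next;
		if (i < pool->total)
			pool->next++;
		pthread_mutex_unlock(&pool->lock);

		if (i >= pool->total)
			break;
		pool->fn(i, pool->data);
	}
	return NULL;
}

#define MAX_POOL_THREADS 16

/* Runs fn(i, data) for every i in [0, total).
 * Returns the number of worker threads started; if none could be
 * started, the items are processed serially in the calling thread. */
int run_parallel(size_t total, int nr_threads, work_fn fn, void *data)
{
	pthread_t threads[MAX_POOL_THREADS];
	struct work_pool pool;
	int started = 0, i;

	assert(fn != NULL);
	if (nr_threads < 1)
		nr_threads = 1;
	if (nr_threads > MAX_POOL_THREADS)
		nr_threads = MAX_POOL_THREADS;

	pthread_mutex_init(&pool.lock, NULL);
	pool.next = 0;
	pool.total = total;
	pool.fn = fn;
	pool.data = data;

	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&threads[started], NULL, pool_worker, &pool))
			break;
		started++;
	}
	if (!started)
		pool_worker(&pool);	/* serial fallback */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&pool.lock);
	return started;
}

/* Example work function: each item writes only its own array slot,
 * so no extra locking is needed for the output. */
void square_item(size_t index, void *data)
{
	long *out = data;

	out[index] = (long)(index * index);
}
```

A caller would use it as run_parallel(100, 4, square_item, out) with
a 100-element array of long. With option 1, by contrast, each command
would carry its own copy of this locking logic.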

I think 2 is the better option, but I would like to study the
codebase a little longer to understand how much time it would take.
As far as the timeline for GSoC is concerned, I would start off with
two or three simple commands that can be easily modified. I would
need some help from my mentor and this list to identify them. Once I
have this working, I can proceed to parallelize other git
functionality.

Week 1-2: Getting familiar with the git codebase
Week 2-6: Adding basic functionality to thread-utils and
experimenting with git diff/log
Week 6-10: Identifying additional parts of git which can be parallelized
Week 10-12: Documentation and finalizing, or continuing with
additional git commands if everything is going fine

Success Criteria:
----------------------
I will identify commands whose performance can be improved by using
threads, and then measure the running time of each of those commands
on several git repositories.
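
As a rough sketch of how such a measurement could be taken inside a
C test harness, the code below times one run of a workload with the
monotonic clock and computes a speedup ratio. The names bench_fn,
elapsed_seconds, speedup and spin are hypothetical, and spin is only
a stand-in workload, not a git command.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload signature used only for this sketch. */
typedef void (*bench_fn)(void *arg);

/* Wall-clock seconds taken by one call to fn(arg). */
double elapsed_seconds(bench_fn fn, void *arg)
{
	struct timespec start, end;

	assert(fn != NULL);
	clock_gettime(CLOCK_MONOTONIC, &start);
	fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Ratio of serial to threaded time; values above 1.0 mean the
 * threaded run was faster. Returns 0.0 if the threaded time is
 * not positive, to avoid dividing by zero. */
double speedup(double serial_seconds, double threaded_seconds)
{
	if (threaded_seconds <= 0.0)
		return 0.0;
	return serial_seconds / threaded_seconds;
}

/* Stand-in workload: counts up to *arg iterations. */
void spin(void *arg)
{
	volatile unsigned long i;
	unsigned long n = *(unsigned long *)arg;

	for (i = 0; i < n; i++)
		;
}
```

Timing the same command with and without threads on each repository
and reporting speedup(serial, threaded) would give one comparable
number per command. Several runs per configuration would be needed to
smooth out disk cache effects.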

About Me:
-------------
I am a second-year CS PhD student at Georgia Tech. I have previously
worked with OpenCV as a GSoC student, where I wrote a tracking
algorithm. I mostly work with robots (like a huge-a** humanoid called Golem
Krang as you can see on my website). My experience with pthreads and
writing multi-threaded programs comes from developing IPC interfaces
for robots, because robot control has to be real time and it can be
unsafe to run robot control programs in a single process. I learnt
multithreaded programming from my systems courses at Georgia Tech like
Operating Systems and High Performance Computing.

I have used git for over 2 years now, as you can see from my github
account. I have only ever looked into the git codebase to mess with
gitweb while setting up gitolite on a server. But I am interested in
working with git because, until now, the most complex open source
project I have worked with was OpenCV. I realized they were more
interested in developing algorithms, and I want to get my hands
dirty in systems programming now.

You can follow the links below to find out more about me. I would
like to hear from you whether you think my approach is correct or
should be different.

Web: http://www.pushkar.name/
Github: https://github.com/pushkar (look at cshm, cshm-net projects,
shared mem ipc using circular buffers)
Resume: http://pushkar.name/resume.pdf

Pushkar Kolhe