Re: Speeding up history traversals with caches

On Mon, Sep 25, 2017 at 05:28:43PM -0700, Sabelo Mhlambi wrote:

> Hi Jeff (and the Git community),
> 
> As my intro to open source contributions I'd like to attempt the "Speeding
> up history traversals with caches" as outlined here
> https://git.github.io/Outreachy-15/.
> 
> It seems like a challenging and worthwhile problem. May I have more
> information on the project and on how to get started on the application?
> 
> Thanks!

Hi Sabelo, welcome to Git!

Unfortunately your message didn't make it to the mailing list, because
the list software is strict about messages not including any HTML parts.
It looks like you're using Gmail; you'll need to ask it to send
plain-text emails.

The general idea of the project is: a lot of git commands need to access
commit objects to walk the history graph, but they're expensive to
access because we have to inflate the whole commit object from disk.
What I'd like to have instead is a compact representation that we can
quickly use to get the main interesting data out of a commit (its
parents, tree, and timestamp) without having to inflate all of the
bytes.
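
To make the idea concrete, here is a minimal sketch in Python (not the
layout used in the prototype, and not Git code). It assumes a
hypothetical fixed-width record per commit holding a 20-byte tree id, a
timestamp, and up to two parent slots that index into the same table.
The names RECORD, NO_PARENT, pack_cache, and read_commit are made up for
illustration.

```python
import struct

# Hypothetical record layout (an assumption, not Git's format):
# 20-byte tree id, 8-byte commit time, two 4-byte parent slots.
# A parent slot holds an index into the same table, or NO_PARENT.
RECORD = struct.Struct(">20sQII")
NO_PARENT = 0xFFFFFFFF


def pack_cache(commits):
    """Pack a list of (tree_id_bytes, timestamp, parent_indexes) tuples."""
    out = bytearray()
    for tree, when, parents in commits:
        if len(parents) > 2:
            raise ValueError("this sketch handles at most two parents")
        slots = list(parents) + [NO_PARENT] * (2 - len(parents))
        out += RECORD.pack(tree, when, slots[0], slots[1])
    return bytes(out)


def read_commit(cache, index):
    """Read one record by position, with no inflating or text parsing."""
    tree, when, p1, p2 = RECORD.unpack_from(cache, index * RECORD.size)
    parents = [p for p in (p1, p2) if p != NO_PARENT]
    return tree, when, parents
```

Because every record has the same size, looking up commit number N is
just an offset calculation, which is the property that makes a walk
cheap. A real format would also need to handle octopus merges and map
object ids to table positions; this sketch leaves both out.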

I did a prototype of this a few years ago:

  https://public-inbox.org/git/20130129091434.GA6975@xxxxxxxxxxxxxxxxxxxxx/

Compared to those patches, there are a lot of possible things to work
on:

  - the code needs to be cleaned up and ported to a more modern version
    of Git

  - the implementation is a bit complex; it was anticipating having
    several types of auxiliary files, but probably we really just need
    one

  - we've also discussed storing computed data about the graph, such as
    generation numbers, which can help speed up some traversals

  - we may be able to cache some interesting tree data (e.g., bitmaps of
    which paths are touched by a particular commit).
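
As a rough illustration of the generation number item above, here is a
small Python sketch. It assumes the usual definition (a root commit has
generation 1, and any other commit has one more than the largest
generation among its parents) and a toy history stored as a dict. The
function names and the dict layout are hypothetical, not anything in
Git.

```python
def generation_numbers(parents):
    """parents: dict mapping a commit name to a list of its parent names."""
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            missing = [p for p in parents[c] if p not in gen]
            if missing:
                # Compute the parents first, then come back to c.
                stack.extend(missing)
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def is_ancestor(parents, gen, old, new):
    """True if `old` is reachable from `new` by following parent links."""
    seen, todo = set(), [new]
    while todo:
        c = todo.pop()
        if c == old:
            return True
        # Every ancestor of c has a strictly smaller generation than c,
        # so c cannot lead to `old` unless gen[c] > gen[old].
        if c in seen or gen[c] <= gen[old]:
            continue
        seen.add(c)
        todo.extend(parents[c])
    return False
```

The cutoff in is_ancestor is where the speedup would come from: the walk
can stop descending as soon as it drops to the generation of the commit
it is looking for, instead of going all the way to the roots.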

I wouldn't expect us to cover all of that during the internship period,
but it gives a sense of the possible directions.

That thread may work as a starting point for understanding the problem
space. You can also probably find some interesting discussions if you
search for "generation number" in the mailing list archive at
https://public-inbox.org/git.

The first step is probably to get comfortable with building Git and
submitting a small patch. Christian posted some advice on finding a
topic to work on:

  https://public-inbox.org/git/CAP8UFD3vPQHJZNt1+egKkshiyqrGKiJp7eWU-Es6bTLgvXe1Kg@xxxxxxxxxxxxxx/

Let us know if you get stuck or if you have any questions!

-Peff


