[RFC][GSoC] Calling for comments regarding rough draft of proposal

tanay abhra <tanayabh@xxxxxxxxx> · Thu, 20 Mar 2014 01:22:28 +0530

Hi,
      I have already done the microproject, which has been merged into
main last week. I have prepared a rough draft of my proposal for
review, read all the previous mailing list threads about it.I am
reading the codebase little by little.

Please suggest improvements on the following topics,

1.I have read one-third of config.c and will complete reading it by
tomorrow.Is there any other  piece of code relevant to this proposal?

2.Other things I should add to the proposal that I have left off?I am
getting confused what extra details I should add to the proposal. I
will add
the informal parts(my background, schedule for summer etc) of the
proposal later.

3.Did I understand anything wrong or if my approach to solving
problems is incorrect,if yes, I will redraft my proposal according to
your suggestions.

------------------------------------------------------------------------------
#GSoC Proposal : Git configuration API improvements
-------------------------------------------------------------------------------

#Proposed Improvements

* Fix "git config --unset" to clean up detritus from sections that are
left empty.

* Read the configuration from files once and cache the results in an
appropriate data structure in memory.

* Change `git_config()` to iterate through the pre-read values in
memory rather than re-reading the configuration
  files.

* Add new API calls that allow the cache to be inquired easily and
efficiently.  Rewrite other functions like
 `git_config_int()` to be cache-aware.

* Rewrite callers to use the new API wherever possible.

* How to invalidate the cache correctly in the case that the
configuration is changed while `git` is executing.

#Future Improvements

*Allow configuration values to be unset via a config file

--------------------------------------------------------------------------
##Changing the git_config api to retrieve values from memory

Approach:-

We parse the config file once, storing the raw values to records in
memory. After the whole config has been read, iterate through the records,
feeding the surviving values into the callback in the order they were
originally read
(minus deletions).

Path to follow for the api conversion,

1. Convert the parser to read into an in-memory representation, but
   leave git_config() as a wrapper which iterates over it.

2. Add query functions like config_string_get() which will inquire
cache for values efficiently.

3. Convert callbacks to query functions one by one.

I propose two approaches for the format of the internal cache,

1.Using a hashmap to map keys to their values.This would bring as an
 advantage, constant time lookups for the values.The implementation
 will be similar to "dict" data structure in python,

 for example, section.subsection --mapped-to--> multi_value_string

 This approach loses the relative order of different config keys.

2.Another approach would be to actually represent the syntax tree of the
  config file in memory. That would make lookups of individual keys more
  expensive, but would enable other manipulation. E.g., if the syntax
  tree included nodes for comments and other non-semantic constructs, then
  we can use it for a complete rewrite.

 And "git config" becomes:

  1. Read the tree.

  2. Perform operations on the tree (add nodes, delete nodes, etc).

  3. Write out the tree.

and things like "remove the section header when the last item in the
section is removed" become trivial during step 2.

I still prefer the hashmap way of implementing the cache,as empty
section headers  are not so problematic(no processing pitfalls) and
are sometimes annotated with comments  which become redundant and
confusing if the section header is removed.As for the aesthetic
problem
I propose a different solution for it below.

----------------------------------------------------------------------
##Tidy configuration files

When a configuration file is repeatedly modified, often garbage is
left behind.  For example, after

    git config pull.rebase true
    git config --unset pull.rebase
    git config pull.rebase true
    git config --unset pull.rebase

the bottom of the configuration file is left with the useless lines

    [pull]
    [pull]

Also,setting a config value, appends the key-value pair at the end of
file without checking for empty main keys
even if the main key(like [my]) is already present and empty.It works
fine if the main key with an already present
sub-key.

for example:-
    git config pull.rebase true
    git config --unset pull.rebase
    git config pull.rebase true
    git config pull.option true
gives
    [pull]
    [pull]
        rebase = true
        option = true

Also, a possible detriment is presence of comments,
For Example:-
    [my]
            # This section is for my own private settings

Expected output:

  1. When we delete the last key in a section, we should be
     able to delete the section header.

  2. When we add a key into a section, we should be able to
     reuse the same section header, even if that section did
     not have any keys in it already.

Possible approaches:-

1.Leave the empty section header as it was and when a new value is set,
  reuse the header instead of appending at the end of the config file.
  I am going through the code to find find other solution for this problem.

links:-
[1]http://thread.gmane.org/gmane.comp.version-control.git/219505
[2]http://thread.gmane.org/gmane.comp.version-control.git/208113

-----------------------------------------------------------------------

Thanks,
Tanay Abhra.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html