Re: Features from GitSurvey 2010

On Tue, Feb 1, 2011 at 07:52, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> On Tue, Feb 1, 2011 at 8:51 PM, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> On Sun, 30 Jan 2011, Jonathan Nieder wrote:
>>> > support for tracking empty directories
>>>
>>> Tricky to get the UI right.  I am interested in and would be glad to
>>> help with this one.
>>
>> Also one needs to remember that this would require adding extension
>> to git index, because currently it tracks only files, and not
>> directories.  Explicitly tracking directories in the index could be
>> useful for other purposes...
>>
>> The major difficulty of this is IMHO not the UI, but tracking all those
>> tricky corner cases (like directory/file conflict, etc.).
>
> Sort order in index is quite special/strange and must be handled
> correctly when dirs and files are mixed.

It's not the order in the index that is confusing, it's the order in the
tree objects.  The index sort order is simple, since every path is a
full path string from the top of the repository... you use strcmp() to
order them into a natural order.  This, however, skews where a
subdirectory should live relative to a sibling file, because the
"subdirectory" sorts as though its name ends with '/'.
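
To make the difference concrete, here is a minimal Python sketch of the
two orderings.  It is an illustration only, not git's code: the function
names index_order and tree_order and the sample paths are made up, and
byte-wise comparison of encoded strings stands in for strcmp().

```python
def index_order(paths):
    # Index-style order: compare full paths byte by byte, as strcmp() would.
    return sorted(paths, key=lambda p: p.encode())


def tree_order(entries):
    # Tree-style order: entries are (name, is_dir) pairs, and a directory
    # compares as though its name had a trailing '/'.
    return sorted(entries, key=lambda e: (e[0] + ("/" if e[1] else "")).encode())


full_paths = ["foo/bar", "foo.c", "foo-bar"]
tree_entries = [("foo", True), ("foo.c", False), ("foo-bar", False)]

print(index_order(full_paths))        # ['foo-bar', 'foo.c', 'foo/bar']
print(tree_order(tree_entries))       # "foo" (the directory) comes last
print(sorted(n for n, _ in tree_entries))  # plain name sort puts "foo" first
```

Because '-' and '.' both sort before '/', the directory "foo" lands after
"foo-bar" and "foo.c" in the tree, even though a plain sort on the bare
names would put it first.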

> There are already special
> directories in index: the submodules. Current git code treats
> S_ISDIR() and S_ISGITLINK() the same in ce_to_dtype() and some more
> places. You need to decouple it somehow.

More confusingly, the GITLINK type is handled as though it's *not* a
directory.  Storing an empty directory probably means tracking it like
a real directory, but using the empty tree SHA-1 as its value.
Otherwise we probably have all sorts of stuff broken.
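
As a rough illustration of what "the empty tree SHA-1 as its value"
could look like, the Python sketch below computes the id of a tree
object with no entries and contrasts the directory and gitlink mode
bits.  The helper git_object_id and the tuple layout of the index entry
are hypothetical, invented for this example; they are not a proposed
index format.

```python
import hashlib
import stat


def git_object_id(obj_type, payload):
    # A git object id is the SHA-1 of "<type> <size>\0" followed by the payload.
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# A tree with no entries has an empty payload.
EMPTY_TREE = git_object_id("tree", b"")

MODE_TREE = 0o040000     # mode of a directory (tree) entry
MODE_GITLINK = 0o160000  # mode of a submodule (gitlink) entry

# Hypothetical index entry for a tracked empty directory: (path, mode, id).
empty_dir_entry = ("docs/empty", MODE_TREE, EMPTY_TREE)

print(EMPTY_TREE)                  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(stat.S_ISDIR(MODE_TREE))     # True
print(stat.S_ISDIR(MODE_GITLINK))  # False: a gitlink is not a directory by mode
```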

> I tried this (for another purpose) and pulled back. I recall Shawn had
> a tree-based index implementation, don't know if he still has it.

No, we threw out the tree-based index that was used inside of EGit
years ago.  It turned out to be a horrible idea because it wasn't
compatible with the C tools, and it didn't have the inode stat cache
to tell us which files were clean or dirty quickly.

> Actually tree-based index with dictionary (something like trees in
> packv4) is a good feature itself. It could shrink index size down a
> lot. index is frequently read/written so small index helps (webkit's
> index is 16M, 4M after gzipped).

I think a lot of the reason the webkit index is 16M, but gzips to 4M, is
the duplicate path prefixes that appear on all files within
the same directory.  If the index was still a single file, but was
organized into sections by tree (like the TREE extension within the
index itself) you could avoid having the full path within the index
file and save a lot of space when there are many files within
subdirectories.  But this does complicate the C code because you would
need to copy each of those path segments together into a path buffer
in order to access the file in the working tree.

It's probably faster to copy those path segments on read into a big
path buffer, and break them apart on write, than to have a huge index
file.  We already reformat the index during reading/writing to expand
some of the fields for in-memory only flags.
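
A small Python sketch of the idea, under heavy assumptions: the section
layout, the function names, and the sample paths are all invented for
illustration, and a real index entry carries much more than a path.  It
only shows paths being broken apart per directory on write, joined back
into full paths on read, and the path bytes saved in between.

```python
import posixpath
from itertools import groupby


def write_sections(paths):
    # Break full paths apart: one (directory, [basenames]) section per directory.
    split = sorted(posixpath.split(p) for p in paths)
    return [(d, [name for _, name in group])
            for d, group in groupby(split, key=lambda s: s[0])]


def read_sections(sections):
    # Rebuild full paths by joining each directory prefix with its basenames.
    return [posixpath.join(d, name) for d, names in sections for name in names]


def flat_size(paths):
    # Path bytes when every entry stores its full path.
    return sum(len(p) for p in paths)


def sectioned_size(sections):
    # Path bytes when each directory prefix is stored once per section.
    return sum(len(d) + sum(len(n) for n in names) for d, names in sections)


paths = ["README", "src/dom/node.c", "src/dom/node.h", "src/dom/element.c"]
sections = write_sections(paths)
print(sections)
print(flat_size(paths), "->", sectioned_size(sections))
```

The saving grows with the number of files that share a directory, which
is the case a large tree with deep subdirectories hits most often.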

-- 
Shawn.