Re: Index format v5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/10/2012 02:19 PM, Thomas Gummerer wrote:
== Flat loading

Since internally git expects and works with lexicografic ordering,
a simple linear scan throught the subdirectories doesn't give
the right internal sorting. To achieve the right internal sorting
the loading will be done in the following way:

The data structure is a stack of queues, to allow continous reading
of the file.

s ->  queue1
t ->  queue2
a ->  queue3
c ->  queue4
k ->  queue5

dirs = read_all_directories

foreach dir in dirs do
     file = read_next_file

     while element_on_top_of_stack.first_element<  nextdir
         indexentries.append(dequeue(element_on_top_of_stack))
         if element_on_top_of_stack == emtpy:
             remove_element_on_top_of_stack

     if file[filename]<  nextdir
         indexentries.append(file)
     else
         queue.add(file)
         foreach f in rest_of_files_in_directory:
             queue.add(f)
         stack.push(queue)

foreach queue in stack:
     foreach entry in queue:
         indexentry.append(entry)

1. It seems to me that the first time that a file with a filename before nextdir is encountered, the reading of the directory containing the file will be terminated. This would, for example, be the case for any directory that contains multiple files but no subdirectories.

2. There is still a lot that is unnecessarily obscure. For example, I suppose (but you don't say) that "rest_of_files_in_directory" means to read the files at that moment. It would be more explicit (and no more verbose) to write

    while (f = read_next_file()) != NULL:
        queue.add(f)

3. You don't cover corner cases, like when read_next_file() is called but there are no files left in the directory, or when there is no nextdir (which itself is not defined). OK, this pseudocode is only meant to be illustrative, so I guess we can wait until your real implementation to see such details. On the other hand, you probably want to get all the details straight in pseudocode or Python before you start translating it into C.

4. I think the algorithm would be easier to understand and implement if it were written recursively. The call stack would replace your explicit stack (but you would still need one queue per directory level). A key observation is that when "nextdir" precedes the next file, then all of the files in subdirectories of nextdir do as well. Thus an obvious recursion would be to call a function like read_all_files_under_dir(indexentries, dirs, dirindex) at this point, which would consume all of the directories that are subdirectories of dirs[dirindex] (i.e., all directories whose names have the string dirs[dirindex] as a prefix). Using this method would mean that there is no need to compare each file under dirs[dirindex] against the next file in the outer directory.

Michael

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]