Re: [RFC \ WISH] Add -o option to git-rev-list

Andreas Ericsson <ae@xxxxxx> · Mon, 11 Dec 2006 14:40:55 +0100

Marco Costalba wrote:
On 12/11/06, Andreas Ericsson <ae@xxxxxx> wrote:
Marco Costalba wrote:
> On 12/10/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
>>
>> Why don't you use the pipe and standard read()?
>>
>> Even if you use "popen()" and get a "FILE *" back, you can still do
>>
>>         int fd = fileno(file);
>>
>> and use the raw IO capabilities.
>>
>> The thing is, temporary files can actually be faster under Linux just
>> because the Linux page-cache simply kicks ass. But it's not going 
to be
>> _that_ big of a difference, and you need all that crazy "wait for
>> rev-list
>> to finish" and the "clean up temp-file on errors" etc crap, so 
there's no
>> way it's a better solution.
>>
>
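A minimal sketch of the approach suggested above: popen() the command, take the raw descriptor with fileno() and read() from it, bypassing the FILE * buffers. Assumptions: the helper name slurp_command() is hypothetical, and the output is simply collected into one growing buffer; a real client would parse it as it arrives. read() returning 0 is the end-of-output signal, and pclose() then waits for the child and returns its wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `cmd` via popen(), read its output through the
 * raw fd from fileno() (no stdio buffering), and return a malloc'd,
 * NUL-terminated buffer (length in *len). The child's wait status from
 * pclose() goes to *status. Returns NULL on error; caller frees.
 */
static char *slurp_command(const char *cmd, size_t *len, int *status)
{
    FILE *file = popen(cmd, "r");
    if (!file)
        return NULL;

    int fd = fileno(file);              /* raw descriptor behind the FILE * */
    size_t cap = 8192, used = 0;
    char *buf = malloc(cap + 1);        /* +1 for the terminating NUL */

    while (buf) {
        if (used == cap) {              /* grow geometrically */
            char *tmp = realloc(buf, 2 * cap + 1);
            if (!tmp) {
                free(buf);
                buf = NULL;
                break;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + used, cap - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, just retry */
            free(buf);
            buf = NULL;
            break;
        }
        if (n == 0)                     /* EOF: the child closed its stdout */
            break;
        used += (size_t)n;
    }

    int st = pclose(file);              /* waits for the child to exit */
    if (status)
        *status = st;
    if (!buf)
        return NULL;
    buf[used] = '\0';
    if (len)
        *len = used;
    return buf;
}
```

Usage would be along the lines of `out = slurp_command("git rev-list --all", &len, &status);`, with no temporary file to clean up and no separate "wait for rev-list to finish" step.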
> Two things.
>
> - memory use: the next natural step with files is, instead of loading
> the file content in memory and *keep it there*, we could load one
> chunk at a time, index the chunk and discard. At the end we keep in
> memory only indexing info to quickly get to the data when needed, but
> the big part of data stay on the file.
>
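The chunk-and-index idea above could look roughly like this: scan the file one fixed-size chunk at a time, remember only the byte offset where each record starts, discard the chunk, and fetch a record again with pread() when it is needed. Assumptions: records are terminated by a separator byte (for example '\n' for one commit per line), and struct record_index, build_index() and read_record() are hypothetical names, not anything in git or qgit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical index: only record start offsets stay in memory. */
struct record_index {
    off_t *start;   /* start[i] = file offset of record i */
    size_t count;
    size_t cap;
};

static int index_push(struct record_index *ix, off_t off)
{
    if (ix->count == ix->cap) {
        size_t ncap = ix->cap ? 2 * ix->cap : 64;
        off_t *tmp = realloc(ix->start, ncap * sizeof(*tmp));
        if (!tmp)
            return -1;
        ix->start = tmp;
        ix->cap = ncap;
    }
    ix->start[ix->count++] = off;
    return 0;
}

/* Read fd chunk by chunk, recording where each `sep`-terminated record
 * begins; each chunk is dropped once scanned. Returns 0 or -1. */
static int build_index(int fd, char sep, struct record_index *ix)
{
    char chunk[4096];
    off_t pos = 0;
    int at_record_start = 1;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof(chunk));
        if (n == 0)
            return 0;                        /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t i = 0; i < n; i++) {
            if (at_record_start && index_push(ix, pos + i) < 0)
                return -1;
            at_record_start = (chunk[i] == sep);
        }
        pos += n;
    }
}

/* Re-read record i (separator included) from the file into buf.
 * Returns the number of bytes read, or -1. */
static ssize_t read_record(int fd, const struct record_index *ix, size_t i,
                           off_t file_size, char *buf, size_t bufsz)
{
    if (i >= ix->count)
        return -1;
    off_t end = (i + 1 < ix->count) ? ix->start[i + 1] : file_size;
    size_t len = (size_t)(end - ix->start[i]);
    if (len > bufsz)
        return -1;
    return pread(fd, buf, len, ix->start[i]);
}
```

The trade-off is the one discussed in the thread: memory drops to one offset per record, at the cost of a pread() each time a record is displayed.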

It's a memory usage vs speed tradeoff. Since qgit is a pure user-app, I think
it's safe to opt for the memory-hungry option. If people run it on too
low-end hardware, they'll just have to make do with other ways of viewing
the DAG, or shut down some other programs.

> - This is probably my ignorance, but experimenting with popen() I
> found I could not know *when* git-rev-list ends because both feof()
> and ferror() give 0 after a fread() with git-rev-list already defunct.
> Not having a reference to the process (it is hidden behind popen()),
> I had to check for 0 bytes read after a successful read (to avoid
> racing in case I query the pipe before the first data is ready) to
> know that the job is finished and call pclose().
>

(coding in MUA, so highly untested)
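The code referred to above is not preserved in this copy of the message. As an independent sketch (not Andreas's code) of keeping the handle on the process that popen() hides: create a pipe with pipe(), fork(), and exec the command with its stdout on the write end. The parent then holds the child's pid, sees read() return 0 once every writer has closed the pipe, and can waitpid() for the exit status. spawn_reader() and drain_and_wait() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv[0] with stdout on a pipe. Returns the child's pid (which
 * popen() would hide) and stores the pipe's read end in *out_fd. */
static pid_t spawn_reader(char *const argv[], int *out_fd)
{
    int out[2];
    if (pipe(out) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(out[0]);
        close(out[1]);
        return -1;
    }
    if (pid == 0) {                          /* child */
        close(out[0]);
        if (dup2(out[1], STDOUT_FILENO) < 0)
            _exit(127);
        if (out[1] != STDOUT_FILENO)
            close(out[1]);
        execvp(argv[0], argv);
        _exit(127);                          /* only reached if exec failed */
    }
    /* Parent must drop its copy of the write end, or read() never sees EOF. */
    close(out[1]);
    *out_fd = out[0];
    return pid;
}

/* Consume the pipe until read() returns 0 (all writers closed, normally
 * because the child exited), then reap the child. Returns bytes read,
 * or -1 on error; the wait status goes to *status. */
static long drain_and_wait(int fd, pid_t pid, int *status)
{
    char buf[8192];
    long total = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0)
            total += n;                      /* a real client would parse buf here */
        else if (n < 0 && errno == EINTR)
            continue;
        else
            break;                           /* 0 = EOF, < 0 = error */
    }
    close(fd);
    while (waitpid(pid, status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return n < 0 ? -1 : total;
}
```

For git this would be called with something like `char *argv[] = { "git", "rev-list", "HEAD", NULL };`. No polling is needed: a blocking read() simply waits for data, and the 0 return is unambiguous.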

Thanks Andreas, I will do some tests with your code. But at first
sight I fail to see (I'm not an expert on this, though ;-) ) what
the difference is from using popen() and fileno() to get the file
descriptor.

read() vs fread(), so no libc buffering. When I did comparisons of this
(a long time ago; I don't have the test program around anymore) in the style of

	ssize_t n;

	while ((n = read(out[0], buf, sizeof(buf))) > 0)
		write(fileno(stdout), buf, n);

with a command line like this:

	cat any-file | test-program > /dev/null

I saw a constant ~10 ms increase in execution time compared to

	cat any-file > /dev/null

regardless of the size of "any-file", so I assume this overhead comes 
from the extra fork(), which you'll never get rid of unless you use 
libgit.a.
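The read()/write() copy loop described above, completed into a small function. This is a reconstruction in the same spirit, not the original test program (which is lost), and copy_fd() is a hypothetical name. Unlike a bare read()/write() pair, it writes only the bytes actually read and retries short writes, since write() may accept fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy in_fd to out_fd with raw read()/write(), i.e. no stdio buffers.
 * Returns the number of bytes copied, or -1 on error. */
static long copy_fd(int in_fd, int out_fd)
{
    char buf[8192];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            return total;                    /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* write() may be partial: loop until the whole chunk is out. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Wrapped in a trivial `main` that calls `copy_fd(STDIN_FILENO, STDOUT_FILENO)`, it can be timed in the same `cat any-file | test-program > /dev/null` pipeline as above.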

--
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231