Re: [PATCH] Add --show-size to git log to print message size

On 7/15/07, Alex Riesen <raa.lkml@xxxxxxxxx> wrote:
Marco Costalba, Sat, Jul 14, 2007 22:46:39 +0200:
> Finding the delimiting '\0' it means to loop across the whole buffers
> and _this_ is the expensive and not needed part. If just after the

It is _not_ expensive. It could be made expensive, though. By using
QString and QByteArray, for instance.


The searching we are talking about is this (Rev::indexData() in
git_startup.cpp):

int end = ba.indexOf('\0', idx); // this is the slowest find

the starting point 'idx' is at the beginning of the log message.


The Qt implementation of indexOf() is this (src/corelib/tools/qbytearray.cpp):

int QByteArray::indexOf(char ch, int from) const
{
   if (from < 0)
       from = qMax(from + d->size, 0);
   if (from < d->size) {
       const char *n = d->data + from - 1;
       const char *e = d->data + d->size;
       while (++n != e)
           if (*n == ch)
               return n - d->data;
   }
   return -1;
}

I hope this clears up any doubts about the (supposed) slowness of Qt classes.
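
For comparison, the same single-pass scan can be sketched over a plain
buffer with std::memchr. This is only an illustration, not Qt or qgit
code; the helper name indexOfChar is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of Qt: returns the index of the first
// occurrence of ch in data[from, size), or -1 if there is none.
// Like the loop quoted above, it makes one linear pass over the bytes.
long indexOfChar(const char *data, std::size_t size, char ch, std::size_t from)
{
    assert(data != nullptr || size == 0);
    if (from >= size)
        return -1;
    const void *hit = std::memchr(data + from, ch, size - from);
    if (hit == nullptr)
        return -1;
    return static_cast<long>(static_cast<const char *>(hit) - data);
}
```

Either way the cost is proportional to the number of bytes between the
starting point and the first '\0'.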

> first line would be possible to point to the beginning of the next
> revision this seeking for '\0' would be not necessary anymore.

But this will make your reading different: you have to handle the case
when the next revision is not _fully_ read in yet, but you already
know its size.


Revisions are read and created in a streaming fashion: when new data
arrives from git, a new Rev struct (well, it's actually a class, but
there's no difference) is created and populated with index data: the
offset of the rev, the number of parents, the offset of the log message
and so on.

If, *while parsing the data*, a truncated rev is found (we are at EOF
and no '\0' has been found), the whole rev is discarded and deleted; we
wait for some more data and restart the process.

Because this event is quite rare given the size of the buffers where
git's raw data is stored, there is no real loss of speed, and we have
the (big) advantage of indexing *while* searching for '\0', so the data
is scanned only once.

This is how it works now.
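
The streaming scheme described above can be sketched roughly as follows.
This is not the actual Rev::indexData() code: the names RevIndex and
indexRevisions are hypothetical, std::string stands in for QByteArray,
and only the start and end offsets are recorded instead of the full
index data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index entry; the real Rev class stores more fields
// (number of parents, offset of the log message, and so on).
struct RevIndex {
    std::size_t start;  // offset of the first byte of the revision
    std::size_t end;    // offset of the terminating '\0'
};

// Scans buf starting at 'from' and appends one RevIndex per complete,
// '\0'-terminated revision. Returns the offset where parsing should
// resume once more data has been appended: the start of the first
// truncated revision, or buf.size() if every revision was complete.
std::size_t indexRevisions(const std::string &buf, std::size_t from,
                           std::vector<RevIndex> &out)
{
    assert(from <= buf.size());
    std::size_t start = from;
    while (start < buf.size()) {
        const std::size_t end = buf.find('\0', start);
        if (end == std::string::npos)
            break;  // truncated revision: drop it and wait for more data
        out.push_back(RevIndex{start, end});
        start = end + 1;
    }
    return start;
}
```

When a revision is cut off at the end of the buffer, the caller appends
the next chunk and calls indexRevisions() again with the returned
offset, so only the truncated revision is scanned twice.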

With the proposed patch it will be easier to detect a truncated rev,
because as soon as we know the rev size, after reading it from the
stream, we can check:

            if (revision_offset + size > byte_array_size)
                      truncated_rev;
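
A minimal sketch of that check, assuming the size printed by --show-size
has already been parsed into an integer and counts the bytes of the
revision starting at revision_offset. The function name isTruncated is
hypothetical. The comparison is rearranged so that the addition cannot
overflow.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when a revision that starts at revisionOffset and spans
// 'size' bytes does not fit in the byteArraySize bytes received so far.
// Equivalent to revisionOffset + size > byteArraySize, without the
// risk of overflow in the sum.
bool isTruncated(std::size_t revisionOffset, std::size_t size,
                 std::size_t byteArraySize)
{
    assert(revisionOffset <= byteArraySize);
    return size > byteArraySize - revisionOffset;
}
```

With this test the truncated case is known before the revision body is
touched, so no scan for '\0' is needed to detect it.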



P.S. BTW, why do you have some 20 source files marked executable in
your qgit4 repository?


Importing from Windows: NTFS does not handle file attributes
correctly. I should clean up the permissions, but I'm lazy ;-)


Marco

P.S.: I have an experimental branch where the above is implemented. I
cannot publish it now because it requires the --show-size change in git,
but after initial testing I have found that, with the above applied, the
overhead of qgit over git-log is only about 16%.

It means that if git-log runs in say 3 seconds (warm cache), qgit with
the same git log arguments runs in about 3.5 seconds.

With a cold cache the overhead is also lower, because disk access is
accounted on the git side ;-)