On 7/15/07, Alex Riesen <raa.lkml@xxxxxxxxx> wrote:
> Marco Costalba, Sat, Jul 14, 2007 22:46:39 +0200:
> > Finding the delimiting '\0' it means to loop across the whole buffers
> > and _this_ is the expensive and not needed part. If just after the
>
> It is _not_ expensive. It could be made expensive, though. By using
> QString and QByteArray, for instance.

The search we are talking about is this (Rev::indexData() in
git_startup.cpp):

    int end = ba.indexOf('\0', idx); // this is the slowest find

The starting point 'idx' is at the beginning of the log message.

The Qt implementation of indexOf() is this
(src/corelib/tools/qbytearray.cpp):

    int QByteArray::indexOf(char ch, int from) const
    {
        if (from < 0)
            from = qMax(from + d->size, 0);
        if (from < d->size) {
            const char *n = d->data + from - 1;
            const char *e = d->data + d->size;
            while (++n != e)
                if (*n == ch)
                    return n - d->data;
        }
        return -1;
    }

Hope this clears up any doubts regarding the (supposed) slowness of Qt
classes.

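For comparison, here is roughly what the same search looks like without Qt.
This is only a sketch: indexOfNul is a hypothetical helper name (not part of
qgit or Qt), and a std::string stands in for the QByteArray buffer. It is the
same single linear pass over the bytes, just delegated to std::memchr.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for ba.indexOf('\0', idx) on a plain std::string.
// Returns the offset of the first '\0' at or after 'from', or -1 if the
// buffer holds no terminator from that point on.
std::ptrdiff_t indexOfNul(const std::string &buf, std::size_t from)
{
    if (from >= buf.size())
        return -1;

    // One linear pass over the remaining bytes, like the loop in indexOf().
    const void *hit = std::memchr(buf.data() + from, '\0', buf.size() - from);
    if (hit == nullptr)
        return -1;

    return static_cast<const char *>(hit) - buf.data();
}
```

Calling indexOfNul(buf, idx) with idx at the start of a log message gives the
offset of that message's terminator, or -1 when the rev is still incomplete.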
> > first line would be possible to point to the beginning of the next
> > revision this seeking for '\0' would be not necessary anymore.
>
> But this will make your reading different: you have to handle the case
> when the next revision is not _fully_ read in yet, but you already know
> its size.

Reading and creating revisions is done as streaming: when new data
arrives from git, a new Rev struct (well, it's actually a class, but
there's no difference) is created and populated with index data: offset
of the rev, number of parents, offset of the log message and so on.

If, *while parsing the data*, a truncated rev is found (we are at EOF
and no '\0' is found), the whole rev is discarded and deleted; we wait
for some more data and restart the process.

Because this event is quite rare given the size of the buffers where
git raw data is stored, there is no real loss of speed, and we have the
(big) advantage of indexing *while* searching for '\0', so the data is
scanned only once.

This is how it works now. With the proposed patch it will be easier to
detect a truncated rev because, as soon as we know the rev size, after
reading it from the stream, we check:

    if (revision_offset + size > byte_array_size)
        truncated_rev;

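To make the two truncation checks concrete, here is a minimal sketch of both.
It is not the qgit code: RevIndex, indexByScan and indexBySize are hypothetical
names, a std::string stands in for the byte array, and I am assuming that the
reported size counts the terminating '\0'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical index record; the real Rev class stores more fields.
struct RevIndex {
    std::size_t start; // offset of the revision in the buffer
    std::size_t end;   // offset of the terminating '\0'
};

// Current scheme: scan for the terminator. Returns false when the buffer
// ends before a '\0' is found, i.e. the rev is truncated and must be
// discarded until more data arrives.
bool indexByScan(const std::string &buf, std::size_t start, RevIndex &out)
{
    const std::size_t end = buf.find('\0', start);
    if (end == std::string::npos)
        return false;
    out.start = start;
    out.end = end;
    return true;
}

// Proposed scheme: the size is known up front (assumed here to include the
// terminating '\0'), so truncation is a single comparison, no scan needed.
bool indexBySize(const std::string &buf, std::size_t start, std::size_t size,
                 RevIndex &out)
{
    if (size == 0 || start + size > buf.size())
        return false;
    out.start = start;
    out.end = start + size - 1;
    return true;
}
```

In both cases a false return means "wait for more data and retry from start";
the difference is only in how much of the buffer has to be touched to find out.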
> P.S. BTW, why do you have some 20 source files marked executable in your
> qgit4 repository?

Importing from Windows: NTFS does not handle file attributes correctly.
I should clean up the permissions, but I'm lazy ;-)

Marco

P.S.: I have an experimental branch where the above is implemented. I
cannot publish it now because it requires the --show-size change in git,
but after initial testing I have found that, with the above applied, the
overhead of qgit over git-log is only about 16%. It means that if git-log
runs in, say, 3 seconds (warm cache), qgit with the same git log arguments
runs in about 3.5 seconds. With a cold cache the overhead is also lower,
because disk access is accounted on the git side ;-)