On Wed, Aug 15, 2007 at 11:08:38AM -0400, Jon Smirl wrote: > On 8/15/07, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote: > > > Also... read_sha1_file() is currently not thread safe, and threaded > > > delta searching would requires that its usage be serialized with a > > > global mutex (not done in this patch which is a bug), or ideally be made > > > thread aware. > > You can avoid making all the low level calls thread safe by using the > main thread to get everything into RAM before starting to search for > the deltas. The worker threads would only deal with things completely > in memory. You may need to ref count these in-memory objects if they > are shared between worker threads. For simplicity the in-memory input > objects should be read only by the threads. The worker threads create > new structures to hand their results back to the main thread for > writing to disk. git-pack-objects knows the order, in which it will use the objects. A seperate thread could pre-read the next object and wait until the main thread starts processing it. After the read is complete, another thread could start computing the delta index. git-pack-objects currently reads an object (and computes the delta index), if it is really necessary. With the pre-read, unnecessary operations would happen. > Initially I would just ignore very large objects while getting the > basic code to work. After the basic code is working if a very large > object is encountered when the main thread is faulting objects in, the > main thread should just process this object on the spot using the > existing low memory code. I expect that the biggest gain will be for big objects, as they require more time to read+unpack the source objects and compute the delta index as well as the delta. > > @@ -1862,10 +1863,12 @@ int pretend_sha1_file(void *buf, unsigned long len, enum object_type type, > > void *read_sha1_file(const unsigned char *sha1, enum object_type *type, > > unsigned long *size) > > { > > + static pthread_mutex_t locky = PTHREAD_MUTEX_INITIALIZER; > > unsigned long mapsize; > > void *map, *buf; > > struct cached_object *co; > > > > + pthread_mutex_lock(&locky); > > co = find_cached_object(sha1); > > if (co) { > > buf = xmalloc(co->size + 1); > > @@ -1873,20 +1876,26 @@ void *read_sha1_file(const unsigned char *sha1, enum object_type *type, > > ((char*)buf)[co->size] = 0; > > *type = co->type; > > *size = co->size; > > + pthread_mutex_unlock(&locky); > > return buf; > > } > > > > buf = read_packed_sha1(sha1, type, size); Couldn't we release the mutex at this point? Why do we need to protect from concurrent access, when we are reading a loose object? > > - if (buf) > > + if (buf) { > > + pthread_mutex_unlock(&locky); > > return buf; > > + } > > map = map_sha1_file(sha1, &mapsize); > > if (map) { > > buf = unpack_sha1_file(map, mapsize, type, size, sha1); > > munmap(map, mapsize); > > + pthread_mutex_unlock(&locky); > > return buf; > > } > > reprepare_packed_git(); > > - return read_packed_sha1(sha1, type, size); > > + buf = read_packed_sha1(sha1, type, size); > > + pthread_mutex_unlock(&locky); > > + return buf; > > } mfg Martin Kögler - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html