On Wed, 24 Mar 2010, Shawn Pearce wrote:

> On Wed, Mar 24, 2010 at 10:53 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> > On Wed, 24 Mar 2010, Fredrik Kuivinen wrote:
> >
> >> On Wed, Mar 24, 2010 at 00:50, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> >> > On Tue, 23 Mar 2010, Fredrik Kuivinen wrote:
> >> >
> >> >> On Tue, Mar 23, 2010 at 19:43, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
> >> >> > If that is what we are doing, disabling the release of pack windows
> >> >> > when malloc fails, why can't we do that all of the time?
> >> >>
> >> >> The idea was that most git programs are single-threaded, so they can
> >> >> still benefit from releasing the pack windows when they are low on
> >> >> memory.
> >> >
> >> > This is bogus.  The Git program using the most memory is probably
> >> > pack-objects, and it is threaded.  Most single-threaded programs don't
> >> > use nearly as much memory.
> >>
> >> OK, you are right.  But xmalloc/xrealloc cannot be used in multiple
> >> threads simultaneously without some serialization.
> >>
> >> For example, I think there are some potential race conditions in the
> >> pack-objects code.  In the threaded code we have the following call
> >> chains leading to xcalloc, xmalloc, and xrealloc:
> >>
> >>   find_deltas -> xcalloc
> >>   find_deltas -> do_compress -> xmalloc
> >>   find_deltas -> try_delta -> xrealloc
> >>   find_deltas -> try_delta -> read_sha1_file -> ... -> xmalloc (called
> >>   with read_lock held, but it can still race with the other calls)
> >>
> >> As far as I can see there is no serialization between these calls.
> >
> > True.  We already have a problem.  This is nasty.
>
> The easy solution is probably to remove the use of xmalloc from the
> find_deltas code path.  But then we run into hard failures when we
> can't get the memory we need; there isn't a way to recover from a
> malloc() failure deep within read_sha1_file, for example.
The read_sha1_file path is not the problem -- it is always protected
against concurrency with a mutex.  It is more about do_compress(),
called on line 1476 of pack-objects.c, for example.

> The current solution is the best we can do: try to ditch pack windows
> and hope that releases sufficient virtual memory space that a second
> malloc() attempt can succeed by increasing the heap.
>
> We could use a mutex during the malloc failure code-path of xmalloc,
> to ensure only one thread goes through that pack window cleanup at a
> time.  But that will still mess with the main thread, which doesn't
> really want to acquire mutexes during object access as it uses the
> existing pack windows.

Right.

> I thought pack-objects did all object access from the main thread and
> only delta searches on the worker threads?

No.  Each thread is responsible for grabbing its own data set.

> If that is true, maybe we
> can have the worker threads signal the main thread on malloc failure
> to release pack windows, and then wait for that signal to be
> acknowledged before they attempt to retry the malloc.  This means the
> main thread would need to periodically test that condition as it is
> dispatching batches of objects to the workers.
>
> Ugly.

Indeed.

The real solution, of course, would be to have pack window manipulations
protected by a mutex of their own.  This, plus another mutex for the
delta base cache, and then read_sha1_file() could almost be reentrant.

Another solution could be for xmalloc() to use a function pointer for
the method to invoke on the malloc error path, which would default to a
function calling release_pack_memory(size, -1).  Then pack-objects.c
would override the default with its own version that acquires the
read_mutex around the call to release_pack_memory().  That is probably
the easiest solution for now.


Nicolas