Ben Peart <Ben.Peart@xxxxxxxxxxxxx> writes:

> This is possible because the current extensions don't access the cache
> entries in the index_state structure so are OK that they don't all exist
> yet.
>
> The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
> extensions don't even get a pointer to the index so don't have access to
> the cache entries.
>
> CACHE_EXT_LINK only uses the index_state to initialize the split index.
> CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
> update and dirty flags.

Good to see such an analysis here.

Once we define an extension section that requires us to have the cache
entries before populating it, this scheme would fall down, of course,
but the extension mechanism is all about protecting ourselves from
future changes, so we'd at least need a good feel for how we would read
an unknown extension from the future with the current code.

Perhaps, just like the main cache entries are pre-scanned to apportion
them to worker threads, we can pre-scan the sections and compare them
with a white-list built into our binary before deciding that it is safe
to read them in parallel (and otherwise, ask the one thread that reads
the extensions to wait until the workers that read the main index all
return)?  A rough sketch of what I mean appears further down.

> -/*
> -* A thread proc to run the load_cache_entries() computation
> -* across multiple background threads.
> -*/

This one was mis-indented (lacking SP before '*'), but these lines are
gone now, so... ;-)

> @@ -1978,6 +1975,36 @@ static void *load_cache_entries_thread(void *_data)
>  	return NULL;
>  }
>
> +static void *load_index_extensions_thread(void *_data)
> +{
> +	struct load_cache_entries_thread_data *p = _data;
> +	unsigned long src_offset = p->start_offset;
> +
> +	while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
> +		/* After an array of active_nr index entries,
> +		 * there can be arbitrary number of extended
> +		 * sections, each of which is prefixed with
> +		 * extension name (4-byte) and section length
> +		 * in 4-byte network byte order.
> +		 */
> +		uint32_t extsize;
> +		memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
> +		extsize = ntohl(extsize);
> +		if (read_index_extension(p->istate,
> +						(const char *)p->mmap + src_offset,
> +						(char *)p->mmap + src_offset + 8,
> +						extsize) < 0) {

Overly deep indentation.  Used a wrong tab-width?

> +	/* allocate an extra thread for loading the index extensions */
>  	ce_per_thread = DIV_ROUND_UP(istate->cache_nr, nr_threads);
> -	data = xcalloc(nr_threads, sizeof(struct load_cache_entries_thread_data));
> +	data = xcalloc(nr_threads + 1, sizeof(struct load_cache_entries_thread_data));
>
>  	/*
>  	 * Loop through index entries starting a thread for every ce_per_thread
> -	 * entries. Exit the loop when we've created the final thread (no need
> -	 * to parse the remaining entries.
> +	 * entries.
>  	 */

I see.  Now the pre-parsing process needs to go through all the cache
entries to find the beginning of the extensions section.
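Once that pre-parse has located where the extensions begin, the
white-list check I alluded to above could look something like the
following.  This is purely to illustrate the idea and is untested:
extensions_are_parallel_safe() is a made-up name, the list contents are
only my guess at what is safe today, and the header layout (4-byte
name, 4-byte length in network byte order) is the one the patch already
parses.

	static const uint32_t parallel_safe_ext[] = {
		CACHE_EXT_TREE,
		CACHE_EXT_RESOLVE_UNDO,
		CACHE_EXT_UNTRACKED,
		CACHE_EXT_LINK,
		CACHE_EXT_FSMONITOR,
	};

	/*
	 * Walk the extension headers and say "no" as soon as we see
	 * a name that is not on the built-in list.
	 */
	static int extensions_are_parallel_safe(const char *mmap,
						unsigned long mmap_size,
						unsigned long src_offset)
	{
		while (src_offset <= mmap_size - the_hash_algo->rawsz - 8) {
			uint32_t ext, extsize;
			size_t i, known = 0;

			memcpy(&ext, mmap + src_offset, 4);
			memcpy(&extsize, mmap + src_offset + 4, 4);
			ext = ntohl(ext);
			extsize = ntohl(extsize);
			for (i = 0; i < ARRAY_SIZE(parallel_safe_ext); i++)
				if (ext == parallel_safe_ext[i])
					known = 1;
			if (!known)
				return 0; /* an extension from the future */
			src_offset += 8 + extsize;
		}
		return 1;
	}

When this says "no", the caller could fall back to reading the
extensions sequentially after joining the worker threads, instead of
spawning the extra thread.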
>  	consumed = thread = 0;
> -	for (i = 0; ; i++) {
> +	for (i = 0; i < istate->cache_nr; i++) {
>  		struct ondisk_cache_entry *ondisk;
>  		const char *name;
>  		unsigned int flags;
> @@ -2055,9 +2082,7 @@ static unsigned long load_cache_entries(struct index_state *istate,
>  		if (pthread_create(&p->pthread, NULL, load_cache_entries_thread, p))
>  			die("unable to create load_cache_entries_thread");
>
> -		/* exit the loop when we've created the last thread */
> -		if (++thread == nr_threads)
> -			break;
> +		++thread;

This is not C++, and in (void) context, the codebase always prefers
post-increment.

> @@ -2086,7 +2111,18 @@ static unsigned long load_cache_entries(struct index_state *istate,
>  		src_offset += (name - ((char *)ondisk)) + expand_name_field(previous_name, name);
>  	}
>
> -	for (i = 0; i < nr_threads; i++) {
> +	/* create a thread to load the index extensions */
> +	struct load_cache_entries_thread_data *p = &data[thread];

This probably triggers decl-after-statement.

> +	p->istate = istate;
> +	mem_pool_init(&p->ce_mem_pool, 0);
> +	p->mmap = mmap;
> +	p->mmap_size = mmap_size;
> +	p->start_offset = src_offset;
> +
> +	if (pthread_create(&p->pthread, NULL, load_index_extensions_thread, p))
> +		die("unable to create load_index_extensions_thread");
> +
> +	for (i = 0; i < nr_threads + 1; i++) {
>  		struct load_cache_entries_thread_data *p = data + i;
>  		if (pthread_join(p->pthread, NULL))
>  			die("unable to join load_cache_entries_thread");
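One way to keep -Wdeclaration-after-statement happy without hoisting
the declaration to the top of the function would be to give this set-up
its own block, along these lines (untested; it only rearranges what the
patch already has):

	/* create a thread to load the index extensions */
	{
		struct load_cache_entries_thread_data *p = &data[thread];

		p->istate = istate;
		mem_pool_init(&p->ce_mem_pool, 0);
		p->mmap = mmap;
		p->mmap_size = mmap_size;
		p->start_offset = src_offset;
		if (pthread_create(&p->pthread, NULL,
				   load_index_extensions_thread, p))
			die("unable to create load_index_extensions_thread");
	}

The extra braces keep the declaration at the top of a statement block,
which is all the warning asks for.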