On 28/09/18 20:41, Ben Peart wrote: > > > On 9/28/2018 1:07 PM, Junio C Hamano wrote: >> Ben Peart <peartben@xxxxxxxxx> writes: >> >>>> Why does multithreading have to be disabled in this test? >>> >>> If multi-threading is enabled, it will write out the IEOT extension >>> which changes the SHA and causes the test to fail. >> >> I think it is a design mistake to let the writing processes's >> capability decide what is written in the file to be read later by a >> different process, which possibly may have different capability. If >> you are not writing with multiple threads, it should not matter if >> that writer process is capable of and configured to spawn 8 threads >> if the process were reading the file---as it is not reading the file >> it is writing right now. >> >> I can understand if the design is to write IEOT only if the >> resulting index is expected to become large enough (above an >> arbitrary threshold like 100k entries) to matter. I also can >> understand if IEOT is omitted when the repository configuration says >> that no process is allowed to read the index with multi-threaded >> codepath in that repository. >> > > There are two different paths which determine how many blocks are written to the IEOT. The first is the default path. On this path, the number of blocks is determined by the number of cache entries divided by the THREAD_COST. If there are sufficient entries to make it faster to use threading, then it will automatically use enough blocks to optimize the performance of reading the entries across multiple threads. > > I currently cap the maximum number of blocks to be the number of cores that would be available to process them on that same machine purely as an optimization. The majority of the time, the index will be read from the same machine that it was written on so this works well. Before I added that logic, you would usually end up with more blocks than available threads which meant some threads had more to do than the other threads and resulted in worse performance. For example, 4 blocks across 3 threads results in the 1st thread having twice as much work to do as the other threads. > > If the index is copied to a machine with a different number of cores, it will still all work - it just may not be optimal for that machine. This is self correcting because as soon as the index is written out, it will be optimized for that machine. > > If the "automatically try to make it perform optimally" logic doesn't work for some reason, we have path #2. > > The second path is when the user specifies a specific number of blocks via the GIT_TEST_INDEX_THREADS=<n> environment variable or the index.threads=<n> config setting. If they ask for n blocks, they will get n blocks. This is the "I know what I'm doing and want to control the behavior" path. > > I just added one additional test (see patch below) to avoid a divide by zero bug and simplify things a bit. With this change, if there are fewer than two blocks, the IEOT extension is not written out as it isn't needed. The load would be single threaded anyway so there is no reason to write out a IEOT extensions that won't be used. > > > > diff --git a/read-cache.c b/read-cache.c > index f5d766088d..a1006fa824 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -2751,18 +2751,23 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfil > e, > */ > if (!nr) { > ieot_blocks = istate->cache_nr / THREAD_COST; > - if (ieot_blocks < 1) > - ieot_blocks = 1; > cpus = online_cpus(); > if (ieot_blocks > cpus - 1) > ieot_blocks = cpus - 1; So, am I reading this correctly - you need cpus > 2 before an IEOT extension block is written out? OK. ATB, Ramsay Jones > } else { > ieot_blocks = nr; > } > - ieot = xcalloc(1, sizeof(struct index_entry_offset_table) > - + (ieot_blocks * sizeof(struct index_entry_offset))); > - ieot->nr = 0; > - ieot_work = DIV_ROUND_UP(entries, ieot_blocks); > + > + /* > + * no reason to write out the IEOT extension if we don't > + * have enough blocks to utilize multi-threading > + */ > + if (ieot_blocks > 1) { > + ieot = xcalloc(1, sizeof(struct index_entry_offset_table) > + + (ieot_blocks * sizeof(struct index_entry_offset))); > + ieot->nr = 0; > + ieot_work = DIV_ROUND_UP(entries, ieot_blocks); > + } > } > #endif > >