On 15 Nov 2016, at 19:03, Junio C Hamano <gitster@xxxxxxxxx> wrote:

> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
>
>>> The filter itself would need to be aware of parallelism
>>> if it lives for multiple objects, right?
>>
>> Correct. This way Git doesn't need to deal with threading...
>
> I think you need to be careful about three things (at least; there
> may be more):
>
>  * Codepaths that check out multiple cache entries do rely on the
>    order of checkout.  We checkout removals first to make room so
>    that creation of a path X can succeed if an existing path X/Y
>    that used to want to see X as a directory can succeed (see the
>    use of checkout_entry() by "git checkout", which does have two
>    separate loops to explicitly guarantee this), for example.  I
>    think "remove all and then create" you do not specifically have
>    to worry about with the proposed change, but you may need to
>    inspect and verify there aren't other kind of order dependency.

OK

>  * Done naively, it will lead to unmaintainable code, like this:
>
>        + struct list_of_cache_entries *list = ...;
>          for (i = 0; i < active_nr; i++)
>        -     checkout_entry(active_cache[i], state, NULL);
>        +     if (checkout_entry(active_cache[i], state, NULL) == DELAYED)
>        +         add_cache_to_queue(&list, active_cache[i]);
>        + while (list) {
>        +     wait_for_checkout_to_finish(*list);
>        +     list = list->next;
>        + }
>
>    I do not think we want to see such a rewrite all over the
>    codepaths.  It might be OK to add such a "these entries are known
>    to be delayed" list in struct checkout so that the above becomes
>    more like this:
>
>        for (i = 0; i < active_nr; i++)
>            checkout_entry(active_cache[i], state, NULL);
>        checkout_entry_finish(state);
>
>    That is, addition of a single "some of the checkout_entry() calls
>    done so far might have been lazy, and I'll give them a chance to
>    clean up" might be palatable.  Anything more than that on the
>    caller side is not.
I haven't thought hard about the implementation yet, but I'll try to
stick to your suggestion and change as little code as possible on the
caller sides.

>  * You'd need to rein in the maximum parallelism somehow, as you do
>    not want to see hundreds of competing filter processes starting
>    only to tell the main loop over an index with hundreds of entries
>    that they are delayed checkouts.

I intend to implement this feature only for the new long-running filter
process protocol. That way a single filter process handles all delayed
entries and Git never spawns one process per entry. OK with you?

Thanks,
Lars