Quoting Tvrtko Ursulin (2019-11-20 13:18:27) > > On 20/11/2019 12:59, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2019-11-20 12:55:42) > >> > >> On 20/11/2019 09:32, Chris Wilson wrote: > >>> Bonded request submission is designed to allow requests to execute in > >>> parallel as laid out by the user. If the master request is already > >>> finished before its bonded pair is submitted, the pair were not destined > >>> to run in parallel and we lose the information about the master engine > >>> to dictate selection of the secondary. If the second request was > >>> required to be run on a particular engine in a virtual set, that should > >>> have been specified, rather than left to the whims of a random > >>> unconnected requests! > >>> > >>> In the selftest, I made the mistake of not ensuring the master would > >>> overlap with its bonded pairs, meaning that it could indeed complete > >>> before we submitted the bonds. Those bonds were then free to select any > >>> available engine in their virtual set, and not the one expected by the > >>> test. > >> > >> There is a submit await which ensures master is not runnable before > >> bonded pairs are submitted. Why was that not enough? Are the sporadic > >> test failures? > > > > One test is using the submit_await, the other does not. It takes the > > background retire worker to run as we are submitting the secondaries... > > But I have not noticed this failure before hooking up retirement to > > process_csb. However, the issue is definitely present in the current > > test. > > So what happens? Is this another issue limited to selftests? Because I > don't see that uAPI itself can't be used in this way. Since the master batch is already completed & signaled by the time we submit the secondaries, the submit-fence is a dud and the secondaries are not constrained in their engine selection. i915_request_await_execution: if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) continue; else __i915_request_await_execution() Now, our choice is either to drop the check on the signaled bit (and so we will apply the bonding constrained from the already finished batch) or not. Given that the master is already complete, I feel justified in the current decision to ignore the constraint (since equally the fence could already have been retired and so completely inaccessible), so chose to fix the test instead. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx