Quoting Matthew Auld (2019-09-30 10:58:15)
> On 27/09/2019 21:42, Chris Wilson wrote:
> > Quoting Matthew Auld (2019-09-27 18:33:57)
> >> +	i = 0;
> >> +	engines = i915_gem_context_lock_engines(ctx);
> >> +	do {
> >> +		u32 rng = prandom_u32_state(&prng);
> >> +		u32 dword = offset_in_page(rng) / 4;
> >> +
> >> +		ce = engines->engines[order[i] % engines->num_engines];
> >> +		i = (i + 1) % (count * count);
> >> +		if (!ce || !intel_engine_can_store_dword(ce->engine))
> >> +			continue;
> >> +
> >> +		err = igt_gpu_write_dw(ce, vma, dword, rng);
> >> +		if (err)
> >> +			break;
> >
> > Do you have a test that does
> > 	dword,
> > 	64B or cacheline,
> > 	page
> > 	random width&strides of the above
> > before doing the read back of a random dword from those?
>
> Are you thinking write_dw + increment(dword, qword, cl, ..), or
> actually doing the fill: write_dw, write_qw, write_block?

Here, I think stride is most interesting to hit various caching/transfer
artifacts between the CPU and lmem (and possibly with writes to lmem). I
think write_dw et al better stress the GPU write side and the
instruction stream.

> Or maybe both? I have been playing around with the write_dw +
> increment for hugepages.c.

Maybe both :) Never say no to more patterns! (Just be cautious of time
budget and use the cycles wisely to maximise coverage of your mental
model of the HW.)

Once we get past the obvious coherency glitches in the driver, it gets
far more subtle. It's easy enough to filter out the noise, but deducing
a pattern from gaps in the testing is much harder :)
-Chris
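
For illustration, a standalone userspace C sketch of the fill-then-readback
pattern being discussed: write a value at each of several widths (dword,
cacheline, page) with a randomly chosen stride, then read back random dwords
and verify them. This only models the access pattern on plain malloc'ed
memory; fill_stride() and the constants are made up for the sketch, and a
real selftest would instead drive igt_gpu_write_dw() (or write_qw/write_block
variants) against an lmem object.

	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	#define CACHELINE	64
	#define PAGE_SZ		4096
	#define OBJ_SZ		(16 * PAGE_SZ)

	/*
	 * Hypothetical helper: write 'width' bytes of 'val' at the start
	 * of each 'stride'-sized block; width and stride are multiples
	 * of 4, and partial runs at the end of the buffer are skipped.
	 */
	static void fill_stride(uint8_t *buf, size_t width, size_t stride,
				uint32_t val)
	{
		for (size_t off = 0; off + width <= OBJ_SZ; off += stride)
			for (size_t i = 0; i < width; i += sizeof(val))
				memcpy(buf + off + i, &val, sizeof(val));
	}

	int main(void)
	{
		static const size_t widths[] = { 4, CACHELINE, PAGE_SZ };
		uint8_t *buf = calloc(1, OBJ_SZ);
		unsigned int seed = 0x9e3779b9; /* fixed seed: reproducible */

		if (!buf)
			return 1;

		for (size_t w = 0; w < sizeof(widths) / sizeof(widths[0]); w++) {
			size_t width = widths[w];
			/* random stride >= width, kept dword aligned */
			size_t stride = width + ((rand_r(&seed) % PAGE_SZ) & ~3u);
			uint32_t val = rand_r(&seed) | 1; /* never write 0 */

			fill_stride(buf, width, stride, val);

			/* read back random dwords, check against the pattern */
			for (int n = 0; n < 1024; n++) {
				size_t off = (rand_r(&seed) % OBJ_SZ) & ~(size_t)3;
				size_t run = off - off % stride;
				int hit = off - run < width &&
					  run + width <= OBJ_SZ;
				uint32_t got;

				memcpy(&got, buf + off, sizeof(got));
				if (got != (hit ? val : 0)) {
					fprintf(stderr,
						"w=%zu s=%zu: bad dword at 0x%zx: 0x%08x\n",
						width, stride, off, got);
					free(buf);
					return 1;
				}
			}

			memset(buf, 0, OBJ_SZ); /* reset between patterns */
		}

		free(buf);
		return 0;
	}

The knob of interest per the thread is stride: widening the gap between
writes changes which cachelines and pages are touched on readback, which is
where CPU<->lmem transfer artifacts would surface, whereas the
write_dw/write_qw/write_block variants vary what the GPU emits per write.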