Hi Janusz, On 2023-02-01 at 20:16:11 +0100, Janusz Krzysztofik wrote: > Hi Kamil, > > On Wednesday, 1 February 2023 19:21:57 CET Kamil Konieczny wrote: > > Hi Janusz, > > > > please send patches to igt ML and add other addresses to cc: > > I have one nit, see below. > > > > On 2023-01-31 at 10:17:31 +0100, Janusz Krzysztofik wrote: > > > Add a new subtest focused on exercising interaction between perf open / > > > close operations, which replace active barriers with perf requests, and > > > concurrent barrier preallocate / acquire operations performed during > > > context first pin / last unpin. > > > > > > References: https://gitlab.freedesktop.org/drm/intel/-/issues/6333 > > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx> > > > Cc: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx> > > > --- > > > tests/i915/perf.c | 41 +++++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 39 insertions(+), 2 deletions(-) > > > > > > diff --git a/tests/i915/perf.c b/tests/i915/perf.c > > > index e33cacc443..11a3ec21ab 100644 > > > --- a/tests/i915/perf.c > > > +++ b/tests/i915/perf.c > > > @@ -39,6 +39,7 @@ > > > #include <math.h> > > > > > > #include "i915/gem.h" > > > +#include "i915/gem_create.h" > > > #include "i915/perf.h" > > > #include "igt.h" > > > #include "igt_perf.h" > > > @@ -4885,7 +4886,27 @@ test_whitelisted_registers_userspace_config(void) > > > i915_perf_remove_config(drm_fd, config_id); > > > } > > > > > > -static void test_open_race(const struct intel_execution_engine2 *e, int timeout) > > > +static void gem_exec_nop(int i915, const struct intel_execution_engine2 *e) > > > +{ > > > + const uint32_t bbe = MI_BATCH_BUFFER_END; > > > + struct drm_i915_gem_exec_object2 obj = { }; > > > + struct drm_i915_gem_execbuffer2 execbuf = { > > > + .buffers_ptr = to_user_pointer(&obj), > > > + .buffer_count = 1, > > > + }; > > > + > > > + obj.handle = gem_create(i915, 4096); > > > + gem_write(i915, obj.handle, 0, &bbe, sizeof(bbe)); > > > + > > > + execbuf.flags = e->flags; > > > + gem_execbuf(i915, &execbuf); > > > + > > > + gem_sync(i915, obj.handle); > > > + gem_close(i915, obj.handle); > > > +} > > > + > > > +static void test_open_race(const struct intel_execution_engine2 *e, int timeout, > > > + bool use_spin) > > -------------------------- ^ > > This is not the best way to develop new code, it may be good > > for fast development but imho it is better to refactor existing > > code and avoiding to add bool logic into function. > > I'm not sure what you mean. By refactoring, do you mean moving the code > common to both processing paths out to a separate helper, then call it from > two distinct functions, each implementing only one scenario? What's wrong > with passing flags that select one of alternative processing paths within one > function? Or are you just not happy with bool type? It is not wrong unless there is only one such var but even then readability of code is reduced. > > > It can be done later as this is revealing the bug. > > > > > { > > > int *done; > > > > > > @@ -4926,6 +4947,12 @@ static void test_open_race(const struct intel_execution_engine2 *e, int timeout) > > > ctx = gem_context_create_for_engine(i915, e->class, e->instance); > > > gem_context_set_persistence(i915, ctx, persistence); > > > > > > + if (!use_spin) { > > > + gem_exec_nop(i915, e); > > > + gem_context_destroy(i915, ctx); > > > + continue; > > > + } > > > + > > > spin = __igt_spin_new(i915, ctx, .ahnd = ahnd); > > > for (int i = random() % 7; i--; ) { > > > if (random() & 1) { > > > @@ -5330,7 +5357,17 @@ igt_main > > > for_each_physical_engine(drm_fd, e) > > > if (e->class == I915_ENGINE_CLASS_RENDER) > > > igt_dynamic_f("%s", e->name) > > > - test_open_race(e, 5); > > > + test_open_race(e, 5, true); > > > + } > > > + > > > > Please add here some TODO comments from discussions and a note > > which will help bug filling team to classify that. > > TODO about moving the subtest to a different test -- OK. But instructions for > Bug Filling team? Hmm, what if this subtest triggers a completely different > bug in the future? Are we going to update the comments then? Do Bug Filling > team members really read the sources while classifying bugs? > > Please have another look at an example of expected dmesg from the currently > triggered bug: > > <4> [192.634015] list_del corruption. prev->next should be ffff8881479d34e0, but was ffff888108af4ce0. (prev=ffff888127335160) > <4> [192.634032] WARNING: CPU: 1 PID: 1286 at lib/list_debug.c:59 __list_del_entry_valid+0xb7/0xe0 Ok, I see that it is kernel bug here. You are right, please add only TODO here. Regards, Kamil > <4> [192.634041] Modules linked in: vgem drm_shmem_helper fuse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio x86_pkg_temp_thermal coretemp i915 prime_numbers i2c_algo_bit kvm_intel ttm snd_hda_intel snd_intel_dspcfg kvm drm_buddy mei_pxp snd_hda_codec smsc75xx irqbypass mei_hdcp e1000e drm_display_helper usbnet snd_hwdep crct10dif_pclmul wmi_bmof mii crc32_pclmul ghash_clmulni_intel igc snd_hda_core drm_kms_helper ptp mei_me i2c_i801 syscopyarea snd_pcm pps_core sysfillrect i2c_smbus mei sysimgblt video intel_lpss_pci wmi > <4> [192.634156] CPU: 1 PID: 1286 Comm: perf Not tainted 6.2.0-rc6-CI_DRM_12672-gdf5c31e78609+ #1 > <4> [192.634161] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S SODIMM RVP, BIOS RKLSFWI1.R00.1435.A00.2010232019 10/23/2020 > <4> [192.634165] RIP: 0010:__list_del_entry_valid+0xb7/0xe0 > <4> [192.634170] Code: cc cc cc 48 89 ca 48 c7 c7 78 00 46 82 e8 0d 7e 61 00 0f 0b 31 c0 c3 cc cc cc cc 4c 89 c2 48 c7 c7 b0 00 46 82 e8 f5 7d 61 00 <0f> 0b 31 c0 c3 cc cc cc cc 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 f8 > <4> [192.634174] RSP: 0018:ffffc90001f2fcb0 EFLAGS: 00010082 > <4> [192.634180] RAX: 0000000000000000 RBX: ffff88812f8dec40 RCX: 0000000000000000 > <4> [192.634184] RDX: 0000000000000003 RSI: ffffffff823d7c88 RDI: 00000000ffffffff > <4> [192.634187] RBP: ffff8881479d34d8 R08: 0000000000000000 R09: ffffc90001f2fb60 > <4> [192.634191] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88812f8df440 > <4> [192.634194] R13: ffff88812f8df440 R14: ffff8881479d34e0 R15: 0000000000000246 > <4> [192.634197] FS: 00007fa38f9514c0(0000) GS:ffff888450080000(0000) knlGS:0000000000000000 > <4> [192.634201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > <4> [192.634205] CR2: 00007fa392c76000 CR3: 000000010809e003 CR4: 0000000000770ee0 > <4> [192.634208] PKRU: 55555554 > <4> [192.634211] Call Trace: > <4> [192.634214] <TASK> > <4> [192.634218] __i915_active_fence_set+0x71/0xf0 [i915] > <4> [192.634412] __i915_request_commit+0x2e1/0x5b0 [i915] > <4> [192.634597] i915_request_add+0xa6/0x330 [i915] > <4> [192.634783] gen8_modify_context+0xc2/0x220 [i915] > <4> [192.634958] oa_configure_all_contexts.isra.0+0x183/0x400 [i915] > <4> [192.635137] gen12_disable_metric_set+0x98/0x160 [i915] > <4> [192.635305] i915_oa_stream_destroy+0x3c/0x330 [i915] > <4> [192.635480] i915_perf_destroy_locked+0x22/0x80 [i915] > <4> [192.635646] i915_perf_release+0x35/0x50 [i915] > <4> [192.635808] __fput+0x95/0x260 > > Isn't such information from actual bug occurrence, contained in CI reports the > Bug Filling team is working with, more meaningful than any comment added to > the test source code? How can a test know in advance what bugs it will ever > trigger, and who exactly will be responsible for fixing them? > > Thanks, > Janusz > > > > > Regards, > > Kamil > > > > > + igt_describe("Exercise perf open/close against intensive barrier preallocate/acquire"); > > > + igt_subtest_with_dynamic("barrier-race") { > > > + const struct intel_execution_engine2 *e; > > > + > > > + for_each_physical_engine(drm_fd, e) > > > + if (e->class == I915_ENGINE_CLASS_RENDER) > > > + igt_dynamic_f("%s", e->name) > > > + test_open_race(e, 5, false); > > > } > > > > > > igt_fixture { > > > > > >