Hi Dave and Simona,
drm-xe-fixes for 6.12-rc5, with commits mostly improving error handling.
The g2h flush helps with some LNL issues we are seeing, but we still have
2 other similar fixes pending. They didn't make it into drm-xe-next in time
to be properly tested, so I'm leaving them for later.
There are 2 conflicts when merging drm-next on top, both of which I fixed
in drm-tip. The first is trivial: just take the drm-next side. The second is
also trivial, preferring xa_erase() over xa_erase_irq(), but the diff
context looks scarier, so I'm pasting it here (with a | prefix so bots
don't try anything funny):
| remerge CONFLICT (content): Merge conflict in drivers/gpu/drm/xe/xe_guc_ct.c
| index c6caf8f92421..c260d8840990 100644
| --- a/drivers/gpu/drm/xe/xe_guc_ct.c
| +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
| @@ -1019,7 +1019,6 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
| ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
|
| /*
| -<<<<<<< 3cf59b00bd34 (Merge remote-tracking branch 'drm-xe/drm-xe-fixes' into drm-tip)
| * Occasionally it is seen that the G2H worker starts running after a delay of more than
| * a second even after being queued and activated by the Linux workqueue subsystem. This
| * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
| @@ -1044,22 +1043,10 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
| * correct ordering, and we lack the needed barriers.
| */
| mutex_lock(&ct->lock);
| - if (!ret) {
| - xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x, done %s",
| - g2h_fence.seqno, action[0], str_yes_no(g2h_fence.done));
| - xa_erase_irq(&ct->fence_lookup, g2h_fence.seqno);
| -=======
| - * Ensure we serialize with completion side to prevent UAF with fence going out of scope on
| - * the stack, since we have no clue if it will fire after the timeout before we can erase
| - * from the xa. Also we have some dependent loads and stores below for which we need the
| - * correct ordering, and we lack the needed barriers.
| - */
| - mutex_lock(&ct->lock);
| if (!ret) {
| xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x, done %s",
| g2h_fence.seqno, action[0], str_yes_no(g2h_fence.done));
| xa_erase(&ct->fence_lookup, g2h_fence.seqno);
| ->>>>>>> c9ff14d0339a (Merge tag 'drm-intel-gt-next-2024-10-23' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next)
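For anyone who doesn't want to read the remerge diff: the pattern that survives the resolution can be sketched in userspace C roughly as below. This is only an illustration, not the driver code. pthreads stands in for the kernel mutex, a small array stands in for the xarray, and every name in it (struct ct, ct_register(), ct_complete(), ct_finish_wait(), the -ETIMEDOUT return) is made up for the example. The point is that the waiter takes the lock before erasing its on-stack fence from the lookup, so the completion side can never find a fence that has gone out of scope.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FENCES 16

/* Hypothetical stand-in for the on-stack g2h fence. */
struct fence {
	unsigned int seqno;
	bool done;
};

/* Hypothetical stand-in for the CT state: one lock plus a lookup table. */
struct ct {
	pthread_mutex_t lock;
	struct fence *lookup[MAX_FENCES];	/* plays the role of the xarray */
};

static void ct_init(struct ct *ct)
{
	pthread_mutex_init(&ct->lock, NULL);
	for (size_t i = 0; i < MAX_FENCES; i++)
		ct->lookup[i] = NULL;
}

/* Sender side: publish the fence before waiting for the response. */
static void ct_register(struct ct *ct, struct fence *f)
{
	size_t slot = f->seqno % MAX_FENCES;

	pthread_mutex_lock(&ct->lock);
	assert(ct->lookup[slot] == NULL);
	ct->lookup[slot] = f;
	pthread_mutex_unlock(&ct->lock);
}

/* Completion side: only touches a fence it can still find under the lock. */
static bool ct_complete(struct ct *ct, unsigned int seqno)
{
	bool found = false;

	pthread_mutex_lock(&ct->lock);
	struct fence *f = ct->lookup[seqno % MAX_FENCES];
	if (f && f->seqno == seqno) {
		f->done = true;
		found = true;
	}
	pthread_mutex_unlock(&ct->lock);
	return found;
}

/*
 * Sender side after the wait returns. ret == 0 means the wait timed out.
 * The erase happens with the lock held, so it is serialized against
 * ct_complete() and a plain (non-irq) erase is enough in this sketch.
 */
static int ct_finish_wait(struct ct *ct, struct fence *f, long ret)
{
	int err = 0;

	pthread_mutex_lock(&ct->lock);
	if (!ret) {
		ct->lookup[f->seqno % MAX_FENCES] = NULL;
		err = -ETIMEDOUT;	/* assumed error code, for illustration only */
	}
	pthread_mutex_unlock(&ct->lock);
	return err;
}
```

After ct_finish_wait() reports a timeout, a late ct_complete() for the same seqno finds nothing and leaves the fence alone, which is the use-after-free the comment in the hunk is worried about.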
thanks
Lucas De Marchi
drm-xe-fixes-2024-10-24-1:
Driver Changes:
- Increase invalidation timeout to avoid errors in some hosts (Shuicheng)
- Flush worker on timeout (Badal)
- Better handling for force wake failure (Shuicheng)
- Improve argument check on user fence creation (Nirmoy)
- Don't restart parallel queues multiple times on GT reset (Nirmoy)
The following changes since commit 42f7652d3eb527d03665b09edac47f85fb600924:
Linux 6.12-rc4 (2024-10-20 15:19:38 -0700)
are available in the Git repository at:
https://gitlab.freedesktop.org/drm/xe/kernel.git tags/drm-xe-fixes-2024-10-24-1
for you to fetch changes up to cdc21021f0351226a4845715564afd5dc50ed44b:
drm/xe: Don't restart parallel queues multiple times on GT reset (2024-10-24 12:42:52 -0500)
----------------------------------------------------------------
Driver Changes:
- Increase invalidation timeout to avoid errors in some hosts (Shuicheng)
- Flush worker on timeout (Badal)
- Better handling for force wake failure (Shuicheng)
- Improve argument check on user fence creation (Nirmoy)
- Don't restart parallel queues multiple times on GT reset (Nirmoy)
----------------------------------------------------------------
Badal Nilawar (1):
drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout
Nirmoy Das (2):
drm/xe/ufence: Prefetch ufence addr to catch bogus address
drm/xe: Don't restart parallel queues multiple times on GT reset
Shuicheng Lin (2):
drm/xe: Enlarge the invalidation timeout from 150 to 500
drm/xe: Handle unreliable MMIO reads during forcewake
drivers/gpu/drm/xe/xe_device.c | 2 +-
drivers/gpu/drm/xe/xe_force_wake.c | 12 +++++++++---
drivers/gpu/drm/xe/xe_guc_ct.c | 18 ++++++++++++++++++
drivers/gpu/drm/xe/xe_guc_submit.c | 14 ++++++++++++--
drivers/gpu/drm/xe/xe_sync.c | 3 ++-
5 files changed, 42 insertions(+), 7 deletions(-)