Re: [PATCH] drm/i915: Stop doing writeback from the shrinker

On 10/12/2021 14:46, Thomas Hellström wrote:
On Fri, 2021-12-10 at 11:05 +0000, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

This effectively removes writeback which was added in 2d6692e642e7
("drm/i915: Start writeback from the shrinker").

Digging through the history it seems we went back and forth on the
topic of whether it would be safe a couple of times. See for instance
5537252b6b6d ("drm/i915: Invalidate our pages under memory pressure")
where Hugh Dickins has advised against it. I do not have enough
expertise in the memory management area so am hoping for expert input
here.

Reason for proposing removal is that there are reports from the field
which indicate a system-wide deadlock (of a sort) implicating i915
doing writeback at shrinking time.

Signature is a hung task notifier kicking in and task traces such as:

It would be interesting to see what exactly the find_get_entry is
blocked on. The other two tasks are blocked on the shrinker_rwsem which
is held by i915. If it's indeed a deadlock with either of those two,

It may indeed be a livelock rather than a deadlock. I have received a newer trace and it does show kswapd in the running state. But no progress in 120s and a dead machine seemed too suspicious to result from just a gaming workload, so I assumed a more serious issue than just severe memory pressure.

then the fix Chris is working on for an unrelated issue we discovered
with shrinking would move out the writeback call from the
shrinker_rwsem and resolve this, but if i915 is in turn deadlocking
with another process and these two are just hanging waiting for the
shrinker_rwsem, we would still have other issues.

Presumably this would involve an extra worker and tracking on a list or something?

Otherwise my main hope really was to get a verdict from memory management experts on pros & cons of doing writeback from the driver in any flavour.

Do you by any chance have the list of the locks held by the system at
this point?

No, but Renato, maybe you could also collect the output of "echo d" and "echo m" written to /proc/sysrq-trigger when things go bad?
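
For reference, a minimal sketch of how those two dumps could be requested in one go. The helper name collect_sysrq and the overridable path argument are made up for illustration; the 'd' dump (all held locks) is only available on a kernel built with lockdep, and the output lands in the kernel log rather than on stdout.

```shell
#!/bin/sh
# Sketch only: collect_sysrq is a hypothetical helper, not an existing tool.
# It writes the 'd' and 'm' SysRq keys to the trigger file, one at a time.
collect_sysrq() {
    # Path can be overridden for a dry run against an ordinary file.
    trigger="${1:-/proc/sysrq-trigger}"
    for key in d m; do
        # 'd' = show all locks held (needs lockdep), 'm' = dump memory info
        printf '%s\n' "$key" > "$trigger" || return 1
        echo "sent sysrq '$key' to $trigger"
    done
}
```

Run as root once the machine starts to hang, something like "collect_sysrq && dmesg > sysrq-dump.txt", assuming the system is still responsive enough to save the log.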

Regards,

Tvrtko


