On 21.09.23 at 23:30, Alex Deucher wrote:
On Thu, Sep 21, 2023 at 4:21 PM Rafał Miłecki <zajec5@xxxxxxxxx> wrote:
On 21.09.2023 21:52, Deucher, Alexander wrote:
Backporting commit 187916e6ed9d ("drm/amdgpu: install stub fence into
potential unused fence pointers") to stable kernels resulted in lots of
WARNINGs on some devices. In my case, I was getting 3 WARNINGs per
second (~150 lines logged every second). The commit ended up being reverted
for stable, but it exposed a potential problem: my messages log grew to
gigabytes and ran my /tmp/ out of space.
Could someone take a look at amdgpu_sync_keep_later / dma_fence_is_later
and make sure its logging is rate-limited to avoid such situations in the future,
please?
Revert in linux-5.15.x:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.15.y&id=fae2d591f3cb31f722c7f065acf586830eab8c2a
openSUSE bug report:
https://bugzilla.opensuse.org/show_bug.cgi?id=1215523
These patches were never intended for stable. They were picked up by Sasha's stable autoselect tools and automatically applied to stable kernels.
Are you saying massive WARNINGs in dma_fence_is_later() can't happen
in any other case? I understand it was an incorrect backport, but
I thought we could learn from it and still add some rate limiting.
All of the current places where that function is used check the
contexts before calling it, so it should be safe as-is in the tree.
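To illustrate the invariant being relied on, here is a userspace C sketch, not kernel code. `struct fence`, `fence_is_later()` and `same_context_and_later()` are hypothetical stand-ins, loosely modeled on `dma_fence_is_later()`, which warns and returns false when its two fences come from different contexts (the WARNINGs reported in this thread). Fences only have a defined order within one context (timeline), so callers compare `context` first and never reach the warning path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct dma_fence:
 * only the fields needed to order two fences. */
struct fence {
	uint64_t context; /* timeline the fence belongs to */
	uint64_t seqno;   /* position on that timeline */
};

/* Models dma_fence_is_later(): ordering is only defined within one
 * context, so a cross-context comparison is treated as a caller bug.
 * Unlike the real helper, this ignores sequence-number wraparound. */
static bool fence_is_later(const struct fence *f1, const struct fence *f2)
{
	if (f1->context != f2->context) {
		fprintf(stderr, "WARNING: comparing fences from different contexts\n");
		return false;
	}
	return f1->seqno > f2->seqno;
}

/* Models the caller-side pattern: check the contexts first, so the
 * warning in fence_is_later() is never triggered. */
static bool same_context_and_later(const struct fence *f1, const struct fence *f2)
{
	return f1->context == f2->context && fence_is_later(f1, f2);
}
```

With the guard in place, a fence from a foreign context (for example a stub fence with its own context) is simply treated as "not later" instead of producing a warning on every call.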
That said, something like this could potentially happen again. I
don't think using WARN_ON_RATELIMIT() would be a problem.
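For illustration, the rate-limited variant amounts to an interval/burst limiter in front of the warning. The following is a userspace C sketch that only loosely follows the semantics of the kernel's `struct ratelimit_state` / `__ratelimit()` and `WARN_ON_RATELIMIT()`; `struct ratelimit`, `ratelimit_ok()` and `warn_on_ratelimit()` are hypothetical names, and time is passed in explicitly instead of using jiffies. As with `WARN_ON_RATELIMIT()`, the condition is always returned, so the error path still runs even when the message is suppressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit {
	long interval; /* window length, in caller-supplied time units */
	int burst;     /* max messages allowed per window */
	long begin;    /* start of the current window */
	bool started;  /* false until the first event opens a window */
	int printed;   /* messages allowed in the current window */
	int missed;    /* messages suppressed in the current window */
};

static void ratelimit_init(struct ratelimit *rs, long interval, int burst)
{
	assert(interval > 0 && burst > 0);
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->started = false;
	rs->printed = 0;
	rs->missed = 0;
}

/* Returns true if a message at time `now` may be emitted. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (!rs->started) {
		rs->started = true;
		rs->begin = now;
	}
	/* Window expired: report what was dropped, then open a new window. */
	if (now - rs->begin >= rs->interval) {
		if (rs->missed)
			fprintf(stderr, "%d warnings suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Analogue of WARN_ON_RATELIMIT(): the limiter is consulted only when
 * the condition holds, and the condition is returned either way. */
static bool warn_on_ratelimit(bool cond, struct ratelimit *rs, long now,
			      const char *msg)
{
	if (cond && ratelimit_ok(rs, now))
		fprintf(stderr, "WARNING: %s\n", msg);
	return cond;
}
```

With an interval of 5 seconds and a burst of 10, the 3 WARNINGs per second from the report above would be cut to at most 10 messages per 5-second window plus one "suppressed" summary line per window, instead of an unbounded stream.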
Yeah, but it also shouldn't be necessary.
When this triggers, you have a major driver bug at hand; spamming the
logs is then the least of your problems.
Christian.
Alex