fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]

I have begun to find a way to investigate the fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT] errors.
See https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/339

I believe this mostly affects Fermi, Kepler and Maxwell1 graphics cards.

I'd like to first describe how I proceed, then talk about splitting this issue into several.

I am working on Debian Testing with GNOME.
This environment reacts well (no freeze) when programs or XWayland are killed as a result of the error.

I am using a kernel built from drm-misc: https://cgit.freedesktop.org/drm-misc/tree/
which I got with git clone git://cgit.freedesktop.org/drm-misc/tree/
and compiled myself.

My kernel command line in /etc/default/grub has:
GRUB_CMDLINE_LINUX="pcie_aspm=off nouveau.debug=info nouveau.noaccel=0 drm.debug=0 log_buf_len=8M"
I am not sure whether I am the only one who needs pcie_aspm=off to get rid of some AER errors on the PCIe bus.

The first thing I do is:
su -
The - gives a login shell, so the programs in /usr/sbin are in the PATH.

dmesg --console-off
because I will generate a lot of messages, and I want them only in the log files, not on the console screen.

I launch Firefox; I get most of the bugs while browsing the web.

When ready to debug I do:
echo 255 > /sys/module/drm/parameters/debug

[At first I was using 2, then 1 as suggested by /usr/sbin/modinfo drm, but then I concluded that 255, to log
 everything, is best for catching all the cases that could cause the timeout.]

I browse the web.

When Firefox stops, or everything goes away and I am returned to gdm (the login screen), the first thing I do is:
echo 0 > /sys/module/drm/parameters/debug
to stop logging so many messages.
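
The switching on and off could be wrapped in two small shell functions. This is only a sketch: the names drm_debug_on, drm_debug_off and the DRM_DEBUG variable are made up for illustration, and writing to the real parameter needs root.

```shell
# Path of the drm debug parameter; can be overridden to try the functions on a scratch file.
DRM_DEBUG=${DRM_DEBUG:-/sys/module/drm/parameters/debug}

# Enable the verbose drm logging (255, the value used in this message).
drm_debug_on() {
    echo 255 > "$DRM_DEBUG"
}

# Turn the verbose logging off again.
drm_debug_off() {
    echo 0 > "$DRM_DEBUG"
}
```

With these defined in the root shell, drm_debug_on goes right before browsing and drm_debug_off right after the crash.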

Then I do:
journalctl -b -g SCHED
to find the second at which the CTXSW_TIMEOUT message appears.

Suppose it is at 08:21:14.
journalctl -b --since 08:21:13 --until 08:21:14
I repeat this, moving --since back by 1 second each time, until the CTXSW_TIMEOUT message is no longer the first line.

Let's say I get back to 08:21:09:
journalctl -b --since 08:21:09 --until 08:21:14 -o short-monotonic > err.txt
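
Once the number of seconds to go back is known, the --since time can be computed instead of typed by hand. A rough sketch follows; since_time and dump_window are hypothetical helper names, and the arithmetic assumes the whole window falls within the same day (no wrap at midnight).

```shell
# Print the HH:MM:SS time that lies $2 seconds before $1.
since_time() {
    h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
    # ${h#0} strips one leading zero so that 08 and 09 are not read as bad octal numbers
    total=$(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} - $2 ))
    printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# Save the journal of the current boot for the window that ends at the error time.
# $1 = time of the CTXSW_TIMEOUT message, $2 = seconds to go back, $3 = output file
dump_window() {
    journalctl -b --since "$(since_time "$1" "$2")" --until "$1" -o short-monotonic > "$3"
}
```

For the example above this would be: dump_window 08:21:14 5 err.txt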

cp err.txt /home/paul
mv /home/paul/err.txt /home/paul/journalctl_no1.txt
chown paul:paul /home/paul/journalctl_no1.txt

And then, as the normal user paul:
gnome-text-editor journalctl_no1.txt &
and I search for SCHED again,
and I look at the lines before it to try to figure out the cause of the timeout.
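
The same search can be done from the shell with grep instead of the text editor. A small sketch, where lines_before_sched is a made-up name and the number of context lines is whatever turns out to be useful:

```shell
# Print every line of file $1 that contains SCHED, each preceded by the $2 lines before it.
# -n prefixes each printed line with its line number in the file.
lines_before_sched() {
    grep -n -B "$2" 'SCHED' "$1"
}
```

For example: lines_before_sched journalctl_no1.txt 40 | less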

If you take a look at https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/339
you can see that what comes before varies quite a bit each time.
I suspect there are many causes that can result in an MMU error on the GPU and so cause a timeout.

There is also the possibility of an unrelated memory corruption, I suppose.
But if not, it would make some sense to open a separate issue for each different thing that happens before the timeout message.

I am not sure whether this is really the right thing to do, which is partly why I am writing this message to ask for opinions.







