Re: RT is freezing

Unfortunately, the patch didn't work. However, this time I was able to capture a stack trace (see below). The trace repeats more than 1500 times per second.

[  139.532236] BUG: scheduling while atomic: Xorg/1273/0x00000002
[  139.532252] Modules linked in: ctr ccm arc4 ath9k ath9k_common nouveau ath9k_hw bnep rfcomm ath snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec uvcvideo videobuf2_vmalloc snd_pcm videobuf2_memops videobuf2_core mxm_wmi videodev wmi snd_hwdep snd_seq_midi i2c_algo_bit drm_kms_helper snd_seq_midi_event ttm snd_rawmidi snd_seq drm intel_rapl btusb x86_pkg_temp_thermal snd_timer cfg80211 bluetooth snd_seq_device intel_powerclamp coretemp joydev parport_pc serio_raw crc32_pclmul snd ppdev 6lowpan_iphc lp parport mac_hid mei_me mei soundcore sony_laptop video lpc_ich psmouse firewire_ohci firewire_core r8169 ahci sdhci_pci libahci mii sdhci crc_itu_t
[  139.532253] CPU: 7 PID: 1273 Comm: Xorg Tainted: G W 3.14.25-rt22+ #17
[  139.532254] Hardware name: Sony Corporation VPCF215FB/VAIO, BIOS R0200V3 02/10/2011
[  139.532257]  00000000 00000000 e9d13c80 c1653d1b f77cbcc0 e9d13c98 c1650b4c c182aed0
[  139.532259]  c0529300 000004f9 00000002 e9d13d14 c165708c 0000001e e9d13cc4 c1650e91
[  139.532262]  e9d12000 7adb3ab0 00000020 c1a84cc0 c0528f20 c0528f20 e9d13cd8 c105abdb
[  139.532262] Call Trace:
[  139.532264]  [<c1653d1b>] dump_stack+0x48/0x76
[  139.532266]  [<c1650b4c>] __schedule_bug+0x54/0x62
[  139.532268]  [<c165708c>] __schedule+0x5dc/0x680
[  139.532270]  [<c1650e91>] ? printk+0x50/0x52
[  139.532273]  [<c105abdb>] ? print_oops_end_marker+0x3b/0x40
[  139.532275]  [<c105ac6f>] ? warn_slowpath_common+0x8f/0xa0
[  139.532278]  [<c16585be>] ? rt_mutex_slowlock+0x15e/0x1e0
[  139.532280]  [<c16585be>] ? rt_mutex_slowlock+0x15e/0x1e0
[  139.532282]  [<c165715b>] schedule+0x2b/0x90
[  139.532284]  [<c16585df>] rt_mutex_slowlock+0x17f/0x1e0
[  139.532287]  [<c1151fbd>] ? pagefault_disable+0xd/0x20
[  139.532290]  [<c1658662>] __ww_mutex_lock_interruptible+0x22/0x30
[  139.532307]  [<f8a3d33b>] nouveau_gem_ioctl_pushbuf+0x68b/0x11b0 [nouveau]
[  139.532309]  [<c1087953>] ? migrate_enable+0x83/0x190
[  139.532326]  [<f8a3ccb0>] ? nouveau_gem_ioctl_new+0x1d0/0x1d0 [nouveau]
[  139.532334]  [<f865b73e>] drm_ioctl+0x43e/0x4d0 [drm]
[  139.532351]  [<f8a3ccb0>] ? nouveau_gem_ioctl_new+0x1d0/0x1d0 [nouveau]
[  139.532354]  [<c1087953>] ? migrate_enable+0x83/0x190
[  139.532356]  [<c1426101>] ? __pm_runtime_resume+0x41/0x50
[  139.532373]  [<f8a34ea1>] nouveau_drm_ioctl+0x41/0x70 [nouveau]
[  139.532390]  [<f8a34e60>] ? nouveau_pmops_thaw+0x60/0x60 [nouveau]
[  139.532392]  [<c1196c92>] do_vfs_ioctl+0x2e2/0x4e0
[  139.532394]  [<c10bcb48>] ? ktime_get_ts+0x48/0x140
[  139.532397]  [<c1196ef0>] SyS_ioctl+0x60/0x90
[  139.532398]  [<c16609c6>] sysenter_do_call+0x12/0x12

On 01/07/2015 08:24 AM, Joakim Hernberg wrote:
On Mon, 05 Jan 2015 23:26:42 -0200
Gustavo Bittencourt <gbitten@xxxxxxxxx> wrote:

It seems that the problem is with the nouveau driver. When I boot in
failsafe graphics mode, the system works well. Here is my video
configuration:
$ lshw -c video
    *-display
         description: VGA compatible controller
         product: GF108M [GeForce GT 540M]
         vendor: NVIDIA Corporation
         physical id: 0
         bus info: pci@0000:01:00.0
         version: a1
         width: 64 bits
         clock: 33MHz
         capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
         configuration: driver=nouveau latency=0
         resources: irq:53 memory:f4000000-f4ffffff memory:d0000000-dfffffff memory:e0000000-e1ffffff ioport:d000(size=128) memory:f5000000-f507ffff


On 01/05/2015 08:47 PM, Gustavo Bittencourt wrote:
Hi everybody

I compiled the 3.14.25-rt22, but my system freezes when I start
Unity and some programs like Chrome or Thunderbird. The problem
happens only when PREEMPT_RT_FULL=y. No log is generated. I would
like to find the root of this problem, but I don't know how. Do you
have any suggestion?

I don't know if this is related, and I'm sorry for mentioning nvidia on
the mailing list, but if it applies to nouveau too, I hope it's
alright :)

I have the same experience using the nvidia driver on a test system.
This patch was brought to my attention and I use it for the Arch Linux
realtime kernel.  It appears to fix the X hangs on my nvidia test
machine (note that for me it's just X that hangs):

NOTE: this patch is a rebase of John Blackwood's patch. On his kernel, he must be using
an older simple wait patch - as his applies to kernel/sched/core.c, while the simple wait
completion code lives in kernel/sched/completion.c ... I have ported this to test with
nvidia, as I would like to see if it fixes the semaphore issues I have seen.
I've kept the original patch comment intact;

I'm not 100% sure that the patch below will fix your problem, but we
saw something that sounds pretty familiar to your issue involving the
nvidia driver and the preempt-rt patch.  The nvidia driver uses the
completion support to create their own driver's notion of an internally
used semaphore.

Fix a race in the PRT wait for completion simple wait code.

A wait_for_completion() waiter task can be awoken by a task calling
complete(), but fail to consume the 'done' completion resource if it
loses a race with another task calling wait_for_completion() just as
it is waking up.

In this case, the awoken task will call schedule_timeout() again
without being in the simple wait queue.

So if the awoken task is unable to claim the 'done' completion resource,
check to see if it needs to be re-inserted into the wait list before
waiting again in schedule_timeout().

Fix-by: John Blackwood <john.blackwood@xxxxxxxx>

--- linux-3.14/kernel/sched/completion.c 2014-05-22 14:01:03.879734869 -0400
+++ linux-3.14/kernel/sched/completion.c    2014-05-22 14:13:59.181688658 -0400
@@ -61,11 +61,19 @@
  do_wait_for_common(struct completion *x,
            long (*action)(long), long timeout, int state)
  {
+        int again = 0;
+
     if (!x->done) {
         DEFINE_SWAITER(wait);

         swait_prepare_locked(&x->wait, &wait);
         do {
+                       /* Check to see if we lost race for 'done' and are
+                        * no longer in the wait list.
+                        */
+                       if (unlikely(again) && list_empty(&wait.node))
+                               swait_prepare_locked(&x->wait, &wait);
+
             if (signal_pending_state(state, current)) {
                 timeout = -ERESTARTSYS;
                 break;
@@ -74,6 +82,7 @@
             raw_spin_unlock_irq(&x->wait.lock);
             timeout = action(timeout);
             raw_spin_lock_irq(&x->wait.lock);
+                        again = 1;
         } while (!x->done && timeout);
         swait_finish_locked(&x->wait, &wait);
         if (!x->done)
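
For readers following the patch comment, the race it describes can be modeled in user space. The following is a minimal, single-threaded C sketch, not kernel code: `sim_completion`, `sim_waiter`, and every `sim_*` function are hypothetical names made up for illustration, and the task interleaving is hard-coded rather than produced by a real scheduler. It assumes, as the `list_empty(&wait.node)` check in the patch suggests, that waking a waiter also removes it from the wait list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_WAITERS 4

/* Hypothetical stand-ins for the kernel's swaiter and completion. */
struct sim_waiter {
    bool queued;  /* currently on the completion's wait list */
    bool woken;   /* a complete() call has woken this waiter */
};

struct sim_completion {
    unsigned int done;                          /* unconsumed 'done' tokens */
    struct sim_waiter *queue[SIM_MAX_WAITERS];  /* FIFO wait list */
    size_t nqueued;
};

/* Put a waiter on the wait list unless it is already there. */
static void sim_enqueue(struct sim_completion *x, struct sim_waiter *w)
{
    if (w->queued)
        return;
    assert(x->nqueued < SIM_MAX_WAITERS);
    x->queue[x->nqueued++] = w;
    w->queued = true;
}

/* Model of complete(): post one token, then wake and dequeue the first waiter. */
static void sim_complete(struct sim_completion *x)
{
    x->done++;
    if (x->nqueued == 0)
        return;  /* nobody on the list to wake */

    struct sim_waiter *w = x->queue[0];
    for (size_t i = 1; i < x->nqueued; i++)
        x->queue[i - 1] = x->queue[i];
    x->nqueued--;
    w->queued = false;
    w->woken = true;
}

/* Try to claim one 'done' token; returns true on success. */
static bool sim_try_consume(struct sim_completion *x)
{
    if (x->done == 0)
        return false;
    x->done--;
    return true;
}

/*
 * One pass of the waiter's loop after it has been woken. Returns true if
 * the waiter claimed a token. Otherwise it goes back to sleep, and only
 * the patched variant (with_fix) puts it back on the wait list first.
 */
static bool sim_waiter_resume(struct sim_completion *x, struct sim_waiter *w,
                              bool with_fix)
{
    w->woken = false;
    if (sim_try_consume(x))
        return true;
    if (with_fix && !w->queued)
        sim_enqueue(x, w);
    return false;
}

/* Replay the interleaving from the patch comment; true if task A finishes. */
bool sim_run_race(bool with_fix)
{
    struct sim_completion x = {0};
    struct sim_waiter a = {0};

    sim_enqueue(&x, &a);   /* task A starts waiting */
    sim_complete(&x);      /* first complete(): A is woken and dequeued */
    sim_try_consume(&x);   /* task B claims the token before A runs */

    if (sim_waiter_resume(&x, &a, with_fix))
        return true;       /* not reached in this interleaving */

    sim_complete(&x);      /* second complete() */
    if (!a.woken)
        return false;      /* A was not on the list: it sleeps forever */
    return sim_waiter_resume(&x, &a, with_fix);
}
```

Under these assumptions, `sim_run_race(false)` models the unpatched loop, where task A goes back to sleep off the wait list and the second `sim_complete()` finds nobody to wake, while `sim_run_race(true)` models the patched loop, where A is re-queued and completes on the second wake-up.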




