Re: BUG: unable to handle kernel paging request in memset_erms (2)

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Fri, 19 Jan 2018 14:04:45 -0800

On Fri, 19 Jan 2018 13:58:01 -0800 syzbot <syzbot+29f08ad5cb6820798dfe@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hello,
> 
> syzbot hit the following crash on mmots commit
> 2164355612187e55e8d60a28d2cc6b2337841a7e (Fri Jan 19 01:07:54 2018 +0000)
> pci: test for unexpectedly disabled bridges
> 
> So far this crash happened 2 times on mmots.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+29f08ad5cb6820798dfe@xxxxxxxxxxxxxxxxxxxxxxxxx
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
> 
> BUG: unable to handle kernel paging request at ffffc90001691000
> IP: memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:65
> PGD 1dad2c067 P4D 1dad2c067 PUD 1dad2d067 PMD 1c6a8f067 PTE 0
> Oops: 0002 [#1] SMP KASAN
> Dumping ftrace buffer:
>     (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 5739 Comm: syzkaller592073 Not tainted 4.15.0-rc8-mm1+ #57
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:65
> RSP: 0018:ffff8801cbbdfb78 EFLAGS: 00010246
> RAX: fffff520002d3f00 RBX: ffffc90001691000 RCX: 000000000000ee51
> RDX: 000000000000ee51 RSI: 0000000000000000 RDI: ffffc90001691000
> RBP: ffff8801cbbdfb98 R08: fffff520002d3fcb R09: ffffc90001691000
> R10: 0000000000001dcb R11: fffff520002d3fca R12: 000000000000ee51
> R13: 0000000000000000 R14: 00007ffffffff000 R15: 000000002001be51
> FS:  00007f88ae7d7700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffc90001691000 CR3: 00000001ccefa005 CR4: 00000000001606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   memset include/linux/string.h:329 [inline]
>   _copy_from_user+0xe9/0x110 lib/usercopy.c:16
>   copy_from_user include/linux/uaccess.h:147 [inline]
>   snd_pcm_oss_write1 sound/core/oss/pcm_oss.c:1347 [inline]
>   snd_pcm_oss_write+0x438/0x880 sound/core/oss/pcm_oss.c:2659
>   __vfs_write+0xef/0x970 fs/read_write.c:480
>   vfs_write+0x189/0x510 fs/read_write.c:544
>   SYSC_write fs/read_write.c:589 [inline]
>   SyS_write+0xef/0x220 fs/read_write.c:581
>   entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x44a559
> RSP: 002b:00007f88ae7d6da8 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00000000006dcc24 RCX: 000000000044a559
> RDX: 000000000000fe51 RSI: 000000002000c000 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000293 R12: 00000000006dcc20
> R13: 7073642f7665642f R14: 00800000c0045006 R15: 0000000000000001
> Code: 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3  
> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89  
> c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01
> RIP: memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:65 RSP: ffff8801cbbdfb78
> CR2: ffffc90001691000
> ---[ end trace 8f421641f3e10f44 ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>     (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..

It's hard to believe that the (four year old)
workaround-for-a-pci-restoring-bug.patch could cause this.



From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: pci: test for unexpectedly disabled bridges

The all-ones value is not just a "device didn't exist" case, it's also
potentially a quite valid value, so not restoring it would be wrong.

What *would* be interesting is to hear where the bad values came from in
the first place.  It sounds like the device state is saved after the PCI
bus controller in front of the device has been crapped on, resulting in the
PCI config cycles never reaching the device at all.

Something along the lines of this patch (together with suspend/resume
debugging output) might help pinpoint it.  But it really sounds like
something turned off the PCI bridge in a totally broken way (some ACPI
shutdown crud?  I wouldn't be entirely surprised).

Cc: Greg KH <greg@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 drivers/pci/pci.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -puN drivers/pci/pci.c~workaround-for-a-pci-restoring-bug drivers/pci/pci.c

--- a/drivers/pci/pci.c~workaround-for-a-pci-restoring-bug
+++ a/drivers/pci/pci.c
@@ -1094,6 +1094,15 @@ static void pci_restore_pcix_state(struc
 int pci_save_state(struct pci_dev *dev)
 {
 	int i;
+	u32 val;
+
+	/* Unable to read PCI device/manufacturer state? Something is seriously wrong! */
+	if (pci_read_config_dword(dev, 0, &val) || val == 0xffffffff) {
+		printk("Broken read from PCI device %s\n", pci_name(dev));
+		WARN_ON(1);
+		return -1;
+	}
+
 	/* XXX: 100% dword access ok here? */
 	for (i = 0; i < 16; i++)
 		pci_read_config_dword(dev, i * 4, &dev->saved_config_space[i]);
_

_______________________________________________
Alsa-devel mailing list
Alsa-devel@xxxxxxxxxxxxxxxx
http://mailman.alsa-project.org/mailman/listinfo/alsa-devel