On Oct 28, 2016 7:12 PM, "Linus Torvalds" <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> [ Chris, Kent, ignore the subject line, that was a mis-attribution of
> the cause ]
>
> On Fri, Oct 28, 2016 at 3:25 PM, Joseph Yasi <joe.yasi@xxxxxxxxx> wrote:
> >
> > I've been able to reproduce the issue with 19be0eaffa3ac7d8eb ("mm: remove
> > gup_flags FOLL_WRITE games from __get_user_pages()") reverted.
>
> Yeah, this doesn't look to have anything to do with that commit.
>
> > This smells like a race condition
> > somewhere. It's possible I just happened to never encounter that race
> > before.
>
> It looks like some seriously odd corruption. It's doing spin_lock()
> inside lockref_get_not_dead(), which is just a spinlock in the dentry.
> There's no way it should cause problems.
>
> The code disassembles to
>
>    0:   45 31 c9                 xor    %r9d,%r9d
>    3:   85 c0                    test   %eax,%eax
>    5:   74 44                    je     0x4b
>    7:   48 89 c2                 mov    %rax,%rdx
>    a:   c1 e8 12                 shr    $0x12,%eax
>    d:   48 c1 ea 0c              shr    $0xc,%rdx
>   11:   83 e8 01                 sub    $0x1,%eax
>   14:   83 e2 30                 and    $0x30,%edx
>   17:   48 98                    cltq
>   19:   48 81 c2 c0 6e 01 00     add    $0x16ec0,%rdx
>   20:   48 03 14 c5 a0 21 a7 a1  add    -0x5e58de60(,%rax,8),%rdx
>   28:*  48 89 0a                 mov    %rcx,(%rdx)     <-- trapping instruction
>   2b:   8b 41 08                 mov    0x8(%rcx),%eax
>   2e:   85 c0                    test   %eax,%eax
>   30:   75 09                    jne    0x3b
>   32:   f3 90                    pause
>   34:   8b 41 08                 mov    0x8(%rcx),%eax
>   37:   85 c0                    test   %eax,%eax
>   39:   74 f7                    je     0x32
>
> where the beginning of that sequence is the decode_tail() code, and I
> think the trapping instruction is the
>
>     WRITE_ONCE(prev->next, node);
>
> so it's from kernel/locking/qspinlock.c:536:
>
>     prev = decode_tail(old);
>     /*
>      * The above xchg_tail() is also a load of @lock which generates,
>      * through decode_tail(), a pointer.
>      *
>      * The address dependency matches the RELEASE of xchg_tail()
>      * such that the access to @prev must happen after.
>      */
>     smp_read_barrier_depends();
>
>     WRITE_ONCE(prev->next, node);
>
>     pv_wait_node(node, prev);
>
> and yes, %rdx (which should contain that pointer to 'prev') has that
> bogus pointer value 00007facb85592b6.
>
> So that's a core spinlock in the dentry being corrupted.
>
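
For anyone following along, here is a minimal sketch of the arithmetic that decode_tail() appears to perform, written as a userspace C program. The bit offsets are assumptions read off the shift counts in the disassembly above: `shr $0x12` is a shift by 18 for the CPU field, and `(x >> 12) & 0x30` is the same as `((x >> 16) & 3) * 16`, i.e. a 2-bit node index scaled by what is presumably a 16-byte node. The names `tail_fields`, `decode_tail_fields` and `tail_cpu_is_plausible` are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tail layout, inferred from the shift counts in the disassembly:
 * bits 16-17 hold the per-CPU node index, bits 18-31 hold (CPU number + 1). */
#define TAIL_IDX_OFFSET 16u
#define TAIL_IDX_MASK   (3u << TAIL_IDX_OFFSET)
#define TAIL_CPU_OFFSET 18u

/* Hypothetical container for the two decoded fields. */
struct tail_fields {
    int cpu; /* CPU whose queue node sits at the tail; -1 if no tail is set */
    int idx; /* which of that CPU's queue nodes */
};

/* Split a 32-bit lock word into the CPU number and node index. */
static struct tail_fields decode_tail_fields(uint32_t lock_val)
{
    struct tail_fields f;

    f.cpu = (int)(lock_val >> TAIL_CPU_OFFSET) - 1;
    f.idx = (int)((lock_val & TAIL_IDX_MASK) >> TAIL_IDX_OFFSET);
    return f;
}

/* True if the decoded CPU number could exist on a machine with nr_cpus CPUs. */
static bool tail_cpu_is_plausible(uint32_t lock_val, int nr_cpus)
{
    assert(nr_cpus > 0);
    struct tail_fields f = decode_tail_fields(lock_val);

    return f.cpu >= 0 && f.cpu < nr_cpus;
}
```

In the register dump, RAX is 0x1fff at the trap, and %eax is not modified between the `sub $0x1,%eax` and the trapping store, so the decoded CPU number would be 8191. A lock word such as 0x80000000 (an illustrative value, not necessarily the real one) decodes to exactly that, and `tail_cpu_is_plausible(0x80000000u, 8)` returns false. That would fit the lock word itself being garbage before decode_tail() ever ran.
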
> Quite frankly, I'm somewhat suspicious of this:
>
> Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
> vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi
>
> i.e. those out-of-tree vbox modules.
>
> But for others, here's a cleaned-up copy of the oops in case somebody
> else sees something.
>
> BUG: unable to handle kernel paging request at 00007facb85592b6
> IP: queued_spin_lock_slowpath+0xe1/0x170
> PGD 7cee19067 PUD 0
> Oops: 0002 [#1] PREEMPT SMP
> Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
> vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi
> snd_hda_codec_realtek snd_hda_codec_generic uvcvideo
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
> snd_usb_audio videodev snd_usbmidi_lib media snd_hda_intel
> snd_hda_codec snd_hwdep snd_hda_core snd_pcm_oss snd_mixer_oss
> snd_pcm input_leds intel_rapl x86_pkg_temp_thermal btusb
> intel_powerclamp crct10dif_pclmul btrtl btbcm efi_pstore
> crc32_pclmul btintel crc32c_intel bluetooth ghash_clmulni_intel
> aesni_intel aes_x86_64 snd_seq_oss lrw glue_helper ablk_helper
> cryptd intel_cstate snd_seq_midi snd_rawmidi intel_rapl_perf
> snd_seq_midi_event snd_seq efivars snd_seq_device snd_timer snd
> soundcore wl(PO) cfg80211 rfkill sg battery intel_lpss_acpi
> intel_lpss mfd_core acpi_pad tpm_tis acpi_als tpm_tis_core
> kfifo_buf tpm industrialio nfsd auth_rpcgss coretemp oid_registry
> nfs_acl lockd loop grace sunrpc efivarfs ipv6 crc_ccitt hid_generic
> usbhid uas usb_storage igb e1000e dca ptp mxm_wmi bcache psmouse
> i915 intel_gtt pps_core drm_kms_helper xhci_pci hwmon syscopyarea
> xhci_hcd sysfillrect sysimgblt i2c_algo_bit fb_sys_fops usbcore
> sr_mod drm cdrom i2c_core usb_common fan thermal
> pinctrl_sunrisepoint wmi video pinctrl_intel button
> CPU: 3 PID: 1139 Comm: lsof Tainted: P O 4.8.3-customskl #1
> Hardware name: System manufacturer System Product Name/Z170-DELUXE,
> BIOS 2202 09/19/2016
> task: ffff9e4a40062640 task.stack: ffff9e468ef80000
> RIP: 0010:[<ffffffffa1082731>] [<ffffffffa1082731>]
> queued_spin_lock_slowpath+0xe1/0x170
> RSP: 0018:ffff9e468ef83d00 EFLAGS: 00010202
> RAX: 0000000000001fff RBX: ffff9e494f7f2718 RCX: ffff9e4b2ecd6ec0
> RDX: 00007facb85592b6 RSI: 0000000080000000 RDI: ffff9e494f7f2718
> RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000000
> R10: 0000000020ab886e R11: ffff9e494f7f26f8 R12: 0000000000000000
> R13: ffff9e494f7f26c0 R14: ffff9e468ef83d90 R15: 0000000000000000
> FS: 00007f3344595800(0000) GS:ffff9e4b2ecc0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007facb85592b6 CR3: 00000007daf09000 CR4: 00000000003406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> lockref_get_not_dead+0x3a/0x80
> unlazy_walk+0xee/0x180
> complete_walk+0x2e/0x70
> path_lookupat+0x93/0x100
> filename_lookup+0x99/0x150
> pipe_read+0x27e/0x340
> getname_flags+0x6a/0x1d0
> vfs_fstatat+0x44/0x90
> SYSC_newlstat+0x1d/0x40
> vfs_read+0x112/0x130
> SyS_read+0x3d/0x90
> entry_SYSCALL_64_fastpath+0x17/0x93
> Code: c1 e0 10 45 31 c9 85 c0 74 44 48 89 c2 c1 e8 12 48 c1 ea 0c 83
> e8 01 83 e2 30 48 98 48 81 c2 c0 6e 01 00 48 03 14 c5 a0 21 a7 a1 <48>
> 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b
> RIP [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
> RSP <ffff9e468ef83d00>
> CR2: 00007facb85592b6
>
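
As a cross-check on the "Oops: 0002" line and the CR2 value, here is a small sketch that decodes the low bits of an x86 page-fault error code and tests whether an address lies in the lower (user-space) half of the address space. It assumes 4-level paging, so the lower half ends at 0x0000800000000000. The names `pf_info`, `decode_pf_error`, `in_lower_half` and `check_oops_values` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three low bits of the x86 page-fault error code. */
struct pf_info {
    bool present; /* bit 0: set for a protection fault, clear if page not present */
    bool write;   /* bit 1: set if the faulting access was a write */
    bool user;    /* bit 2: set if the access came from user mode */
};

static struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info i;

    i.present = (code & 0x1) != 0;
    i.write   = (code & 0x2) != 0;
    i.user    = (code & 0x4) != 0;
    return i;
}

/* Assumes 4-level paging: the lower canonical half ends at 2^47. */
static bool in_lower_half(uint64_t addr)
{
    return addr < 0x0000800000000000ULL;
}

/* Self-check against the values printed in the oops. */
static void check_oops_values(void)
{
    struct pf_info i = decode_pf_error(0x0002);

    /* Kernel-mode write to a page that is not present. */
    assert(!i.present && i.write && !i.user);
    /* CR2 looks like a user-space address; RBX (the lock) does not. */
    assert(in_lower_half(0x00007facb85592b6ULL));
    assert(!in_lower_half(0xffff9e494f7f2718ULL));
}
```

Error code 0002 therefore reads as a kernel-mode write to a non-present page, which matches the trapping `mov %rcx,(%rdx)` being a store, and the faulting address has the shape of a user-space pointer rather than a kernel one.
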
> > The /home partition in question is btrfs on bcache in writethrough mode. The
> > cache drive is an 180 GB Intel SATA SSD, and the backing device is two WD 3
> > TB SATA HDDs configured in MD RAID 10 f2 layout. / is btrfs on an NVMe SSD.
> >
> > I've also seen btrfs checksum errors in the kernel log when reproducing
> > this. Rebooting and running btrfs scrub finds nothing, though, so it looks
> > like in-memory corruption.
>
> I'm adding Chris Mason and Kent Overstreet to the participants,
> because we did have a recent btrfs memory corruption thing. This
> corruption seems to be pretty widespread though, so you might also want
> to just run "memtest" on your machine.
>
> *Most* memory corruption tends to be due to software issues, but
> sometimes it really ends up being the memory itself going bad.
>
> But also, please test if this happens without the out-of-tree modules?
I was testing without the VirtualBox and broadcom-wl out-of-tree modules when the machine locked up, and it won't POST anymore. The motherboard is reporting a CPU fault, so it looks like this was a hardware problem. Sorry for the noise.
-Joe
>
> Linus