Re: [powerpc/powervm]Oops: Kernel access of bad area, sig: 11 [#1] while running stress-ng

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



cc linux-xfs

On Thu, Jul 19, 2018 at 11:47:59AM +0530, vrbagal1 wrote:
> On 2018-07-10 19:12, Michael Ellerman wrote:
> > vrbagal1 <vrbagal1@xxxxxxxxxxxxxxxxxx> writes:
> > 
> > > On 2018-07-10 13:37, Nicholas Piggin wrote:
> > > > On Tue, 10 Jul 2018 11:58:40 +0530
> > > > vrbagal1 <vrbagal1@xxxxxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > Observing kernel oops on Power9(ZZ) box, running on PowerVM, while
> > > > > running stress-ng.
> > > > > 
> > > > > 
> > > > > Kernel: 4.18.0-rc4
> > > > > Machine: Power9 ZZ (PowerVM)
> > > > > Test: Stress-ng
> > > > > 
> > > > > Attached is .config file
> > > > > 
> > > > > Traces:
> > > > > 
> > > > >   [12251.245209] Oops: Kernel access of bad area, sig: 11 [#1]
> > > > 
> > > > Can you post the lines above this? Otherwise we don't know what
> > > > address
> > > > it tried to access (without decoding the instructions and
> > > > reconstructing
> > > > it from registers at least, which the XFS devs wouldn't be
> > > > inclined to
> > > > do).
> > > > 
> > > 
> > > ah my bad.
> > > 
> > >   [12251.245179] Unable to handle kernel paging request for data at
> > > address 0x6000000060000000
> > >   [12251.245199] Faulting instruction address: 0xc000000000319e2c
> > 
> > Which matches the regs & disassembly:
> > 
> > r4 = 6000000060000000
> > r9 = 0
> > ldx     r9,r4,r9	<- pop
> > 
> > So object was 0x6000000060000000.
> > 
> > That looks like two nops, ie. we got some code?
> > 
> > And there's only one caller of prefetch_freepointer() in
> > slab_alloc_node():
> > 
> > 	prefetch_freepointer(s, next_object);
> > 
> > 
> > So slab corruption is looking likely.
> > 
> > Do you have slub_debug=FZP on the kernel command line?
> 
> I tried to reproduce this, but didnt succeed. But instead I got xfs
> assertion.
> 
> Kernel: 4.18.0-rc5
> Tree Head: 30b06abfb92bfd5f9b63ea6a2ffb0bd905d1a6da
> 
> Traces:
> 
>  [12290.993612] XFS: Assertion failed: flags & XFS_BMAPI_COWFORK, file:
> fs/xfs/libxfs/xfs_bmap.c, line: 4370

This usually means we have a writeback of a page with delayed allocation
buffers over an extent map that reflects a hole in the file. This should
never happen for normal (non-reflink) data fork writes, hence the assert
failure and -EIO return.

We used to actually (accidentally) allocate a new block in this case,
but more recent kernels generate the error. More interestingly, I think
the pending iomap + writeback rework in XFS may simply drop this error
on the floor because we refer to extent state to process the page rather
than the other way around (and buffer_delay() isn't a state in the iomap
bits). This of course depends on which state is actually valid, the
buffer or extent map, which we don't know for sure.

Can you reliably reproduce this problem? If so, can you describe the
reproducer? Also, what is the filesystem geometry ('xfs_info <mnt>')?
And since this is a power kernel, I'm guessing you have 64k pages as
well..?

Brian

>  [12290.993668] ------------[ cut here ]------------
>  [12290.993672] kernel BUG at fs/xfs/xfs_message.c:102!
>  [12290.993676] Oops: Exception in kernel mode, sig: 5 [#1]
>  [12290.993678] LE SMP NR_CPUS=2048 NUMA pSeries
>  [12290.993697] Dumping ftrace buffer:
>  [12290.993730]    (ftrace buffer empty)
>  [12290.993735] Modules linked in: camellia_generic(E) cast6_generic(E)
> cast_common(E) ppp_generic(E) serpent_generic(E) slhc(E) kvm_pr(E) kvm(E)
> snd_seq(E) snd_seq_device(E) twofish_generic(E) snd_timer(E) snd(E)
> soundcore(E) twofish_common(E) lrw(E) cuse(E) tgr192(E) hci_vhci(E)
> vhost_net(E) bluetooth(E) ecdh_generic(E) wp512(E) rfkill(E)
> vfio_iommu_spapr_tce(E) vfio_spapr_eeh(E) vfio(E) rmd320(E) uhid(E) tun(E)
> tap(E) rmd256(E) rmd160(E) vhost_vsock(E) rmd128(E)
> vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) uinput(E) md4(E)
> unix_diag(E) dccp_ipv4(E) dccp(E) sha512_generic(E) binfmt_misc(E) fuse(E)
> vfat(E) fat(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E)
> raid6_pq(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) loop(E) ip6t_rpfilter(E)
> ipt_REJECT(E) nf_reject_ipv4(E) ip6t_REJECT(E)
>  [12290.993784]  nf_reject_ipv6(E) xt_conntrack(E) ip_set(E) nfnetlink(E)
> ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ip6table_nat(E)
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) nf_nat_ipv6(E) ip6table_mangle(E)
> ip6table_security(E) ip6table_raw(E) iptable_nat(E) nf_conntrack_ipv4(E)
> nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) iptable_mangle(E)
> iptable_security(E) iptable_raw(E) ebtable_filter(E) ebtables(E)
> ip6table_filter(E) ip6_tables(E) iptable_filter(E) xts(E) sg(E)
> vmx_crypto(E) pseries_rng(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E)
> ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) lpfc(E) mlx5_core(E)
> nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E)
> scsi_transport_fc(E) mlxfw(E) devlink(E) dm_mirror(E) dm_region_hash(E)
> dm_log(E) dm_mod(E) [last unloaded: torture]
>  [12290.993834] CPU: 2 PID: 433 Comm: kswapd1 Tainted: G            E
> 4.18.0-rc5-autotest-autotest #1
>  [12290.993838] NIP:  d00000000ebcfb8c LR: d00000000ebcfb64 CTR:
> c0000000005260a0
>  [12290.993841] REGS: c000000feee7ef40 TRAP: 0700   Tainted: G            E
> (4.18.0-rc5-autotest-autotest)
>  [12290.993845] MSR:  800000010282b033
> <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22224044  XER: 0000000d
>  [12290.993854] CFAR: d00000000ebcfb74 IRQMASK: 0
>  [12290.993854] GPR00: d00000000ebcfb64 c000000feee7f1c0 d00000000ec58700
> ffffffffffffffea
>  [12290.993854] GPR04: 000000000000000a c000000feee7f0c0 ffffffffffffffc0
> d00000000ec41620
>  [12290.993854] GPR08: ffffffffffffffca 0000000000000001 0000000000000000
> 0000000000000000
>  [12290.993854] GPR12: c0000000005260a0 c00000001ec9d600 c000000feee7f510
> 0000000040000000
>  [12290.993854] GPR16: 0000000020000000 0000000000000000 c000000fc6308000
> c000000feee7f4c0
>  [12290.993854] GPR20: c000000feee7f520 c000000feee7f520 000ffffffffe0000
> 0000000000000000
>  [12290.993854] GPR24: 00000000001fffff 0000000000000001 c000000f689c8800
> 0000000000000000
>  [12290.993854] GPR28: c000000f689c8848 c000000feee7f520 0000000000000001
> 0000000000000400
>  [12290.993907] NIP [d00000000ebcfb8c] assfail+0x5c/0x60 [xfs]
>  [12290.993921] LR [d00000000ebcfb64] assfail+0x34/0x60 [xfs]
>  [12290.993923] Call Trace:
>  [12290.993936] [c000000feee7f1c0] [d00000000ebcfb64] assfail+0x34/0x60
> [xfs] (unreliable)
>  [12290.993947] [c000000feee7f220] [d00000000eb542f0]
> xfs_bmapi_write+0x10f0/0x1260 [xfs]
>  [12290.993961] [c000000feee7f450] [d00000000ebc34e8]
> xfs_iomap_write_allocate+0x1b8/0x430 [xfs]
>  [12290.993975] [c000000feee7f5c0] [d00000000eba0de4]
> xfs_map_blocks+0x234/0x470 [xfs]
>  [12290.993988] [c000000feee7f630] [d00000000eba2d4c]
> xfs_do_writepage+0x27c/0x880 [xfs]
>  [12290.994001] [c000000feee7f710] [d00000000eba3314]
> xfs_do_writepage+0x844/0x880 [xfs]
>  [12290.994007] [c000000feee7f780] [c00000000029df24]
> pageout.isra.37+0x294/0x4a0
>  [12290.994011] [c000000feee7f860] [c0000000002a0a28]
> shrink_page_list+0x948/0x1090
>  [12290.994015] [c000000feee7f980] [c0000000002a1c8c]
> shrink_inactive_list+0x2ec/0x720
>  [12290.994019] [c000000feee7fa40] [c0000000002a28d8]
> shrink_node_memcg+0x238/0x7d0
>  [12290.994024] [c000000feee7fb40] [c0000000002a2f94]
> shrink_node+0x124/0x610
>  [12290.994027] [c000000feee7fc10] [c0000000002a46d0]
> balance_pgdat+0x200/0x460
>  [12290.994031] [c000000feee7fcf0] [c0000000002a4afc] kswapd+0x1cc/0x5f0
>  [12290.994036] [c000000feee7fdc0] [c00000000012ca6c] kthread+0x15c/0x1a0
>  [12290.994040] [c000000feee7fe30] [c00000000000b65c]
> ret_from_kernel_thread+0x5c/0x80
>  [12290.994043] Instruction dump:
>  [12290.994046] f821ffa1 4bfff8a9 3d220000 e929cb40 89290008 2f890000
> 40de0018 0fe00000
>  [12290.994053] 38210060 e8010010 7c0803a6 4e800020 <0fe00000> 3c4c0009
> 38428b70 7c0802a6
>  [12290.994061] ---[ end trace 84e4cce770c4d4e2 ]---
>  [12290.996233]
>  [12291.996258] Kernel panic - not syncing: Fatal exception
>  [12292.006592] Dumping ftrace buffer:
>  [12292.006637]    (ftrace buffer empty)
>  [12292.013222] WARNING: CPU: 2 PID: 433 at drivers/tty/vt/vt.c:3885
> do_unblank_screen+0x278/0x280
>  [12292.013228] Modules linked in: camellia_generic(E) cast6_generic(E)
> cast_common(E) ppp_generic(E) serpent_generic(E) slhc(E) kvm_pr(E) kvm(E)
> snd_seq(E) snd_seq_device(E) twofish_generic(E) snd_timer(E) snd(E)
> soundcore(E) twofish_common(E) lrw(E) cuse(E) tgr192(E) hci_vhci(E)
> vhost_net(E) bluetooth(E) ecdh_generic(E) wp512(E) rfkill(E)
> vfio_iommu_spapr_tce(E) vfio_spapr_eeh(E) vfio(E) rmd320(E) uhid(E) tun(E)
> tap(E) rmd256(E) rmd160(E) vhost_vsock(E) rmd128(E)
> vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) uinput(E) md4(E)
> unix_diag(E) dccp_ipv4(E) dccp(E) sha512_generic(E) binfmt_misc(E) fuse(E)
> vfat(E) fat(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E)
> raid6_pq(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) loop(E) ip6t_rpfilter(E)
> ipt_REJECT(E) nf_reject_ipv4(E) ip6t_REJECT(E)
>  [12292.013315]  nf_reject_ipv6(E) xt_conntrack(E) ip_set(E) nfnetlink(E)
> ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ip6table_nat(E)
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) nf_nat_ipv6(E) ip6table_mangle(E)
> ip6table_security(E) ip6table_raw(E) iptable_nat(E) nf_conntrack_ipv4(E)
> nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) iptable_mangle(E)
> iptable_security(E) iptable_raw(E) ebtable_filter(E) ebtables(E)
> ip6table_filter(E) ip6_tables(E) iptable_filter(E) xts(E) sg(E)
> vmx_crypto(E) pseries_rng(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E)
> ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) lpfc(E) mlx5_core(E)
> nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E)
> scsi_transport_fc(E) mlxfw(E) devlink(E) dm_mirror(E) dm_region_hash(E)
> dm_log(E) dm_mod(E) [last unloaded: torture]
>  [12292.013394] CPU: 2 PID: 433 Comm: kswapd1 Tainted: G      D     E
> 4.18.0-rc5-autotest-autotest #1
>  [12292.013400] NIP:  c0000000005e9cb8 LR: c0000000005e9a7c CTR:
> c00000000002bcc0
>  [12292.013406] REGS: c000000feee7e880 TRAP: 0700   Tainted: G      D     E
> (4.18.0-rc5-autotest-autotest)
>  [12292.013411] MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28222042
> XER: 20040009
>  [12292.013422] CFAR: c0000000005e9a90 IRQMASK: 3
>  [12292.013422] GPR00: c0000000005e9c74 c000000feee7eb00 c000000001150200
> 0000000000000000
>  [12292.013422] GPR04: 0000000000000003 000000000000000b 00000000646f6320
> 3030303020726c20
>  [12292.013422] GPR08: 0000000000000000 0000000000000000 0000000000000000
> 3830303031303030
>  [12292.013422] GPR12: c00000000002bcc0 c00000001ec9d600 c000000feee7f510
> 0000000040000000
>  [12292.013422] GPR16: 0000000020000000 0000000000000000 c000000fc6308000
> c000000feee7f4c0
>  [12292.013422] GPR20: c000000feee7f520 c000000feee7f520 000ffffffffe0000
> 0000000000000000
>  [12292.013422] GPR24: 00000000001fffff 0000000000000001 c000000000ff9e60
> c0000000013067e8
>  [12292.013422] GPR28: c0000000013067c0 0000000000000000 c000000001414f60
> c0000000013067e8
>  [12292.013483] NIP [c0000000005e9cb8] do_unblank_screen+0x278/0x280
>  [12292.013488] LR [c0000000005e9a7c] do_unblank_screen+0x3c/0x280
>  [12292.013492] Call Trace:
>  [12292.013496] [c000000feee7eb00] [c0000000005e9c74]
> do_unblank_screen+0x234/0x280 (unreliable)
>  [12292.013505] [c000000feee7eb80] [c000000000513300]
> bust_spinlocks+0x40/0x80
>  [12292.013511] [c000000feee7eba0] [c0000000001003b0] panic+0x1b8/0x308
>  [12292.013518] [c000000feee7ec30] [c000000000023a3c] oops_end+0x1ec/0x1f0
>  [12292.013524] [c000000feee7ecb0] [c000000000023f54]
> _exception_pkey+0x1c4/0x1f0
>  [12292.013530] [c000000feee7ee50] [c000000000026020]
> program_check_exception+0x2c0/0x3e0
>  [12292.013537] [c000000feee7eed0] [c000000000008e94]
> program_check_common+0x134/0x140
>  [12292.013579] --- interrupt: 700 at assfail+0x5c/0x60 [xfs]
>  [12292.013579]     LR = assfail+0x34/0x60 [xfs]
>  [12292.013597] [c000000feee7f220] [d00000000eb542f0]
> xfs_bmapi_write+0x10f0/0x1260 [xfs]
>  [12292.013619] [c000000feee7f450] [d00000000ebc34e8]
> xfs_iomap_write_allocate+0x1b8/0x430 [xfs]
>  [12292.013641] [c000000feee7f5c0] [d00000000eba0de4]
> xfs_map_blocks+0x234/0x470 [xfs]
>  [12292.013662] [c000000feee7f630] [d00000000eba2d4c]
> xfs_do_writepage+0x27c/0x880 [xfs]
>  [12292.013682] [c000000feee7f710] [d00000000eba3314]
> xfs_do_writepage+0x844/0x880 [xfs]
>  [12292.013689] [c000000feee7f780] [c00000000029df24]
> pageout.isra.37+0x294/0x4a0
>  [12292.013696] [c000000feee7f860] [c0000000002a0a28]
> shrink_page_list+0x948/0x1090
>  [12292.013702] [c000000feee7f980] [c0000000002a1c8c]
> shrink_inactive_list+0x2ec/0x720
>  [12292.013708] [c000000feee7fa40] [c0000000002a28d8]
> shrink_node_memcg+0x238/0x7d0
>  [12292.013714] [c000000feee7fb40] [c0000000002a2f94]
> shrink_node+0x124/0x610
>  [12292.013720] [c000000feee7fc10] [c0000000002a46d0]
> balance_pgdat+0x200/0x460
>  [12292.013726] [c000000feee7fcf0] [c0000000002a4afc] kswapd+0x1cc/0x5f0
>  [12292.013732] [c000000feee7fdc0] [c00000000012ca6c] kthread+0x15c/0x1a0
>  [12292.013738] [c000000feee7fe30] [c00000000000b65c]
> ret_from_kernel_thread+0x5c/0x80
>  [12292.013743] Instruction dump:
>  [12292.013748] 1d290064 3c62ffef 3863ab50 e94a0000 7d2407b4 7c845214
> 4bbbacc9 60000000
>  [12292.013758] 39200001 3d42003e 912adff8 4bfffe68 <0fe00000> 4bfffdd8
> 3c4c00b6 38426540
>  [12292.013769] ---[ end trace 84e4cce770c4d4e3 ]---
>  [12292.013774] Rebooting in 10 seconds..
> 
> Regards,
> Venkat.
> > 
> > cheers
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux