Re: [powerpc/powervm]Oops: Kernel access of bad area, sig: 11 [#1] while running stress-ng

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2018-07-19 19:03, Brian Foster wrote:
cc linux-xfs

On Thu, Jul 19, 2018 at 11:47:59AM +0530, vrbagal1 wrote:
On 2018-07-10 19:12, Michael Ellerman wrote:
> vrbagal1 <vrbagal1@xxxxxxxxxxxxxxxxxx> writes:
>
> > On 2018-07-10 13:37, Nicholas Piggin wrote:
> > > On Tue, 10 Jul 2018 11:58:40 +0530
> > > vrbagal1 <vrbagal1@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > Observing kernel oops on Power9(ZZ) box, running on PowerVM, while
> > > > running stress-ng.
> > > >
> > > >
> > > > Kernel: 4.18.0-rc4
> > > > Machine: Power9 ZZ (PowerVM)
> > > > Test: Stress-ng
> > > >
> > > > Attached is .config file
> > > >
> > > > Traces:
> > > >
> > > >   [12251.245209] Oops: Kernel access of bad area, sig: 11 [#1]
> > >
> > > Can you post the lines above this? Otherwise we don't know what
> > > address
> > > it tried to access (without decoding the instructions and
> > > reconstructing
> > > it from registers at least, which the XFS devs wouldn't be
> > > inclined to
> > > do).
> > >
> >
> > ah my bad.
> >
> >   [12251.245179] Unable to handle kernel paging request for data at
> > address 0x6000000060000000
> >   [12251.245199] Faulting instruction address: 0xc000000000319e2c
>
> Which matches the regs & disassembly:
>
> r4 = 6000000060000000
> r9 = 0
> ldx     r9,r4,r9	<- pop
>
> So object was 0x6000000060000000.
>
> That looks like two nops, ie. we got some code?
>
> And there's only one caller of prefetch_freepointer() in
> slab_alloc_node():
>
> 	prefetch_freepointer(s, next_object);
>
>
> So slab corruption is looking likely.
>
> Do you have slub_debug=FZP on the kernel command line?

I tried to reproduce this, but didnt succeed. But instead I got xfs
assertion.

Kernel: 4.18.0-rc5
Tree Head: 30b06abfb92bfd5f9b63ea6a2ffb0bd905d1a6da

Traces:

[12290.993612] XFS: Assertion failed: flags & XFS_BMAPI_COWFORK, file:
fs/xfs/libxfs/xfs_bmap.c, line: 4370

This usually means we have a writeback of a page with delayed allocation buffers over an extent map that reflects a hole in the file. This should never happen for normal (non-reflink) data fork writes, hence the assert
failure and -EIO return.

We used to actually (accidentally) allocate a new block in this case,
but more recent kernels generate the error. More interestingly, I think
the pending iomap + writeback rework in XFS may simply drop this error
on the floor because we refer to extent state to process the page rather than the other way around (and buffer_delay() isn't a state in the iomap
bits). This of course depends on which state is actually valid, the
buffer or extent map, which we don't know for sure.

Can you reliably reproduce this problem? If so, can you describe the
reproducer? Also, what is the filesystem geometry ('xfs_info <mnt>')?
And since this is a power kernel, I'm guessing you have 64k pages as
well..?

Brian


I tried, but couldnt hit the issue. But this issue was hit running stress-ng test case, last time(saw some xfs traces) and this time. More specific I was running: /bin/avocado run avocado-misc-tests/generic/stress-ng.py -m /root/avocado-fvt-wrapper/tests/avocado-misc-tests/generic/stress-ng.py.data/stress-ng-cpu.yaml

xfs_info o/p:

# xfs_info /dev/sdi2
meta-data=/dev/sdi2 isize=512 agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


Yes, 64K pagesize.

Regards,
Venkat.



 [12290.993668] ------------[ cut here ]------------
 [12290.993672] kernel BUG at fs/xfs/xfs_message.c:102!
 [12290.993676] Oops: Exception in kernel mode, sig: 5 [#1]
 [12290.993678] LE SMP NR_CPUS=2048 NUMA pSeries
 [12290.993697] Dumping ftrace buffer:
 [12290.993730]    (ftrace buffer empty)
[12290.993735] Modules linked in: camellia_generic(E) cast6_generic(E) cast_common(E) ppp_generic(E) serpent_generic(E) slhc(E) kvm_pr(E) kvm(E)
snd_seq(E) snd_seq_device(E) twofish_generic(E) snd_timer(E) snd(E)
soundcore(E) twofish_common(E) lrw(E) cuse(E) tgr192(E) hci_vhci(E)
vhost_net(E) bluetooth(E) ecdh_generic(E) wp512(E) rfkill(E)
vfio_iommu_spapr_tce(E) vfio_spapr_eeh(E) vfio(E) rmd320(E) uhid(E) tun(E)
tap(E) rmd256(E) rmd160(E) vhost_vsock(E) rmd128(E)
vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) uinput(E) md4(E) unix_diag(E) dccp_ipv4(E) dccp(E) sha512_generic(E) binfmt_misc(E) fuse(E) vfat(E) fat(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) loop(E) ip6t_rpfilter(E)
ipt_REJECT(E) nf_reject_ipv4(E) ip6t_REJECT(E)
[12290.993784] nf_reject_ipv6(E) xt_conntrack(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ip6table_nat(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) nf_nat_ipv6(E) ip6table_mangle(E) ip6table_security(E) ip6table_raw(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) iptable_mangle(E)
iptable_security(E) iptable_raw(E) ebtable_filter(E) ebtables(E)
ip6table_filter(E) ip6_tables(E) iptable_filter(E) xts(E) sg(E)
vmx_crypto(E) pseries_rng(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E)
ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) lpfc(E) mlx5_core(E)
nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E)
scsi_transport_fc(E) mlxfw(E) devlink(E) dm_mirror(E) dm_region_hash(E)
dm_log(E) dm_mod(E) [last unloaded: torture]
 [12290.993834] CPU: 2 PID: 433 Comm: kswapd1 Tainted: G            E
4.18.0-rc5-autotest-autotest #1
 [12290.993838] NIP:  d00000000ebcfb8c LR: d00000000ebcfb64 CTR:
c0000000005260a0
[12290.993841] REGS: c000000feee7ef40 TRAP: 0700 Tainted: G E
(4.18.0-rc5-autotest-autotest)
 [12290.993845] MSR:  800000010282b033
<SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22224044  XER: 0000000d
 [12290.993854] CFAR: d00000000ebcfb74 IRQMASK: 0
[12290.993854] GPR00: d00000000ebcfb64 c000000feee7f1c0 d00000000ec58700
ffffffffffffffea
[12290.993854] GPR04: 000000000000000a c000000feee7f0c0 ffffffffffffffc0
d00000000ec41620
[12290.993854] GPR08: ffffffffffffffca 0000000000000001 0000000000000000
0000000000000000
[12290.993854] GPR12: c0000000005260a0 c00000001ec9d600 c000000feee7f510
0000000040000000
[12290.993854] GPR16: 0000000020000000 0000000000000000 c000000fc6308000
c000000feee7f4c0
[12290.993854] GPR20: c000000feee7f520 c000000feee7f520 000ffffffffe0000
0000000000000000
[12290.993854] GPR24: 00000000001fffff 0000000000000001 c000000f689c8800
0000000000000000
[12290.993854] GPR28: c000000f689c8848 c000000feee7f520 0000000000000001
0000000000000400
 [12290.993907] NIP [d00000000ebcfb8c] assfail+0x5c/0x60 [xfs]
 [12290.993921] LR [d00000000ebcfb64] assfail+0x34/0x60 [xfs]
 [12290.993923] Call Trace:
[12290.993936] [c000000feee7f1c0] [d00000000ebcfb64] assfail+0x34/0x60
[xfs] (unreliable)
 [12290.993947] [c000000feee7f220] [d00000000eb542f0]
xfs_bmapi_write+0x10f0/0x1260 [xfs]
 [12290.993961] [c000000feee7f450] [d00000000ebc34e8]
xfs_iomap_write_allocate+0x1b8/0x430 [xfs]
 [12290.993975] [c000000feee7f5c0] [d00000000eba0de4]
xfs_map_blocks+0x234/0x470 [xfs]
 [12290.993988] [c000000feee7f630] [d00000000eba2d4c]
xfs_do_writepage+0x27c/0x880 [xfs]
 [12290.994001] [c000000feee7f710] [d00000000eba3314]
xfs_do_writepage+0x844/0x880 [xfs]
 [12290.994007] [c000000feee7f780] [c00000000029df24]
pageout.isra.37+0x294/0x4a0
 [12290.994011] [c000000feee7f860] [c0000000002a0a28]
shrink_page_list+0x948/0x1090
 [12290.994015] [c000000feee7f980] [c0000000002a1c8c]
shrink_inactive_list+0x2ec/0x720
 [12290.994019] [c000000feee7fa40] [c0000000002a28d8]
shrink_node_memcg+0x238/0x7d0
 [12290.994024] [c000000feee7fb40] [c0000000002a2f94]
shrink_node+0x124/0x610
 [12290.994027] [c000000feee7fc10] [c0000000002a46d0]
balance_pgdat+0x200/0x460
[12290.994031] [c000000feee7fcf0] [c0000000002a4afc] kswapd+0x1cc/0x5f0 [12290.994036] [c000000feee7fdc0] [c00000000012ca6c] kthread+0x15c/0x1a0
 [12290.994040] [c000000feee7fe30] [c00000000000b65c]
ret_from_kernel_thread+0x5c/0x80
 [12290.994043] Instruction dump:
 [12290.994046] f821ffa1 4bfff8a9 3d220000 e929cb40 89290008 2f890000
40de0018 0fe00000
[12290.994053] 38210060 e8010010 7c0803a6 4e800020 <0fe00000> 3c4c0009
38428b70 7c0802a6
 [12290.994061] ---[ end trace 84e4cce770c4d4e2 ]---
 [12290.996233]
 [12291.996258] Kernel panic - not syncing: Fatal exception
 [12292.006592] Dumping ftrace buffer:
 [12292.006637]    (ftrace buffer empty)
 [12292.013222] WARNING: CPU: 2 PID: 433 at drivers/tty/vt/vt.c:3885
do_unblank_screen+0x278/0x280
[12292.013228] Modules linked in: camellia_generic(E) cast6_generic(E) cast_common(E) ppp_generic(E) serpent_generic(E) slhc(E) kvm_pr(E) kvm(E)
snd_seq(E) snd_seq_device(E) twofish_generic(E) snd_timer(E) snd(E)
soundcore(E) twofish_common(E) lrw(E) cuse(E) tgr192(E) hci_vhci(E)
vhost_net(E) bluetooth(E) ecdh_generic(E) wp512(E) rfkill(E)
vfio_iommu_spapr_tce(E) vfio_spapr_eeh(E) vfio(E) rmd320(E) uhid(E) tun(E)
tap(E) rmd256(E) rmd160(E) vhost_vsock(E) rmd128(E)
vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) uinput(E) md4(E) unix_diag(E) dccp_ipv4(E) dccp(E) sha512_generic(E) binfmt_misc(E) fuse(E) vfat(E) fat(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) loop(E) ip6t_rpfilter(E)
ipt_REJECT(E) nf_reject_ipv4(E) ip6t_REJECT(E)
[12292.013315] nf_reject_ipv6(E) xt_conntrack(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ip6table_nat(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) nf_nat_ipv6(E) ip6table_mangle(E) ip6table_security(E) ip6table_raw(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) iptable_mangle(E)
iptable_security(E) iptable_raw(E) ebtable_filter(E) ebtables(E)
ip6table_filter(E) ip6_tables(E) iptable_filter(E) xts(E) sg(E)
vmx_crypto(E) pseries_rng(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E)
ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) lpfc(E) mlx5_core(E)
nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E)
scsi_transport_fc(E) mlxfw(E) devlink(E) dm_mirror(E) dm_region_hash(E)
dm_log(E) dm_mod(E) [last unloaded: torture]
 [12292.013394] CPU: 2 PID: 433 Comm: kswapd1 Tainted: G      D     E
4.18.0-rc5-autotest-autotest #1
 [12292.013400] NIP:  c0000000005e9cb8 LR: c0000000005e9a7c CTR:
c00000000002bcc0
[12292.013406] REGS: c000000feee7e880 TRAP: 0700 Tainted: G D E
(4.18.0-rc5-autotest-autotest)
[12292.013411] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28222042
XER: 20040009
 [12292.013422] CFAR: c0000000005e9a90 IRQMASK: 3
[12292.013422] GPR00: c0000000005e9c74 c000000feee7eb00 c000000001150200
0000000000000000
[12292.013422] GPR04: 0000000000000003 000000000000000b 00000000646f6320
3030303020726c20
[12292.013422] GPR08: 0000000000000000 0000000000000000 0000000000000000
3830303031303030
[12292.013422] GPR12: c00000000002bcc0 c00000001ec9d600 c000000feee7f510
0000000040000000
[12292.013422] GPR16: 0000000020000000 0000000000000000 c000000fc6308000
c000000feee7f4c0
[12292.013422] GPR20: c000000feee7f520 c000000feee7f520 000ffffffffe0000
0000000000000000
[12292.013422] GPR24: 00000000001fffff 0000000000000001 c000000000ff9e60
c0000000013067e8
[12292.013422] GPR28: c0000000013067c0 0000000000000000 c000000001414f60
c0000000013067e8
 [12292.013483] NIP [c0000000005e9cb8] do_unblank_screen+0x278/0x280
 [12292.013488] LR [c0000000005e9a7c] do_unblank_screen+0x3c/0x280
 [12292.013492] Call Trace:
 [12292.013496] [c000000feee7eb00] [c0000000005e9c74]
do_unblank_screen+0x234/0x280 (unreliable)
 [12292.013505] [c000000feee7eb80] [c000000000513300]
bust_spinlocks+0x40/0x80
[12292.013511] [c000000feee7eba0] [c0000000001003b0] panic+0x1b8/0x308 [12292.013518] [c000000feee7ec30] [c000000000023a3c] oops_end+0x1ec/0x1f0
 [12292.013524] [c000000feee7ecb0] [c000000000023f54]
_exception_pkey+0x1c4/0x1f0
 [12292.013530] [c000000feee7ee50] [c000000000026020]
program_check_exception+0x2c0/0x3e0
 [12292.013537] [c000000feee7eed0] [c000000000008e94]
program_check_common+0x134/0x140
 [12292.013579] --- interrupt: 700 at assfail+0x5c/0x60 [xfs]
 [12292.013579]     LR = assfail+0x34/0x60 [xfs]
 [12292.013597] [c000000feee7f220] [d00000000eb542f0]
xfs_bmapi_write+0x10f0/0x1260 [xfs]
 [12292.013619] [c000000feee7f450] [d00000000ebc34e8]
xfs_iomap_write_allocate+0x1b8/0x430 [xfs]
 [12292.013641] [c000000feee7f5c0] [d00000000eba0de4]
xfs_map_blocks+0x234/0x470 [xfs]
 [12292.013662] [c000000feee7f630] [d00000000eba2d4c]
xfs_do_writepage+0x27c/0x880 [xfs]
 [12292.013682] [c000000feee7f710] [d00000000eba3314]
xfs_do_writepage+0x844/0x880 [xfs]
 [12292.013689] [c000000feee7f780] [c00000000029df24]
pageout.isra.37+0x294/0x4a0
 [12292.013696] [c000000feee7f860] [c0000000002a0a28]
shrink_page_list+0x948/0x1090
 [12292.013702] [c000000feee7f980] [c0000000002a1c8c]
shrink_inactive_list+0x2ec/0x720
 [12292.013708] [c000000feee7fa40] [c0000000002a28d8]
shrink_node_memcg+0x238/0x7d0
 [12292.013714] [c000000feee7fb40] [c0000000002a2f94]
shrink_node+0x124/0x610
 [12292.013720] [c000000feee7fc10] [c0000000002a46d0]
balance_pgdat+0x200/0x460
[12292.013726] [c000000feee7fcf0] [c0000000002a4afc] kswapd+0x1cc/0x5f0 [12292.013732] [c000000feee7fdc0] [c00000000012ca6c] kthread+0x15c/0x1a0
 [12292.013738] [c000000feee7fe30] [c00000000000b65c]
ret_from_kernel_thread+0x5c/0x80
 [12292.013743] Instruction dump:
 [12292.013748] 1d290064 3c62ffef 3863ab50 e94a0000 7d2407b4 7c845214
4bbbacc9 60000000
[12292.013758] 39200001 3d42003e 912adff8 4bfffe68 <0fe00000> 4bfffdd8
3c4c00b6 38426540
 [12292.013769] ---[ end trace 84e4cce770c4d4e3 ]---
 [12292.013774] Rebooting in 10 seconds..

Regards,
Venkat.
>
> cheers





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux