Re: 3.9-rc2 xfs panic

CAI Qian <caiqian@xxxxxxxxxx> · Tue, 12 Mar 2013 02:34:07 -0400 (EDT)



----- Original Message -----
> From: "Dave Chinner" <david@xxxxxxxxxxxxx>
> To: "CAI Qian" <caiqian@xxxxxxxxxx>
> Cc: xfs@xxxxxxxxxxx
> Sent: Tuesday, March 12, 2013 2:07:01 PM
> Subject: Re: 3.9-rc2 xfs panic
> 
> On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> > Just came across when running xfstests using 3.9-rc2 kernel on a
> > power7
> > box with addition of this patch which fixed a known issue,
> > http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
> > 
> > The log shows it was happened around test case 370 with
> > TEST_PARAM_BLKSIZE = 2048
> 
> That doesn't sound like xfstests. it only has 305 tests, and no
> parameters like TEST_PARAM_BLKSIZE....
Sorry, it is a typo, test case 270 not 370. TEST_PARAM_BLKSIZE was
from an internal wrapper to be used to create new filessytem not from the
original xfstests. Apologize for that, Dave.
> 
> > Some more information:
> > xfsprogs version = 3.1.10
> > number of CPUs = 32
> > Swap Size = 4047 MB
> > Mem Size = 4046 M
> > 
> > Still reproducing and bisecting, so this is just a head-up to see
> > if
> > helps.
> > 
> > CAI Qian
> > 
> > [31797.113368] XFS (loop1): xfs_trans_ail_delete_bulk: attempting
> > to delete a log item that is not in the AIL
> > [31797.113383] XFS (loop1): xfs_do_force_shutdown(0x2) called from
> > line 743 of file fs/xfs/xfs_trans_ail.c.  Return address =
> > 0xd000000000f22838
> 
> Shutdown for an in-memory problem of some kind....
> 
> > [31817.508411] XFS (loop0): Mounting Filesystem
> > [31817.566235] XFS (loop0): Ending clean mount
> > [31819.094713] XFS (loop0): Mounting Filesystem
> > [31819.152248] XFS (loop0): Ending clean mount
> > [31819.348238] XFS (loop1): Mounting Filesystem
> > [31819.349879] XFS (loop1): Ending clean mount
> > [31819.561366] XFS (loop0): Mounting Filesystem
> > [31819.616607] XFS (loop0): Ending clean mount
> > [31819.990833] XFS (loop1): Mounting Filesystem
> > [31819.992652] XFS (loop1): Ending clean mount
> > [31819.992768] XFS (loop1): Quotacheck needed: Please wait.
> > [31820.051134] XFS (loop1): Quotacheck: Done.
> > [31832.534868] Unable to handle kernel paging request for data at
> > address 0x5841474900000001
> 
> And after remounting the filesystemi a couple of times, it's tried
> to follow an AGI buffer header (magic # XAGI, seqno = 1) as though
> it was a pointer. I can't think of why that would be
> executed....
> 
> > [31832.534881] Faulting instruction address: 0xc0000000001f8070
> > [31832.534888] Oops: Kernel access of bad area, sig: 11 [#1]
> > [31832.534891] SMP NR_CPUS=1024 NUMA pSeries
> > [31832.534899] Modules linked in: tun(F) binfmt_misc(F) hidp(F)
> > cmtp(F) kernelcapi(F) rfcomm(F) l2tp_ppp(F) l2tp_netlink(F)
> > l2tp_core(F) bnep(F) nfc(F) af_802154(F) pppoe(F) pppox(F)
> > ppp_generic(F) slhc(F) rds(F) af_key(F) atm(F) sctp(F)
> > ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F)
> > btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F)
> > nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) nfsv2(F)
> > nfs(F) dns_resolver(F) lockd(F) sunrpc(F) fscache(F)
> > nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) fuse(F)
> > sg(F) ibmveth(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F)
> > ibmvscsi(F) scsi_transport_srp(F) scsi_tgt(F) dm_mirror(F)
> > dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: ipt_REJECT]
> > [31832.534978] NIP: c0000000001f8070 LR: c000000000192f6c CTR:
> > c000000000192f50
> > [31832.534984] REGS: c0000000f1c125f0 TRAP: 0300   Tainted: GF
> >       W     (3.9.0-rc2+)
> > [31832.534989] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR:
> > 24022024  XER: 20000001
> > [31832.535003] SOFTE: 0
> > [31832.535006] CFAR: c000000000005f1c
> > [31832.535009] DAR: 5841474900000001, DSISR: 40000000
> > [31832.535013] TASK = c00000003f0111c0[16795] 'loop1' THREAD:
> > c0000000f1c10000 CPU: 30
> > GPR00: c000000000192f6c c0000000f1c12870 c0000000010f3a48
> > c0000000fe015a00
> > GPR04: 0000000000011220 0000000000000080 00000000000f3aaf
> > c0000000018d5840
> > GPR08: 0000000000000000 0000000000000000 0000000000000000
> > c0000000004e3300
> > GPR12: 0000000044024024 c00000000f247800 c0000000010d01b0
> > 0000000000000000
> > GPR16: 0000000000000001 0000000000000000 c0000000009d9020
> > c0000000009d9060
> > GPR20: c0000000009d9048 0000000000000020 000000000000007f
> > 0000000000000000
> > GPR24: 0000000000000fe0 c0000000010d1020 c0000000fe015a00
> > 0000000000000000
> > GPR28: c000000000192f6c 0000000000011220 5841474900000001
> > c0000000fe015a00
> > [31832.535086] NIP [c0000000001f8070] .kmem_cache_alloc+0xb0/0x2d0
> > [31832.535092] LR [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
> > [31832.535096] Call Trace:
> > [31832.535101] [c0000000f1c12870] [0000000000016ac3] 0x16ac3
> > (unreliable)
> > [31832.535108] [c0000000f1c12920] [c000000000192f6c]
> > .mempool_alloc_slab+0x1c/0x30
> > [31832.535114] [c0000000f1c12990] [c000000000193108]
> > .mempool_alloc+0x88/0x1c0
> > [31832.535122] [c0000000f1c12a80] [c0000000004e1824]
> > .scsi_sg_alloc+0x64/0xc0
> > [31832.535129] [c0000000f1c12af0] [c0000000003e09f8]
> > .__sg_alloc_table+0xa8/0x190
> > [31832.535135] [c0000000f1c12bc0] [c0000000004e15f0]
> > .scsi_alloc_sgtable+0x40/0x90
> > [31832.535142] [c0000000f1c12c40] [c0000000004e1668]
> > .scsi_init_sgtable+0x28/0x90
> > [31832.535148] [c0000000f1c12cc0] [c0000000004e19e0]
> > .scsi_init_io+0x40/0x1a0
> > [31832.535157] [c0000000f1c12d60] [d000000000c02e78]
> > .sd_prep_fn+0x128/0xac0 [sd_mod]
> > [31832.535164] [c0000000f1c12e20] [c0000000003a611c]
> > .blk_peek_request+0xfc/0x2d0
> > [31832.535171] [c0000000f1c12eb0] [c0000000004e2c08]
> > .scsi_request_fn+0xb8/0x6d0
> > [31832.535178] [c0000000f1c12fa0] [c00000000039d7c0]
> > .__blk_run_queue+0x50/0x80
> > [31832.535184] [c0000000f1c13020] [c0000000003a2184]
> > .queue_unplugged+0xe4/0x100
> > [31832.535190] [c0000000f1c130c0] [c0000000003a67d8]
> > .blk_flush_plug_list+0x248/0x2e0
> > [31832.535197] [c0000000f1c13180] [c0000000003a6bcc]
> > .blk_queue_bio+0x2fc/0x490
> > [31832.535203] [c0000000f1c13230] [c0000000003a436c]
> > .generic_make_request+0x11c/0x180
> > [31832.535210] [c0000000f1c132c0] [c0000000003a4484]
> > .submit_bio+0xb4/0x1e0
> > [31832.535245] [c0000000f1c13380] [d000000000eaffa0]
> > .xfs_submit_ioend_bio.isra.10+0x70/0x90 [xfs]
> > [31832.535286] [c0000000f1c133f0] [d000000000eb00f0]
> > .xfs_submit_ioend+0x130/0x190 [xfs]
> > [31832.535343] [c0000000f1c134a0] [d000000000eb045c]
> > .xfs_vm_writepage+0x30c/0x670 [xfs]
> > [31832.535349] [c0000000f1c135d0] [c00000000019d050]
> > .__writepage+0x30/0x90
> > [31832.535356] [c0000000f1c13650] [c00000000019d728]
> > .write_cache_pages+0x208/0x4f0
> > [31832.535362] [c0000000f1c137e0] [c00000000019da5c]
> > .generic_writepages+0x4c/0xa0
> > [31832.535395] [c0000000f1c138a0] [d000000000eaea10]
> > .xfs_vm_writepages+0x60/0x90 [xfs]
> > [31832.535411] [c0000000f1c13930] [c00000000019ee7c]
> > .do_writepages+0x3c/0x70
> > [31832.535424] [c0000000f1c139a0] [c0000000001914b8]
> > .__filemap_fdatawrite_range+0x68/0x80
> > [31832.535430] [c0000000f1c13a40] [c000000000191610]
> > .filemap_write_and_wait_range+0x70/0xc0
> > [31832.535463] [c0000000f1c13ad0] [d000000000eb7970]
> > .xfs_file_fsync+0x60/0x250 [xfs]
> > [31832.535479] [c0000000f1c13b90] [c00000000024c278]
> > .vfs_fsync+0x48/0x70
> > [31832.535497] [c0000000f1c13c00] [c0000000004d299c]
> > .loop_thread+0x3ec/0x5b0
> > [31832.535503] [c0000000f1c13d30] [c0000000000b58c8]
> > .kthread+0xe8/0xf0
> > [31832.535510] [c0000000f1c13e30] [c000000000009f64]
> > .ret_from_kernel_thread+0x64/0x80
> 
> So, looks like memory corruption - a corrupted slab, perhaps? Can
> you turn on memory poisoning, debugging, etc?
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
> 

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs