----- Original Message -----
> From: "Dave Chinner" <david@xxxxxxxxxxxxx>
> To: "CAI Qian" <caiqian@xxxxxxxxxx>
> Cc: xfs@xxxxxxxxxxx
> Sent: Tuesday, March 12, 2013 2:07:01 PM
> Subject: Re: 3.9-rc2 xfs panic
>
> On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> > Just came across when running xfstests using 3.9-rc2 kernel on a
> > power7
> > box with addition of this patch which fixed a known issue,
> > http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
> >
> > The log shows it was happened around test case 370 with
> > TEST_PARAM_BLKSIZE = 2048
>
> That doesn't sound like xfstests. it only has 305 tests, and no
> parameters like TEST_PARAM_BLKSIZE....
Sorry, that was a typo: test case 270, not 370. TEST_PARAM_BLKSIZE comes
from an internal wrapper used to create the new filesystem, not from the
original xfstests. Apologies for that, Dave.
>
> > Some more information:
> > xfsprogs version = 3.1.10
> > number of CPUs = 32
> > Swap Size = 4047 MB
> > Mem Size = 4046 M
> >
> > Still reproducing and bisecting, so this is just a head-up to see
> > if
> > helps.
> >
> > CAI Qian
> >
> > [31797.113368] XFS (loop1): xfs_trans_ail_delete_bulk: attempting
> > to delete a log item that is not in the AIL
> > [31797.113383] XFS (loop1): xfs_do_force_shutdown(0x2) called from
> > line 743 of file fs/xfs/xfs_trans_ail.c. Return address =
> > 0xd000000000f22838
>
> Shutdown for an in-memory problem of some kind....
>
> > [31817.508411] XFS (loop0): Mounting Filesystem
> > [31817.566235] XFS (loop0): Ending clean mount
> > [31819.094713] XFS (loop0): Mounting Filesystem
> > [31819.152248] XFS (loop0): Ending clean mount
> > [31819.348238] XFS (loop1): Mounting Filesystem
> > [31819.349879] XFS (loop1): Ending clean mount
> > [31819.561366] XFS (loop0): Mounting Filesystem
> > [31819.616607] XFS (loop0): Ending clean mount
> > [31819.990833] XFS (loop1): Mounting Filesystem
> > [31819.992652] XFS (loop1): Ending clean mount
> > [31819.992768] XFS (loop1): Quotacheck needed: Please wait.
> > [31820.051134] XFS (loop1): Quotacheck: Done.
> > [31832.534868] Unable to handle kernel paging request for data at
> > address 0x5841474900000001
>
> And after remounting the filesystem a couple of times, it's tried
> to follow an AGI buffer header (magic # XAGI, seqno = 1) as though
> it was a pointer. I can't think of why that would be
> executed....
>
> > [31832.534881] Faulting instruction address: 0xc0000000001f8070
> > [31832.534888] Oops: Kernel access of bad area, sig: 11 [#1]
> > [31832.534891] SMP NR_CPUS=1024 NUMA pSeries
> > [31832.534899] Modules linked in: tun(F) binfmt_misc(F) hidp(F)
> > cmtp(F) kernelcapi(F) rfcomm(F) l2tp_ppp(F) l2tp_netlink(F)
> > l2tp_core(F) bnep(F) nfc(F) af_802154(F) pppoe(F) pppox(F)
> > ppp_generic(F) slhc(F) rds(F) af_key(F) atm(F) sctp(F)
> > ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F)
> > btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F)
> > nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) nfsv2(F)
> > nfs(F) dns_resolver(F) lockd(F) sunrpc(F) fscache(F)
> > nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) fuse(F)
> > sg(F) ibmveth(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F)
> > ibmvscsi(F) scsi_transport_srp(F) scsi_tgt(F) dm_mirror(F)
> > dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: ipt_REJECT]
> > [31832.534978] NIP: c0000000001f8070 LR: c000000000192f6c CTR:
> > c000000000192f50
> > [31832.534984] REGS: c0000000f1c125f0 TRAP: 0300 Tainted: GF
> > W (3.9.0-rc2+)
> > [31832.534989] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR:
> > 24022024 XER: 20000001
> > [31832.535003] SOFTE: 0
> > [31832.535006] CFAR: c000000000005f1c
> > [31832.535009] DAR: 5841474900000001, DSISR: 40000000
> > [31832.535013] TASK = c00000003f0111c0[16795] 'loop1' THREAD:
> > c0000000f1c10000 CPU: 30
> > GPR00: c000000000192f6c c0000000f1c12870 c0000000010f3a48
> > c0000000fe015a00
> > GPR04: 0000000000011220 0000000000000080 00000000000f3aaf
> > c0000000018d5840
> > GPR08: 0000000000000000 0000000000000000 0000000000000000
> > c0000000004e3300
> > GPR12: 0000000044024024 c00000000f247800 c0000000010d01b0
> > 0000000000000000
> > GPR16: 0000000000000001 0000000000000000 c0000000009d9020
> > c0000000009d9060
> > GPR20: c0000000009d9048 0000000000000020 000000000000007f
> > 0000000000000000
> > GPR24: 0000000000000fe0 c0000000010d1020 c0000000fe015a00
> > 0000000000000000
> > GPR28: c000000000192f6c 0000000000011220 5841474900000001
> > c0000000fe015a00
> > [31832.535086] NIP [c0000000001f8070] .kmem_cache_alloc+0xb0/0x2d0
> > [31832.535092] LR [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
> > [31832.535096] Call Trace:
> > [31832.535101] [c0000000f1c12870] [0000000000016ac3] 0x16ac3
> > (unreliable)
> > [31832.535108] [c0000000f1c12920] [c000000000192f6c]
> > .mempool_alloc_slab+0x1c/0x30
> > [31832.535114] [c0000000f1c12990] [c000000000193108]
> > .mempool_alloc+0x88/0x1c0
> > [31832.535122] [c0000000f1c12a80] [c0000000004e1824]
> > .scsi_sg_alloc+0x64/0xc0
> > [31832.535129] [c0000000f1c12af0] [c0000000003e09f8]
> > .__sg_alloc_table+0xa8/0x190
> > [31832.535135] [c0000000f1c12bc0] [c0000000004e15f0]
> > .scsi_alloc_sgtable+0x40/0x90
> > [31832.535142] [c0000000f1c12c40] [c0000000004e1668]
> > .scsi_init_sgtable+0x28/0x90
> > [31832.535148] [c0000000f1c12cc0] [c0000000004e19e0]
> > .scsi_init_io+0x40/0x1a0
> > [31832.535157] [c0000000f1c12d60] [d000000000c02e78]
> > .sd_prep_fn+0x128/0xac0 [sd_mod]
> > [31832.535164] [c0000000f1c12e20] [c0000000003a611c]
> > .blk_peek_request+0xfc/0x2d0
> > [31832.535171] [c0000000f1c12eb0] [c0000000004e2c08]
> > .scsi_request_fn+0xb8/0x6d0
> > [31832.535178] [c0000000f1c12fa0] [c00000000039d7c0]
> > .__blk_run_queue+0x50/0x80
> > [31832.535184] [c0000000f1c13020] [c0000000003a2184]
> > .queue_unplugged+0xe4/0x100
> > [31832.535190] [c0000000f1c130c0] [c0000000003a67d8]
> > .blk_flush_plug_list+0x248/0x2e0
> > [31832.535197] [c0000000f1c13180] [c0000000003a6bcc]
> > .blk_queue_bio+0x2fc/0x490
> > [31832.535203] [c0000000f1c13230] [c0000000003a436c]
> > .generic_make_request+0x11c/0x180
> > [31832.535210] [c0000000f1c132c0] [c0000000003a4484]
> > .submit_bio+0xb4/0x1e0
> > [31832.535245] [c0000000f1c13380] [d000000000eaffa0]
> > .xfs_submit_ioend_bio.isra.10+0x70/0x90 [xfs]
> > [31832.535286] [c0000000f1c133f0] [d000000000eb00f0]
> > .xfs_submit_ioend+0x130/0x190 [xfs]
> > [31832.535343] [c0000000f1c134a0] [d000000000eb045c]
> > .xfs_vm_writepage+0x30c/0x670 [xfs]
> > [31832.535349] [c0000000f1c135d0] [c00000000019d050]
> > .__writepage+0x30/0x90
> > [31832.535356] [c0000000f1c13650] [c00000000019d728]
> > .write_cache_pages+0x208/0x4f0
> > [31832.535362] [c0000000f1c137e0] [c00000000019da5c]
> > .generic_writepages+0x4c/0xa0
> > [31832.535395] [c0000000f1c138a0] [d000000000eaea10]
> > .xfs_vm_writepages+0x60/0x90 [xfs]
> > [31832.535411] [c0000000f1c13930] [c00000000019ee7c]
> > .do_writepages+0x3c/0x70
> > [31832.535424] [c0000000f1c139a0] [c0000000001914b8]
> > .__filemap_fdatawrite_range+0x68/0x80
> > [31832.535430] [c0000000f1c13a40] [c000000000191610]
> > .filemap_write_and_wait_range+0x70/0xc0
> > [31832.535463] [c0000000f1c13ad0] [d000000000eb7970]
> > .xfs_file_fsync+0x60/0x250 [xfs]
> > [31832.535479] [c0000000f1c13b90] [c00000000024c278]
> > .vfs_fsync+0x48/0x70
> > [31832.535497] [c0000000f1c13c00] [c0000000004d299c]
> > .loop_thread+0x3ec/0x5b0
> > [31832.535503] [c0000000f1c13d30] [c0000000000b58c8]
> > .kthread+0xe8/0xf0
> > [31832.535510] [c0000000f1c13e30] [c000000000009f64]
> > .ret_from_kernel_thread+0x64/0x80
>
> So, looks like memory corruption - a corrupted slab, perhaps? Can
> you turn on memory poisoning, debugging, etc?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
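
For anyone following along, here is a rough sketch of how the faulting
address in the oops (DAR and GPR30, 0x5841474900000001) reads as an AGI
header rather than a pointer. It assumes the value is laid out as two
big-endian 32-bit fields, a magic number followed by a sequence number,
which matches Dave's reading of "magic # XAGI, seqno = 1". The variable
names are illustrative only and nothing here comes from the XFS sources.

```python
import struct

# Faulting data address (DAR / GPR30) copied from the oops in this thread.
addr = 0x5841474900000001

# Assumption: treat the 64-bit value as eight big-endian bytes and split it
# into a 4-byte magic followed by a 32-bit sequence number.
raw = addr.to_bytes(8, "big")
magic, seqno = struct.unpack(">4sI", raw)

print(magic.decode("ascii"), seqno)  # prints: XAGI 1
```

The upper four bytes are the ASCII characters X, A, G, I and the lower four
bytes hold the value 1, so the "pointer" the allocator tried to follow looks
like the start of a buffer's contents rather than a kernel address.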