Hi,

> On Aug 30, 2017, at 5:58 PM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>
> On Wed, Aug 30, 2017 at 03:56:05PM +0200, Christian Theune wrote:
>> Hi,
>>
>> just got it again on a different call path, maybe that helps:
>>
>> [ 1070.136303] Oops: 0000 [#1] SMP
>> [ 1070.142577] Modules linked in: nf_log_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_log_ipv6 nf_log_common xt_LOG xt_limit nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack sch_fq x86_pkg_temp_thermal kvm_intel kvm irqbypass nvme crc32c_intel ixgbe nvme_core mdio acpi_cpufreq nbd nf_conntrack_ftp nf_conntrack dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_round_robin dm_multipath xts aesni_intel glue_helper lrw ablk_helper cryptd aes_x86_64 fuse dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log
>> [ 1070.233784] CPU: 19 PID: 7460 Comm: ceph-osd Not tainted 4.9.43 #1
>> [ 1070.246124] Hardware name: Thomas-Krenn.AG X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
>> [ 1070.260895] task: ffff8810517d0000 task.stack: ffffc9002abec000
>> [ 1070.272710] RIP: 0010:[<ffffffff81312320>] [<ffffffff81312320>] xfs_da3_node_read+0x30/0xb0
>> [ 1070.289592] RSP: 0018:ffffc9002abefd28 EFLAGS: 00010286
>> [ 1070.300199] RAX: 0000000000000000 RBX: ffff88104d859a48 RCX: 0000000000000001
>> [ 1070.314447] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9002abefce0
>> [ 1070.328694] RBP: ffffc9002abefd48 R08: 0000000066656566 R09: ffffc9002abefbc0
>> [ 1070.342942] R10: fffffffffffffffe R11: 0000000000000001 R12: ffffc9002abefd78
>> [ 1070.357191] R13: ffff88066b430780 R14: 0000000000000005 R15: 0000000066656566
>> [ 1070.371436] FS: 00007fe511bfc700(0000) GS:ffff88107fbc0000(0000) knlGS:0000000000000000
>> [ 1070.387590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1070.399066] CR2: 00000000000000a0 CR3: 0000000f14d50000 CR4: 00000000001406e0
>> [ 1070.413311] Stack:
>> [ 1070.417332]  ffffffff81a44fe0 ffffc9002abefd48 ffffc9002abefdd0 0000000000000005
>> [ 1070.432239]  ffffc9002abefdb8 ffffffff81337404 0000000200000008 ffff8809b5cab040
>> [ 1070.447144]  000000005e94ce38 ffff880c25e1c600 0000000000000000 0000000000000000
>> [ 1070.462051] Call Trace:
>> [ 1070.466949]  [<ffffffff81337404>] xfs_attr3_node_inactive+0x174/0x210
>> [ 1070.479802]  [<ffffffff813376da>] xfs_attr_inactive+0x23a/0x250

There’s a subtle difference here: xfs_inactive is calling xfs_attr_inactive. That wasn’t in there the last time. As I don’t know the internals, it might be irrelevant and you may have filtered it out correctly, but it looked potentially important to me. :)

>> [ 1070.491625]  [<ffffffff81350a4b>] xfs_inactive+0x7b/0x110
>> [ 1070.502403]  [<ffffffff81359344>] xfs_fs_destroy_inode+0xa4/0x210
>> [ 1070.514573]  [<ffffffff811c46cb>] destroy_inode+0x3b/0x60
>> [ 1070.525352]  [<ffffffff811c4819>] evict+0x129/0x190
>> [ 1070.535093]  [<ffffffff811c4c4a>] iput+0x19a/0x200
>> [ 1070.544660]  [<ffffffff811b9129>] do_unlinkat+0x129/0x2d0
>> [ 1070.555445]  [<ffffffff811b9d26>] SyS_unlink+0x16/0x20
>> [ 1070.565706]  [<ffffffff81885260>] entry_SYSCALL_64_fastpath+0x13/0x94
>
> This looks like the same call stack as last time.
>
> Is this with a patched 4.9.43 kernel, or just vanilla?

Just vanilla. I didn’t have time to do any patching; also, this is in production, and the hosts individually take up to 14 days before crashing right now. The system has been up for 6 hours now, and aside from the one defective FS the other mounted filesystems are performing OK.

Christian

PS: Shouldn’t you be offline? :)
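PPS: Side note, in case it helps with decoding the Oops above: CR2 = 00000000000000a0 together with RAX = 0000000000000000 is the usual signature of reading a member that lives at offset 0xa0 inside a structure whose base pointer is NULL. Purely as an illustration of that pattern, with a made-up struct rather than the real XFS types, something like:

/*
 * Hypothetical user-space sketch, not the actual kernel code: it only
 * shows why a NULL base pointer plus a member at offset 0xa0 ends up
 * faulting on address 0xa0.
 */
#include <stddef.h>
#include <stdio.h>

struct demo {
	char pad[0xa0];		/* whatever precedes the accessed member */
	unsigned long member;	/* member that ends up at offset 0xa0 */
};

int main(void)
{
	struct demo *p = NULL;	/* in the kernel case, the NULL base pointer */

	/* &p->member is computed as base + offsetof(), i.e. 0 + 0xa0 here;
	 * actually loading it through the NULL pointer is what would
	 * produce the page fault at address 0xa0. */
	printf("member offset: %#lx\n",
	       (unsigned long)offsetof(struct demo, member));
	(void)p;
	return 0;
}

Which XFS structure and field that offset actually corresponds to in 4.9.43 would need the real headers or debug info to confirm.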
--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick