Re: null pointer reference after crash

Hi,

(Sorry for the staggered posting: I’m sitting in an off-hours maintenance cycle, have time to think, and things are trickling in.)

The hopefully last piece I can add for today: I’ve only ever seen this happen within maybe 10 minutes or less after a reboot and mount (clean or unclean doesn’t matter), followed by an immediate spike of traffic from Ceph recovering things. Also, we started adding a ‘find’ and ‘vmtouch’ prewarm script before starting Ceph. So the boot order is:

- mount
- vmtouch selected files
- cache inodes by running ‘find’ on the disk
- start Ceph osd
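
For illustration, the boot order above could be sketched roughly like this in Python. This is not our actual script: the device, mount point, file list and the ‘ceph-osd@0’ unit name are hypothetical placeholders, and I’m assuming ‘vmtouch -t’ (touch pages into the page cache) and a systemd-managed OSD.

```python
import subprocess


def prewarm_commands(device, mountpoint, hot_paths, osd_unit):
    """Build the boot-time commands in the order listed above.

    All arguments are placeholders chosen by the caller; nothing here
    is taken from a real deployment.
    """
    cmds = [["mount", device, mountpoint]]
    if hot_paths:
        # Assumption: 'vmtouch -t' pulls the selected files into the page cache.
        cmds.append(["vmtouch", "-t", *hot_paths])
    # Walking the tree with 'find' stats every entry, which warms the inode cache.
    cmds.append(["find", mountpoint, "-xdev"])
    # Assumption: the OSD is started via a systemd unit.
    cmds.append(["systemctl", "start", osd_unit])
    return cmds


def run_boot_sequence(cmds, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Stop at the first failure; discard the (large) output of 'find'.
            subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)


if __name__ == "__main__":
    run_boot_sequence(
        prewarm_commands(
            device="/dev/sdX1",                  # hypothetical device
            mountpoint="/srv/ceph/osd/0",        # hypothetical mount point
            hot_paths=["/srv/ceph/osd/0/meta"],  # hypothetical file selection
            osd_unit="ceph-osd@0",               # hypothetical unit name
        ),
        dry_run=True,
    )
```

The dry run only prints the four commands, so the order can be checked without touching a disk.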

Once it has survived for a while, I haven’t seen it crash at all, only directly after boot (so far).

Christian

> On Sep 1, 2017, at 11:03 PM, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> Something that might be of value: I haven’t seen these at all on 4.9.25 (where we started to see very regular host crashes/reboots due to IOMMU issues). They only started to crop up on 4.9.43, which we’ve been running for about 2 weeks now.
> 
> Christian
> 
>> On Sep 1, 2017, at 10:53 PM, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> Got it again today, this time with a filesystem that had seen a (clean) xfs_repair just seconds before. Also, another Ceph user stumbled over this today:
>> https://www.spinics.net/lists/ceph-users/msg36628.html
>> 
>> Here’s today’s dump. It’s identical to the last one, so this will probably be the last one I post here unless you ask for more information. :)
>> 
>> [ 2052.528430] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
>> [ 2052.544143] IP: [<ffffffff81312320>] xfs_da3_node_read+0x30/0xb0
>> [ 2052.556170] PGD 0 [ 2052.559844]
>> [ 2052.562825] Oops: 0000 [#1] SMP
>> [ 2052.569099] Modules linked in: nf_log_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_log_ipv6 nf_log_common xt_LOG xt_limit nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack sch_fq x86_pkg_temp_thermal kvm_intel kvm irqbypass ixgbe nvme crc32c_intel nvme_core mdio acpi_cpufreq nbd nf_conntrack_ftp nf_conntrack dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_round_robin dm_multipath xts aesni_intel glue_helper lrw ablk_helper cryptd aes_x86_64 fuse dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log
>> [ 2052.660301] CPU: 20 PID: 12288 Comm: ceph-osd Not tainted 4.9.43 #1
>> [ 2052.672811] Hardware name: Thomas-Krenn.AG X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
>> [ 2052.687579] task: ffff880f0d85b900 task.stack: ffffc90009708000
>> [ 2052.699397] RIP: 0010:[<ffffffff81312320>]  [<ffffffff81312320>] xfs_da3_node_read+0x30/0xb0
>> [ 2052.716280] RSP: 0018:ffffc9000970bd28  EFLAGS: 00010286
>> [ 2052.726886] RAX: 0000000000000000 RBX: ffff8810504e7878 RCX: 0000000000000001
>> [ 2052.741135] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9000970bce0
>> [ 2052.755379] RBP: ffffc9000970bd48 R08: 00000000b6c20f50 R09: ffffc9000970bbc0
>> [ 2052.769627] R10: fffffffffffffffe R11: 0000000000000001 R12: ffffc9000970bd78
>> [ 2052.783875] R13: ffff880f8e65cec0 R14: 0000000000000003 R15: 00000000b6c20f50
>> [ 2052.798120] FS:  00007fdd627fa700(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
>> [ 2052.814276] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2052.825745] CR2: 00000000000000a0 CR3: 0000000fcf9e8000 CR4: 00000000001406e0
>> [ 2052.839992] Stack:
>> [ 2052.844015]  ffffffff81a44fe0 ffffc9000970bd48 ffffc9000970bdd0 0000000000000003
>> [ 2052.858918]  ffffc9000970bdb8 ffffffff81337404 0000000200000008 ffff880892da4040
>> [ 2052.873824]  000000005e94d370 ffff88105a603000 0000000000000000 0000000000000000
>> [ 2052.888730] Call Trace:
>> [ 2052.893630]  [<ffffffff81337404>] xfs_attr3_node_inactive+0x174/0x210
>> [ 2052.906496]  [<ffffffff813376da>] xfs_attr_inactive+0x23a/0x250
>> [ 2052.918317]  [<ffffffff81350a4b>] xfs_inactive+0x7b/0x110
>> [ 2052.929096]  [<ffffffff81359344>] xfs_fs_destroy_inode+0xa4/0x210
>> [ 2052.941267]  [<ffffffff811c46cb>] destroy_inode+0x3b/0x60
>> [ 2052.952041]  [<ffffffff811c4819>] evict+0x129/0x190
>> [ 2052.961783]  [<ffffffff811c4c4a>] iput+0x19a/0x200
>> [ 2052.971349]  [<ffffffff811b9129>] do_unlinkat+0x129/0x2d0
>> [ 2052.982134]  [<ffffffff811b9d26>] SyS_unlink+0x16/0x20
>> [ 2052.992394]  [<ffffffff81885260>] entry_SYSCALL_64_fastpath+0x13/0x94
>> [ 2053.005252] Code: 55 48 89 e5 41 54 53 4d 89 c4 48 89 fb 48 83 ec 10 48 c7 04 24 e0 4f a4 81 e8 fd fe ff ff 85 c0 75 46 48 85 db 74 41 49 8b 34 24 <48> 8b 96 a0 00 00 00 0f b7 52 08 66 c1 c2 08 66 81 fa be 3e 74
>> [ 2053.045148] RIP  [<ffffffff81312320>] xfs_da3_node_read+0x30/0xb0
>> [ 2053.057348]  RSP <ffffc9000970bd28>
>> [ 2053.064318] CR2: 00000000000000a0
>> [ 2053.071494] ---[ end trace 9360ec3fb784a9ab ]---
>> 
>> Cheers,
>> Christian
>> 
>> --
>> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
>> Flying Circus Internet Operations GmbH · http://flyingcircus.io
>> Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
>> 
> 
> Kind regards,
> Christian Theune
> 

Kind regards,
Christian Theune

--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
