GPF kernel panics

Hi,

The stacktraces are very similar.  Here is another one with the complete dmesg:
http://pastebin.com/g3X0pZ9E

The rbds are mapped by the rbdmap service on boot.
All our ceph servers are running Ubuntu 14.04 (kernel 3.13.0-30-generic).
Ceph packages are from the Ubuntu repos, version 0.80.1-0ubuntu1.1.
I probably should have mentioned this information in the initial mail :)
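
For reference, the entries in /etc/ceph/rbdmap are of the usual form (the pool,
image and client names below are illustrative placeholders, not our real ones):

  rbd/share01    id=share,keyring=/etc/ceph/ceph.client.share.keyring
  rbd/share02    id=share,keyring=/etc/ceph/ceph.client.share.keyring

so at boot the service essentially runs
"rbd map rbd/share01 --id share --keyring /etc/ceph/ceph.client.share.keyring"
for each line.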

This problem also seemed to get gradually worse over time.
We had a couple of sporadic crashes at the start of the week, escalating to
the node being unable to stay up for more than a couple of minutes before
panicking.
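
For completeness, the cap change that preceded the crashes was of this general
form (the client name and pool names here are just illustrative):

  ceph auth caps client.share mon 'allow r' osd 'allow rwx pool=share, allow rwx pool=newpool'

i.e. adding one more "allow rwx pool=..." clause to the existing osd caps;
removing that clause again is what stopped the panics.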

Thanks

J


On 31 July 2014 09:12, Ilya Dryomov <ilya.dryomov at inktank.com> wrote:

> On Thu, Jul 31, 2014 at 11:44 AM, James Eckersall
> <james.eckersall at gmail.com> wrote:
> > Hi,
> >
> > I've had a fun time with ceph this week.
> > We have a cluster with 4 OSD servers (20 OSDs per server), 3 mons, and a
> > server mapping ~200 rbds and presenting CIFS shares.
> >
> > We're using cephx and the export node has its own cephx auth key.
> >
> > I made a change to the key last week, adding rwx access to another pool.
> >
> > Since that point, we have had sporadic kernel panics on the export node.
> >
> > It got to the point where it would barely finish booting up and would panic.
> >
> > Since I removed the extra pool I had added to the auth key, it hasn't
> > crashed again.
> >
> > I'm a bit concerned that a change to an auth key can cause this type of
> > crash.
> > There were no log entries on the mon, osd or export node regarding the key
> > at all, so it was only by thinking back over what had changed that I was
> > able to resolve the problem.
> >
> > From what I could tell, the key format was correct and the pool I added
> > did exist, so I am confused as to how this could have caused kernel panics.
> >
> > Below is an example of one of the crash stacktraces.
> >
> > [   32.713504] general protection fault: 0000 [#1] SMP
> > [   32.724718] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter
> > ip_tables x_tables rbd libceph libcrc32c gpio_ich dcdbas intel_rapl
> > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> > crct10dif_pclmul joydev crc32_pclmul ghash_clmulni_intel aesni_intel
> > aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core
> > shpchp lpc_ich mei_me mei wmi ipmi_si mac_hid acpi_power_meter 8021q garp
> > stp mrp llc bonding lp parport nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
> > fscache hid_generic igb ixgbe i2c_algo_bit usbhid dca hid ptp ahci libahci
> > pps_core megaraid_sas mdio
> > [   32.843936] CPU: 18 PID: 5030 Comm: tr Not tainted 3.13.0-30-generic
> > #54-Ubuntu
> > [   32.860163] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 1.6.0
> > 03/07/2013
> > [   32.876774] task: ffff880417b15fc0 ti: ffff8804273f4000 task.ti:
> > ffff8804273f4000
> > [   32.893384] RIP: 0010:[<ffffffff811a19c5>]  [<ffffffff811a19c5>]
> > kmem_cache_alloc+0x75/0x1e0
> > [   32.912198] RSP: 0018:ffff8804273f5d40  EFLAGS: 00010286
> > [   32.924015] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > 00000000000011ed
> > [   32.939856] RDX: 00000000000011ec RSI: 00000000000080d0 RDI:
> > ffff88042f803700
> > [   32.955696] RBP: ffff8804273f5d70 R08: 0000000000017260 R09:
> > ffffffff811be63c
> > [   32.971559] R10: 8080808080808080 R11: 0000000000000000 R12:
> > 7d10f8ec0c3cb928
> > [   32.987421] R13: 00000000000080d0 R14: ffff88042f803700 R15:
> > ffff88042f803700
> > [   33.003284] FS:  0000000000000000(0000) GS:ffff88042fd20000(0000)
> > knlGS:0000000000000000
> > [   33.021281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   33.034068] CR2: 00007f01a8fced40 CR3: 000000040e52f000 CR4:
> > 00000000000407e0
> > [   33.049929] Stack:
> > [   33.054456]  ffffffff811be63c 0000000000000000 ffff88041be52780
> > ffff880428052000
> > [   33.071259]  ffff8804273f5f2c 00000000ffffff9c ffff8804273f5d98
> > ffffffff811be63c
> > [   33.088084]  0000000000000080 ffff8804273f5f2c ffff8804273f5e40
> > ffff8804273f5e30
> > [   33.104908] Call Trace:
> > [   33.110399]  [<ffffffff811be63c>] ? get_empty_filp+0x5c/0x180
> > [   33.123188]  [<ffffffff811be63c>] get_empty_filp+0x5c/0x180
> > [   33.135593]  [<ffffffff811cc03d>] path_openat+0x3d/0x620
> > [   33.147422]  [<ffffffff811cd47a>] do_filp_open+0x3a/0x90
> > [   33.159250]  [<ffffffff811a1985>] ? kmem_cache_alloc+0x35/0x1e0
> > [   33.172405]  [<ffffffff811cc6bf>] ? getname_flags+0x4f/0x190
> > [   33.185004]  [<ffffffff811da237>] ? __alloc_fd+0xa7/0x130
> > [   33.197025]  [<ffffffff811bbb99>] do_sys_open+0x129/0x280
> > [   33.209049]  [<ffffffff81020d25>] ? syscall_trace_enter+0x145/0x250
> > [   33.222992]  [<ffffffff811bbd0e>] SyS_open+0x1e/0x20
> > [   33.234053]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
> > [   33.245112] Code: dc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f
> > 84 17 01 00 00 48 85 c0 0f 84 0e 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06
> > <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
> > [   33.292549] RIP  [<ffffffff811a19c5>] kmem_cache_alloc+0x75/0x1e0
> > [   33.306192]  RSP <ffff8804273f5d40>
>
> Hi James,
>
> Are all the stacktraces the same?  When are those rbd images mapped -
> during boot with some sort of init script?  Can you attach the entire dmesg?
>
> Thanks,
>
>                 Ilya
>