GPF kernel panics

The mainline packages from Ubuntu should be helpful in testing.

Info: https://wiki.ubuntu.com/Kernel/MainlineBuilds
Packages: http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D
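
A quick way to check whether a node is already running a kernel new enough
for the relaxed rbd mapping limit discussed further down is sketched below.
This is an illustrative Python snippet, not something from the thread; it
assumes the 3.14 threshold mentioned later and that platform.release()
returns a string like '3.13.0-30-generic'.

    import platform

    # Report whether the running kernel is at least 3.14, the version in
    # which the higher rbd map limit is said to arrive.
    release = platform.release()          # e.g. '3.13.0-30-generic'
    major, minor = (int(x) for x in release.split('.')[:2])
    if (major, minor) >= (3, 14):
        print('%s: higher rbd map limit expected (~4096)' % release)
    else:
        print('%s: pre-3.14 kernel, rbd map limit is roughly 250' % release)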

On 31/07/2014 10:31, James Eckersall wrote:
> Ah, thanks for the clarification on that.
> We are very close to the 250 limit, so that is something we'll have to
> look at addressing, but I don't think it's actually relevant to the
> panics, since reverting the auth key changes I made appears to have
> resolved the issue (no panics yet - roughly 20 hours and counting).
>
> Now to figure out the best way to get a 3.14 kernel in Ubuntu Trusty :)
>
>
> On 31 July 2014 10:23, Christian Balzer <chibi at gol.com> wrote:
>
>     On Thu, 31 Jul 2014 10:13:11 +0100 James Eckersall wrote:
>
>     > Hi,
>     >
>     > I thought the limit was in relation to ceph and that 0.80+ fixed
>     > that limit - or at least raised it to 4096?
>     >
>     Yes and yes. But 0.80 only made it into kernels 3.14 and beyond. ^o^
>
>     > If there is a 250 limit, can you confirm where this is documented?
>     >
>     In this very ML, see the "v0.75 released" thread:
>     ---
>     On Thu, 16 Jan 2014 15:51:17 +0200 Ilya Dryomov wrote:
>
>     > On Wed, Jan 15, 2014 at 5:42 AM, Sage Weil <sage at inktank.com> wrote:
>     > >
>     > > [...]
>     > >
>     > > * rbd: support for 4096 mapped devices, up from ~250 (Ilya Dryomov)
>     >
>     > Just a note, v0.75 simply adds some of the infrastructure, the
>     > actual support for this will arrive with kernel 3.14.  The
>     > theoretical limit is 65536 mapped devices, although I admit I
>     > haven't tried mapping more than ~4000 at once.
>     >
>     ---
>
>
>     Christian
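
(Not part of the original mails: a minimal sketch of how one might count the
rbd images currently mapped on a node against the ~250 pre-3.14 limit quoted
above. It assumes the rbd kernel module registers each mapped image as an
entry under /sys/bus/rbd/devices; adjust the path if your sysfs layout
differs.)

    import os

    PRE_3_14_LIMIT = 250      # figure quoted in the thread for kernels < 3.14

    def mapped_rbd_count(sysfs_path='/sys/bus/rbd/devices'):
        # Each mapped rbd image shows up as one entry in this directory.
        try:
            return len(os.listdir(sysfs_path))
        except OSError:
            return 0          # rbd module not loaded, or nothing mapped

    if __name__ == '__main__':
        used = mapped_rbd_count()
        print('%d rbd devices currently mapped' % used)
        if used >= PRE_3_14_LIMIT - 10:
            print('WARNING: approaching the pre-3.14 mapping limit (~250)')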
>
>     > Thanks
>     >
>     > J
>     >
>     >
>     > On 31 July 2014 09:50, Christian Balzer <chibi at gol.com> wrote:
>     >
>     > >
>     > > Hello,
>     > >
>     > > are you perchance approaching the maximum number of kernel
>     > > mappings, which is somewhat shy of 250 in any kernel below 3.14?
>     > >
>     > > If you can easily upgrade to 3.14 see if that fixes it.
>     > >
>     > > Christian
>     > >
>     > > On Thu, 31 Jul 2014 09:37:05 +0100 James Eckersall wrote:
>     > >
>     > > > Hi,
>     > > >
>     > > > The stacktraces are very similar.  Here is another one with
>     > > > complete dmesg: http://pastebin.com/g3X0pZ9E
>     > > >
>     > > > The rbd's are mapped by the rbdmap service on boot.
>     > > > All our ceph servers are running Ubuntu 14.04 (kernel
>     > > > 3.13.0-30-generic). Ceph packages are from the Ubuntu repos,
>     > > > version 0.80.1-0ubuntu1.1. I should have probably mentioned
>     > > > this info in the initial mail :)
>     > > >
>     > > > This problem also seemed to get gradually worse over time.
>     > > > We had a couple of sporadic crashes at the start of the week,
>     > > > escalating to the node being unable to stay up for more than a
>     > > > couple of minutes before panicking.
>     > > >
>     > > > Thanks
>     > > >
>     > > > J
>     > > >
>     > > >
>     > > > On 31 July 2014 09:12, Ilya Dryomov <ilya.dryomov at inktank.com> wrote:
>     > > >
>     > > > > On Thu, Jul 31, 2014 at 11:44 AM, James Eckersall
>     > > > > <james.eckersall at gmail.com> wrote:
>     > > > > > Hi,
>     > > > > >
>     > > > > > I've had a fun time with ceph this week.
>     > > > > > We have a cluster with 4 OSD servers (20 OSDs per server),
>     > > > > > 3 mons and a server mapping ~200 rbds and presenting cifs
>     > > > > > shares.
>     > > > > >
>     > > > > > We're using cephx and the export node has its own cephx
>     > > > > > auth key.
>     > > > > >
>     > > > > > I made a change to the key last week, adding rwx access to
>     > > > > > another pool.
>     > > > > >
>     > > > > > Since that point, we had sporadic kernel panics of the
>     > > > > > export node.
>     > > > > >
>     > > > > > It got to the point where it would barely finish booting up
>     > > > > > and would panic.
>     > > > > >
>     > > > > > Once I removed the extra pool I had added to the auth key,
>     > > > > > it hasn't crashed again.
>     > > > > >
>     > > > > > I'm a bit concerned that a change to an auth key can cause
>     > > > > > this type of crash.
>     > > > > > There were no log entries on the mon/osd/export nodes
>     > > > > > regarding the key at all, so it was only by searching my
>     > > > > > memory for what had changed that I was able to resolve the
>     > > > > > problem.
>     > > > > >
>     > > > > > From what I could tell from the key, the format was correct
>     > > > > > and the pool that I added did exist, so I am confused as to
>     > > > > > how this would have caused kernel panics.
>     > > > > >
>     > > > > > Below is an example of one of the crash stacktraces.
>     > > > > >
>     > > > > > [   32.713504] general protection fault: 0000 [#1] SMP
>     > > > > > [   32.724718] Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables rbd libceph libcrc32c gpio_ich dcdbas intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul joydev crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core shpchp lpc_ich mei_me mei wmi ipmi_si mac_hid acpi_power_meter 8021q garp stp mrp llc bonding lp parport nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache hid_generic igb ixgbe i2c_algo_bit usbhid dca hid ptp ahci libahci pps_core megaraid_sas mdio
>     > > > > > [   32.843936] CPU: 18 PID: 5030 Comm: tr Not tainted 3.13.0-30-generic #54-Ubuntu
>     > > > > > [   32.860163] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 1.6.0 03/07/2013
>     > > > > > [   32.876774] task: ffff880417b15fc0 ti: ffff8804273f4000 task.ti: ffff8804273f4000
>     > > > > > [   32.893384] RIP: 0010:[<ffffffff811a19c5>]  [<ffffffff811a19c5>] kmem_cache_alloc+0x75/0x1e0
>     > > > > > [   32.912198] RSP: 0018:ffff8804273f5d40  EFLAGS: 00010286
>     > > > > > [   32.924015] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000011ed
>     > > > > > [   32.939856] RDX: 00000000000011ec RSI: 00000000000080d0 RDI: ffff88042f803700
>     > > > > > [   32.955696] RBP: ffff8804273f5d70 R08: 0000000000017260 R09: ffffffff811be63c
>     > > > > > [   32.971559] R10: 8080808080808080 R11: 0000000000000000 R12: 7d10f8ec0c3cb928
>     > > > > > [   32.987421] R13: 00000000000080d0 R14: ffff88042f803700 R15: ffff88042f803700
>     > > > > > [   33.003284] FS:  0000000000000000(0000) GS:ffff88042fd20000(0000) knlGS:0000000000000000
>     > > > > > [   33.021281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     > > > > > [   33.034068] CR2: 00007f01a8fced40 CR3: 000000040e52f000 CR4: 00000000000407e0
>     > > > > > [   33.049929] Stack:
>     > > > > > [   33.054456]  ffffffff811be63c 0000000000000000 ffff88041be52780 ffff880428052000
>     > > > > > [   33.071259]  ffff8804273f5f2c 00000000ffffff9c ffff8804273f5d98 ffffffff811be63c
>     > > > > > [   33.088084]  0000000000000080 ffff8804273f5f2c ffff8804273f5e40 ffff8804273f5e30
>     > > > > > [   33.104908] Call Trace:
>     > > > > > [   33.110399]  [<ffffffff811be63c>] ? get_empty_filp+0x5c/0x180
>     > > > > > [   33.123188]  [<ffffffff811be63c>] get_empty_filp+0x5c/0x180
>     > > > > > [   33.135593]  [<ffffffff811cc03d>] path_openat+0x3d/0x620
>     > > > > > [   33.147422]  [<ffffffff811cd47a>] do_filp_open+0x3a/0x90
>     > > > > > [   33.159250]  [<ffffffff811a1985>] ? kmem_cache_alloc+0x35/0x1e0
>     > > > > > [   33.172405]  [<ffffffff811cc6bf>] ? getname_flags+0x4f/0x190
>     > > > > > [   33.185004]  [<ffffffff811da237>] ? __alloc_fd+0xa7/0x130
>     > > > > > [   33.197025]  [<ffffffff811bbb99>] do_sys_open+0x129/0x280
>     > > > > > [   33.209049]  [<ffffffff81020d25>] ? syscall_trace_enter+0x145/0x250
>     > > > > > [   33.222992]  [<ffffffff811bbd0e>] SyS_open+0x1e/0x20
>     > > > > > [   33.234053]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
>     > > > > > [   33.245112] Code: dc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 17 01 00 00 48 85 c0 0f 84 0e 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
>     > > > > > [   33.292549] RIP  [<ffffffff811a19c5>] kmem_cache_alloc+0x75/0x1e0
>     > > > > > [   33.306192]  RSP <ffff8804273f5d40>
>     > > > >
>     > > > > Hi James,
>     > > > >
>     > > > > Are all the stacktraces the same?  When are those rbd images
>     > > > > mapped - during
>     > > > > boot with some sort of init script?  Can you attach the entire
>     > > > > dmesg?
>     > > > >
>     > > > > Thanks,
>     > > > >
>     > > > >                 Ilya
>     > > > >
>     > >
>     > >
>     > > --
>     > > Christian Balzer        Network/Systems Engineer
>     > > chibi at gol.com           Global OnLine Japan/Fusion Communications
>     > > http://www.gol.com/
>     > >
>
>
>     --
>     Christian Balzer        Network/Systems Engineer
>     chibi at gol.com           Global OnLine Japan/Fusion Communications
>     http://www.gol.com/
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
