On Tue, May 7, 2013 at 9:44 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> Hey folks,
>
> Saw this crash the other day:
>
>  ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
>  1: /usr/bin/ceph-osd() [0x788fba]
>  2: (()+0xfcb0) [0x7f19d1889cb0]
>  3: (gsignal()+0x35) [0x7f19d0248425]
>  4: (abort()+0x17b) [0x7f19d024bb8b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
>  6: (()+0xb5846) [0x7f19d0b98846]
>  7: (()+0xb5873) [0x7f19d0b98873]
>  8: (()+0xb596e) [0x7f19d0b9896e]
>  9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
>  10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
>  11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
>  12: (FileStore::read(coll_t, hobject_t const&, unsigned long,
> unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
>  13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t,
> bool)+0x347) [0x69ac57]
>  14: (PG::chunky_scrub()+0x375) [0x69faf5]
>  15: (PG::scrub()+0x145) [0x6a0e95]
>  16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
>  17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
>  18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
>  19: (()+0x7e9a) [0x7f19d1881e9a]
>  20: (clone()+0x6d) [0x7f19d0305cbd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> Appears to have gone down during a scrub?
>
> I don't see anything interesting in /var/log/syslog or anywhere else
> at the same time.  It's actually the second time I've seen this exact
> stack trace.  First time was reported here...  (was going to insert
> GMane link, but search.gmane.org appears to be down for me).  Well,
> for those inclined, the thread was titled "question about mon memory
> usage", and was also started by me.
>
> Any thoughts?  I do plan to upgrade to 0.56.6 when I can.  I'm a
> little leery of doing it on a production system without a maintenance
> window, though.
> When I went from 0.56.3 --> 0.56.4 on a live system,
> a system using the RBD kernel module kpanic'd.  =)

Do you have a core from when this happened? It was indeed during a
scrub, but it didn't fail an assert or anything; looks like maybe it
tried to allocate too much memory or something... :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com