I'm afraid I don't. I don't think I looked when it happened, and
searching for one just now came up empty. :/ If it happens again, I'll
be sure to keep my eye out for one.

FWIW, this particular server (1 out of 5) has 8GB *less* RAM than the
others (one bad stick, it seems), and this has happened twice. But it
still has 40GB for 12 OSDs, so I think it should be plenty.

Thanks for responding.

 - Travis

On Mon, May 13, 2013 at 4:49 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Tue, May 7, 2013 at 9:44 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>> Hey folks,
>>
>> Saw this crash the other day:
>>
>> ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
>>  1: /usr/bin/ceph-osd() [0x788fba]
>>  2: (()+0xfcb0) [0x7f19d1889cb0]
>>  3: (gsignal()+0x35) [0x7f19d0248425]
>>  4: (abort()+0x17b) [0x7f19d024bb8b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f19d0b9a69d]
>>  6: (()+0xb5846) [0x7f19d0b98846]
>>  7: (()+0xb5873) [0x7f19d0b98873]
>>  8: (()+0xb596e) [0x7f19d0b9896e]
>>  9: (operator new[](unsigned long)+0x47e) [0x7f19d102db1e]
>>  10: (ceph::buffer::create(unsigned int)+0x67) [0x834727]
>>  11: (ceph::buffer::ptr::ptr(unsigned int)+0x15) [0x834a95]
>>  12: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned long, ceph::buffer::list&)+0x1ae) [0x6fbdde]
>>  13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool)+0x347) [0x69ac57]
>>  14: (PG::chunky_scrub()+0x375) [0x69faf5]
>>  15: (PG::scrub()+0x145) [0x6a0e95]
>>  16: (OSD::ScrubWQ::_process(PG*)+0xc) [0x6384ec]
>>  17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8297e6]
>>  18: (ThreadPool::WorkThread::entry()+0x10) [0x82b610]
>>  19: (()+0x7e9a) [0x7f19d1881e9a]
>>  20: (clone()+0x6d) [0x7f19d0305cbd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> Appears to have gone down during a scrub?
>>
>> I don't see anything interesting in /var/log/syslog or anywhere else
>> at the same time. It's actually the second time I've seen this exact
>> stack trace. First time was reported here... (was going to insert a
>> GMane link, but search.gmane.org appears to be down for me). Well,
>> for those inclined, the thread was titled "question about mon memory
>> usage", and was also started by me.
>>
>> Any thoughts? I do plan to upgrade to 0.56.6 when I can. I'm a
>> little leery of doing it on a production system without a maintenance
>> window, though. When I went from 0.56.3 --> 0.56.4 on a live system,
>> a system using the RBD kernel module kpanic'd. =)
>
> Do you have a core from when this happened? It was indeed during a
> scrub, but it didn't fail an assert or anything — looks like maybe it
> tried to allocate too much memory or something... :/
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
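
A side note for anyone digging into the trace above: frames 9 down to 3
(operator new[], the libstdc++ terminate machinery, then abort/gsignal) are
the usual signature of an uncaught std::bad_alloc, which matches Greg's
reading that the scrub read tried to allocate more memory than it could get.
A minimal sketch in plain C++ (not Ceph code, just an illustration of the
mechanism) that produces the same frame pattern:

#include <cstddef>
#include <new>

int main() {
    // Deliberately impossible request: roughly half the 64-bit address space.
    std::size_t huge = static_cast<std::size_t>(-1) / 2;

    // The throwing form of operator new[] raises std::bad_alloc (or a type
    // derived from it) when the allocation cannot be satisfied. With no
    // try/catch anywhere on the calling thread, std::terminate() runs the
    // default (verbose) terminate handler, which calls abort().
    char *buf = new char[huge];

    delete[] buf;   // never reached
    return 0;
}

In the OSD's case the allocation came from ceph::buffer::create() inside
FileStore::read() during the scrub (frames 10-12), which may be why the host
that is down 8GB relative to its peers is the one hitting it.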
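
On Greg's question about a core file: the thread does not say how these OSDs
are started, but as a general (non-Ceph-specific) note, a process only leaves
a core when it aborts if its RLIMIT_CORE is non-zero at that moment, and the
kernel's core_pattern sysctl decides where the file lands. The usual fix is
`ulimit -c unlimited` in the daemon's startup environment; the sketch below
shows the same check/raise done programmatically, purely to illustrate which
knob is involved:

#include <sys/resource.h>
#include <cstdio>

int main() {
    struct rlimit rl;

    // Read the current core-file size limit for this process.
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        std::perror("getrlimit");
        return 1;
    }
    std::printf("core limit: soft=%llu hard=%llu\n",
                (unsigned long long)rl.rlim_cur,
                (unsigned long long)rl.rlim_max);

    // Raise the soft limit as far as the hard limit allows; with a soft
    // limit of 0 no core is written no matter what crashes.
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        std::perror("setrlimit");
        return 1;
    }
    return 0;
}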