Well, it could also be one of the direct or indirect thread limits in
your VM. How many clients do you have connecting to your MDS? Is your
VM 32- or 64-bit? Have you checked syslog and dmesg for any output?
This is essentially a system administration issue, so check into those
sorts of things and google around for "linux thread limit" or
something. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sun, Apr 28, 2013 at 9:41 PM, Varun Chandramouli
<varun.c37@xxxxxxxxx> wrote:
> Hi Greg,
>
> I tried running it on a physical machine, and the task completed without any
> crashes. However, I am still unable to figure out the reason for the mds
> crashing in case of the VMs. I don't see RAM being a bottleneck. Also,
> simply restarting the mds restarts the execution of the code (and the mds
> crashes at fixed intervals too).
>
> Regards
> Varun
>
>
> On Thu, Apr 25, 2013 at 9:55 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Thu, Apr 25, 2013 at 8:22 AM, Noah Watkins <noah.watkins@xxxxxxxxxxx>
>> wrote:
>> >
>> > On Apr 25, 2013, at 4:08 AM, Varun Chandramouli <varun.c37@xxxxxxxxx>
>> > wrote:
>> >
>> >> 2013-04-25 13:54:36.182188 bff8cb40 -1 common/Thread.cc: In function
>> >> 'void Thread::create(size_t)' thread bff8cb40 time 2013-04-25
>> >> 13:54:36.053392#012common/Thread.cc: 110: FAILED assert(ret == 0)#012#012
>> >> ceph version 0.58-500-gaf3b163
>> >> (af3b16349a49a8aee401e27c1b71fd704b31297c)#012 1: (Thread::create(unsigned
>> >> int)+0xdc) [0x843866c]#012 2: (Pipe::start_writer()+0x4e) [0x84d837e]#012 3:
>> >> (Pipe::accept()+0x4955) [0x84ee625]#012 4: (Pipe::reader()+0x1758)
>> >> [0x84f10b8]#012 5: (Pipe::Reader::entry()+0x1e) [0x84f2dee]#012 6:
>> >> (Thread::_entry_func(void*)+0xf) [0x843833f]#012 7: (()+0x6d4c)
>> >> [0xb7784d4c]#012 8: (clone()+0x5e) [0xb7106ace]#012 NOTE: a copy of the
>> >> executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >
>> > The assertion failure here doesn't look like any of the MDS problems I
>> > was getting with Hadoop, but someone else may recognize the problem. A
>> > couple things that might be helpful. First, I think that multi-MDS is less
>> > stable right now than running a single MDS. Second, using GDB to run 'thread
>> > apply all bt' to the crashed MDS core file would provide a lot more context
>> > to help debug.
>>
>> That assert indicates the MDS tried to create a new thread and got an
>> error back. Given that your MDS is already running, this means it's
>> not an issue with thread setup; you've run into a resource limit of
>> some kind. Since you're in VMs I'll guess you've run out of RAM, but
>> it's also possible that the process has exceeded some limitations
>> imposed by the kernel.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
>
> --
> Varun Chandramouli
> Birla Institute of Technology & Science
> http://in.linkedin.com/in/chandramoulivarun
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
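
For anyone following the "thread limit" advice above, here is a minimal
sketch in Python that prints the Linux limits most commonly involved when
thread creation fails. It assumes a Linux host with /proc mounted. The
helper names (read_int, thread_count, collect_limits) are made up for
illustration and are not part of Ceph. The rlimits it reports belong to
the Python process itself, so for a running daemon compare them with
/proc/<pid>/limits for that daemon's pid.

```python
import os
import resource


def read_int(path):
    """Return the first integer stored in a /proc file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_count(pid):
    """Return the 'Threads:' value from /proc/<pid>/status, or None."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return None


def collect_limits(pid=None):
    """Gather system-wide and per-process limits that can block new threads."""
    pid = os.getpid() if pid is None else pid
    return {
        # System-wide ceilings (sysctl values exposed under /proc/sys).
        "kernel.threads-max": read_int("/proc/sys/kernel/threads-max"),
        "kernel.pid_max": read_int("/proc/sys/kernel/pid_max"),
        "vm.max_map_count": read_int("/proc/sys/vm/max_map_count"),
        # Per-process rlimits of *this* Python process, not of the daemon.
        "RLIMIT_NPROC (soft, hard)": resource.getrlimit(resource.RLIMIT_NPROC),
        "RLIMIT_STACK (soft, hard)": resource.getrlimit(resource.RLIMIT_STACK),
        "RLIMIT_AS (soft, hard)": resource.getrlimit(resource.RLIMIT_AS),
        # Current number of threads in the inspected process.
        f"threads in pid {pid}": thread_count(pid),
    }


if __name__ == "__main__":
    for name, value in collect_limits().items():
        print(f"{name}: {value}")
```

Passing the pid of the MDS process to collect_limits() shows its current
thread count next to the system-wide ceilings. On a 32-bit system it is
also worth looking at the stack size limit, since every thread reserves
its stack out of the process's limited virtual address space.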