Re: mds "laggy"

Gregory Farnum <greg@xxxxxxxxxxx> · Sun, 28 Apr 2013 21:52:43 -0700



Well, it could also be one of the direct or indirect thread limits in
your VM. How many clients do you have connecting to your MDS? Is your
VM 32- or 64-bit? Have you checked syslog and dmesg for any output?
This is essentially a system administration issue, so check into those
sorts of things and google around for "linux thread limit" or
something. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
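Greg's questions about thread limits and whether the VM is 32- or 64-bit can be made concrete with some rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a Ceph tool, and it rests on stated assumptions:

- Each thread reserves a fixed stack. glibc typically uses the `ulimit -s` soft limit, often 8 MiB, as the default pthread stack size.
- A 32-bit x86 Linux process gets roughly 3 GiB of user address space.
- Each accepted connection costs about two threads, a reader and a writer. The backtrace quoted further down, where `Pipe::accept()` calls `Pipe::start_writer()`, is consistent with this.

`max_threads` and `max_connections` are hypothetical helper names.

```python
GiB = 1 << 30
MiB = 1 << 20


def max_threads(address_space: int, stack_size: int, reserved: int = 0) -> int:
    """Rough upper bound on how many thread stacks fit in the address space.

    `reserved` approximates space already taken by the heap, libraries and
    the main stack. Real limits are lower, since other mappings compete too.
    """
    if stack_size <= 0:
        raise ValueError("stack_size must be positive")
    return max(0, (address_space - reserved) // stack_size)


def max_connections(address_space: int, stack_size: int,
                    reserved: int = 0, threads_per_conn: int = 2) -> int:
    """Connections that fit if each costs `threads_per_conn` threads (assumed 2)."""
    return max_threads(address_space, stack_size, reserved) // threads_per_conn


if __name__ == "__main__":
    # Assumed 32-bit x86 Linux: ~3 GiB of user space, 8 MiB default stacks.
    print("32-bit:", max_threads(3 * GiB, 8 * MiB), "threads,",
          max_connections(3 * GiB, 8 * MiB), "connections")
    # Assumed 64-bit x86-64 Linux: 128 TiB of user space, same stack size.
    print("64-bit:", max_threads(128 * 1024 * GiB, 8 * MiB), "threads")
```

Under these assumptions, a 32-bit process runs out of address space after about 384 threads (roughly 192 connections), and that is before counting the heap. On 64-bit the ceiling is in the millions, so other caps such as `ulimit -u` or `/proc/sys/kernel/threads-max` usually become the binding limit instead.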


On Sun, Apr 28, 2013 at 9:41 PM, Varun Chandramouli <varun.c37@xxxxxxxxx> wrote:
> Hi Greg,
>
> I tried running it on a physical machine, and the task completed without any
> crashes. However, I still can't figure out why the MDS crashes on the VMs.
> I don't see RAM being a bottleneck. Also, simply restarting the MDS gets the
> code running again (and the MDS also crashes at fixed intervals).
>
> Regards
> Varun
>
>
> On Thu, Apr 25, 2013 at 9:55 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Thu, Apr 25, 2013 at 8:22 AM, Noah Watkins <noah.watkins@xxxxxxxxxxx>
>> wrote:
>> >
>> > On Apr 25, 2013, at 4:08 AM, Varun Chandramouli <varun.c37@xxxxxxxxx>
>> > wrote:
>> >
>> >> 2013-04-25 13:54:36.182188 bff8cb40 -1 common/Thread.cc: In function
>> >> 'void Thread::create(size_t)' thread bff8cb40 time 2013-04-25 13:54:36.053392
>> >> common/Thread.cc: 110: FAILED assert(ret == 0)
>> >>
>> >>  ceph version 0.58-500-gaf3b163 (af3b16349a49a8aee401e27c1b71fd704b31297c)
>> >>  1: (Thread::create(unsigned int)+0xdc) [0x843866c]
>> >>  2: (Pipe::start_writer()+0x4e) [0x84d837e]
>> >>  3: (Pipe::accept()+0x4955) [0x84ee625]
>> >>  4: (Pipe::reader()+0x1758) [0x84f10b8]
>> >>  5: (Pipe::Reader::entry()+0x1e) [0x84f2dee]
>> >>  6: (Thread::_entry_func(void*)+0xf) [0x843833f]
>> >>  7: (()+0x6d4c) [0xb7784d4c]
>> >>  8: (clone()+0x5e) [0xb7106ace]
>> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >
>> > The assertion failure here doesn't look like any of the MDS problems I
>> > was getting with Hadoop, but someone else may recognize the problem. A
>> > couple of things might be helpful. First, I think that multi-MDS is less
>> > stable right now than running a single MDS. Second, using GDB to run 'thread
>> > apply all bt' on the crashed MDS core file would provide a lot more context
>> > to help debug.
>>
>> That assert indicates the MDS tried to create a new thread and got an
>> error back. Given that your MDS is already running, this means it's
>> not an issue with thread setup; you've run into a resource limit of
>> some kind. Since you're in VMs I'll guess you've run out of RAM, but
>> it's also possible that the process has exceeded some limitations
>> imposed by the kernel.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
>
> --
> Varun Chandramouli
> Birla Institute of Technology & Science
> http://in.linkedin.com/in/chandramoulivarun
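Greg suggested the MDS may have exceeded a kernel-imposed limit. One way to check this is to compare the daemon's live thread count with its limits. Linux exposes both under `/proc/<pid>/`.

The sketch below is a hypothetical helper, not part of Ceph. It does three things:

- parses the `Threads:` line of `/proc/<pid>/status`;
- parses the fixed-width table in `/proc/<pid>/limits`;
- reads the system-wide cap in `/proc/sys/kernel/threads-max`.

`Max processes` (RLIMIT_NPROC) counts every thread owned by the same user, so the process's own thread count is only a lower bound on what the kernel compares against that limit.

```python
import sys


def parse_thread_count(status_text: str) -> int:
    """Return the value of the 'Threads:' field from /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no Threads: line found")


def parse_limits(limits_text: str) -> dict:
    """Parse /proc/<pid>/limits into {name: (soft, hard, units)}.

    The file is a fixed-width table and limit names contain spaces, so the
    column offsets are taken from the header row instead of splitting on
    whitespace. Rows without a units column yield an empty string.
    """
    lines = limits_text.splitlines()
    header = lines[0]
    soft_at = header.index("Soft Limit")
    hard_at = header.index("Hard Limit")
    units_at = header.index("Units")
    limits = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        name = line[:soft_at].strip()
        limits[name] = (line[soft_at:hard_at].strip(),
                        line[hard_at:units_at].strip(),
                        line[units_at:].strip())
    return limits


def read(path: str) -> str:
    with open(path) as f:
        return f.read()


def report(pid: str) -> None:
    """Print thread usage next to limits that can make thread creation fail."""
    threads = parse_thread_count(read(f"/proc/{pid}/status"))
    limits = parse_limits(read(f"/proc/{pid}/limits"))
    print(f"threads in process {pid}: {threads}")
    for name in ("Max processes", "Max stack size", "Max address space"):
        soft, hard, units = limits.get(name, ("?", "?", ""))
        print(f"{name}: soft={soft} hard={hard} {units}")
    print("kernel threads-max:", read("/proc/sys/kernel/threads-max").strip())


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 mds_threads.py <pid of ceph-mds>
    if len(sys.argv) > 1:
        report(sys.argv[1])
```

Run it with something like `python3 <script> $(pidof ceph-mds)`, once early in the job and again shortly before the MDS usually crashes. Comparing the two runs should show whether the thread count is climbing toward one of these caps.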
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com