Re: mds "laggy"

Well, it could also be one of the direct or indirect thread limits in
your VM. How many clients do you have connecting to your MDS? Is your
VM 32- or 64-bit? Have you checked syslog and dmesg for any output?
This is essentially a system administration issue, so check into those
sorts of things and google around for "linux thread limit" or
something. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
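
A minimal sketch of the checks suggested above, assuming a Linux guest with /proc mounted. The function names are made up for illustration, and the 3 GiB user address space and 8 MiB default stack used in the estimate are common 32-bit Linux defaults, not values taken from this thread. Note that getrlimit() reports the limits of the Python process itself; for the MDS, compare against /proc/<mds pid>/limits.

```python
import os
import resource
from pathlib import Path


def read_int(path):
    """Return the first integer stored in a /proc file, or None if unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_count(pid):
    """Return the thread count of a process from /proc/<pid>/status, or None."""
    try:
        for line in Path(f"/proc/{pid}/status").read_text().splitlines():
            if line.startswith("Threads:"):
                return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return None


def fmt_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def limits_snapshot():
    """Collect per-user and system-wide limits that can make thread creation fail."""
    nproc_soft, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)
    stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    return {
        "nproc_soft": fmt_limit(nproc_soft),   # per-user task limit (ulimit -u)
        "nproc_hard": fmt_limit(nproc_hard),
        "stack_soft": fmt_limit(stack_soft),   # default thread stack size, bytes
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
        "max_map_count": read_int("/proc/sys/vm/max_map_count"),
    }


def max_threads_by_stack(address_space_bytes, stack_bytes):
    """Rough upper bound on threads if every stack is reserved in full.

    Ignores heap, libraries and guard pages, so the real ceiling is lower.
    """
    return address_space_bytes // stack_bytes


if __name__ == "__main__":
    for name, value in limits_snapshot().items():
        print(f"{name:14} {value}")
    print("threads in this process:", thread_count(os.getpid()))
    # Assumed defaults: 3 GiB of user address space, 8 MiB per thread stack.
    print("32-bit stack-bound estimate:",
          max_threads_by_stack(3 * 1024**3, 8 * 1024**2))
```

Passing the MDS pid to thread_count() shortly before a crash and comparing it with these limits should show which ceiling, if any, is being reached. On a 32-bit guest the stack-bound estimate is often the tightest one, although a daemon that sets its own thread stack size will not follow the 8 MiB assumption.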


On Sun, Apr 28, 2013 at 9:41 PM, Varun Chandramouli <varun.c37@xxxxxxxxx> wrote:
> Hi Greg,
>
> I tried running it on a physical machine, and the task completed without any
> crashes. However, I am still unable to figure out the reason for the mds
> crashing in case of the VMs. I don't see RAM being a bottleneck. Also,
> simply restarting the mds restarts the execution of the code (and the mds
> crashes at fixed intervals too).
>
> Regards
> Varun
>
>
> On Thu, Apr 25, 2013 at 9:55 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Thu, Apr 25, 2013 at 8:22 AM, Noah Watkins <noah.watkins@xxxxxxxxxxx>
>> wrote:
>> >
>> > On Apr 25, 2013, at 4:08 AM, Varun Chandramouli <varun.c37@xxxxxxxxx>
>> > wrote:
>> >
>> >> 2013-04-25 13:54:36.182188 bff8cb40 -1 common/Thread.cc: In function
>> >> 'void Thread::create(size_t)' thread bff8cb40 time 2013-04-25 13:54:36.053392
>> >> common/Thread.cc: 110: FAILED assert(ret == 0)
>> >>
>> >> ceph version 0.58-500-gaf3b163 (af3b16349a49a8aee401e27c1b71fd704b31297c)
>> >> 1: (Thread::create(unsigned int)+0xdc) [0x843866c]
>> >> 2: (Pipe::start_writer()+0x4e) [0x84d837e]
>> >> 3: (Pipe::accept()+0x4955) [0x84ee625]
>> >> 4: (Pipe::reader()+0x1758) [0x84f10b8]
>> >> 5: (Pipe::Reader::entry()+0x1e) [0x84f2dee]
>> >> 6: (Thread::_entry_func(void*)+0xf) [0x843833f]
>> >> 7: (()+0x6d4c) [0xb7784d4c]
>> >> 8: (clone()+0x5e) [0xb7106ace]
>> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> >> to interpret this.
>> >
>> > The assertion failure here doesn't look like any of the MDS problems I
>> > was getting with Hadoop, but someone else may recognize the problem. A
>> > couple of things might be helpful. First, I think that multi-MDS is less
>> > stable right now than running a single MDS. Second, using GDB to run
>> > 'thread apply all bt' on the crashed MDS core file would provide a lot
>> > more context to help debug.
>>
>> That assert indicates the MDS tried to create a new thread and got an
>> error back. Given that your MDS is already running, this means it's
>> not an issue with thread setup — you've run into a resource limit of
>> some kind. Since you're in VMs I'll guess you've run out of RAM, but
>> it's also possible that the process has exceeded some limitations
>> imposed by the kernel.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
>
> --
> Varun Chandramouli
> Birla Institute of Technology & Science
> http://in.linkedin.com/in/chandramoulivarun
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




