Re: MDS running at 100% CPU, no clients

This doesn't ring any bells for me, but I don't know what that _sample() function is actually doing. Did you get any further into it?
-Greg

On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote:

> Which looks to be in a tight loop in the memory model's _sample():
>  
> (gdb) bt
> #0 0x00007f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #2 0x00007f027046f4c5 in std::basic_filebuf<char, std::char_traits<char> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3 0x00007f0270467ceb in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4 0x000000000072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
> #5 0x00000000005658db in MDCache::check_memory_usage() ()
> #6 0x00000000004ba929 in MDS::tick() ()
> #7 0x0000000000794c65 in SafeTimer::timer_thread() ()
> #8 0x00000000007958ad in SafeTimerThread::entry() ()
> #9 0x00007f0270d7de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>  
> On Mar 6, 2013, at 6:18 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
>  
> >  
> > On Mar 6, 2013, at 5:57 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
> >  
> > > The MDS process in my cluster is running at 100% CPU. In fact, I thought the cluster had gone down, but it turned out that an ls was just taking a minute. There aren't any clients active. I've left the process running in case there is any probing you'd like to do on it:
> > >  
> > > VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > > 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds
> > >  
> > > Thanks,
> > > Noah
> >  
> >  
> >  
> >  
> > This is a ceph-mds child thread under strace, the only thread
> > that appears to be doing anything.
> >  
> > root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
> > Process 3372 attached - interrupt to quit
> > read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0217d80000-7f0217e80000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
> > ...
> >  
> > That file looks to be:
> >  
> > ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps
> >  
> > (3337 is the parent process).
>  
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx (mailto:majordomo@xxxxxxxxxxxxxxx)
> More majordomo info at http://vger.kernel.org/majordomo-info.html
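For anyone following along: the backtrace shows _sample() sitting in getline(), and the strace shows the reads hitting /proc/3337/maps, so the function appears to be walking the process's memory map one line at a time. A minimal Python sketch of that kind of sampler follows. It is not Ceph's code and has not been checked against it. The function name, the made-up sample lines, and the choice to count only unnamed "rw-p" mappings are all assumptions for illustration.

```python
from typing import Iterable

# Made-up lines in /proc/<pid>/maps format:
#   address-range perms offset dev inode [pathname]
# These are NOT taken from the process discussed in this thread.
SAMPLE_LINES = [
    "00400000-00401000 r-xp 00000000 08:01 123 /usr/bin/example",
    "7f0000000000-7f0000100000 rw-p 00000000 00:00 0",
    "7f0000100000-7f0000101000 ---p 00000000 00:00 0",
    "7f0000200000-7f0000203000 rw-p 00000000 00:00 0 [stack]",
]


def anon_rw_private_bytes(lines: Iterable[str]) -> int:
    """Sum the sizes of private, writable mappings that have no pathname.

    Hypothetical helper: treating such mappings as "heap-like" is an
    assumption of this sketch, not a statement about MemoryModel::_sample().
    """
    total = 0
    for line in lines:
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # blank or malformed line
        addr_range, perms = fields[0], fields[1]
        pathname = fields[5].strip() if len(fields) > 5 else ""
        if perms != "rw-p" or pathname:
            continue  # skip guard pages, read-only, shared, or named mappings
        start_s, sep, end_s = addr_range.partition("-")
        if not sep:
            continue  # no "start-end" range on this line
        total += int(end_s, 16) - int(start_s, 16)
    return total
```

On Linux the same function can be fed a live file, for example with open("/proc/self/maps") as f: print(anon_rw_private_bytes(f)). Every call reads the whole file, so the cost of a sample grows with the number of mappings. That would be consistent with a periodic sampler becoming expensive on a process with a very large map, though the traces above do not prove that this is what is happening here.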


