This isn't bringing up anything in my brain, but I don't know what that _sample() function is actually doing. Did you get any farther into it?
-Greg

On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote:
> Which, looks to be in a tight loop in the memory model _sample…
>
> (gdb) bt
> #0 0x00007f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #2 0x00007f027046f4c5 in std::basic_filebuf<char, std::char_traits<char> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3 0x00007f0270467ceb in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4 0x000000000072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
> #5 0x00000000005658db in MDCache::check_memory_usage() ()
> #6 0x00000000004ba929 in MDS::tick() ()
> #7 0x0000000000794c65 in SafeTimer::timer_thread() ()
> #8 0x00000000007958ad in SafeTimerThread::entry() ()
> #9 0x00007f0270d7de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>
> On Mar 6, 2013, at 6:18 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
>
> > On Mar 6, 2013, at 5:57 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
> >
> > > The MDS process in my cluster is running at 100% CPU. In fact I thought the cluster came down, but rather an ls was taking a minute. There aren't any clients active. I've left the process running in case there is any probing you'd like to do on it:
> > >
> > > virt res cpu
> > > 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds
> > >
> > > Thanks,
> > > Noah
> >
> > This is a ceph-mds child thread under strace. The only thread
> > that appears to be doing anything.
> >
> > root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
> > Process 3372 attached - interrupt to quit
> > read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0217d80000-7f0217e80000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
> > ...
> >
> > That file looks to be:
> >
> > ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps
> >
> > (3337 is the parent process).
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
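
For anyone following along: the strace output shows the thread reading /proc/3337/maps in chunks of about 4 KB, and frame #3 of the backtrace shows getline() being called from MemoryModel::_sample(). A minimal Python sketch of that kind of line-by-line scan is below. It is not the Ceph implementation (nobody in the thread has confirmed what _sample() does with each line). The names parse_maps_line, summarize_maps and summarize_pid are hypothetical, and the sample lines keep only the address range and permission fields that are visible in the truncated strace output.

```python
def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, end, perms).

    Only the first two whitespace-separated fields are used:
    the "start-end" hex address range and the permission string.
    """
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    return int(start_s, 16), int(end_s, 16), fields[1]


def summarize_maps(lines):
    """Return (mapping count, total bytes mapped, bytes in writable mappings)."""
    count = 0
    total = 0
    writable = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        start, end, perms = parse_maps_line(line)
        size = end - start
        count += 1
        total += size
        if "w" in perms:
            writable += size
    return count, total, writable


def summarize_pid(pid="self"):
    """Scan /proc/<pid>/maps on a Linux host (hypothetical helper)."""
    with open(f"/proc/{pid}/maps") as f:
        return summarize_maps(f)


if __name__ == "__main__":
    # Address ranges and permissions copied from the strace output above;
    # the remaining fields were truncated there and are left out here.
    sample = [
        "7f0203235000-7f0203236000 ---p",
        "7f0214144000-7f0214244000 rw-p",
    ]
    print(summarize_maps(sample))
```

The cost of a scan like this grows with the number of mappings, so if the maps file is very long, repeating it on every timer tick could plausibly keep a thread busy. Running something like summarize_pid(3337) on the affected host would show how many entries there are, though that is only a way to check the guess, not a diagnosis.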