On Mar 7, 2013, at 9:24 AM, Greg Farnum <greg@xxxxxxxxxxx> wrote:

> This isn't bringing up anything in my brain, but I don't know what that _sample() function is actually doing — did you get any farther into it?

_sample reads /proc/self/maps in a loop until EOF or some other exit condition. I couldn't figure out whether the thread was stuck inside _sample or one level up. Anyhow, my gdb-fu isn't stellar and I managed to crash the MDS. I'm going to stick some log points in and try to reproduce it.

> -Greg
>
> On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote:
>
>> Which looks to be in a tight loop in the memory model _sample…
>>
>> (gdb) bt
>> #0 0x00007f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1 0x00007f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #2 0x00007f027046f4c5 in std::basic_filebuf<char, std::char_traits<char> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #3 0x00007f0270467ceb in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #4 0x000000000072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
>> #5 0x00000000005658db in MDCache::check_memory_usage() ()
>> #6 0x00000000004ba929 in MDS::tick() ()
>> #7 0x0000000000794c65 in SafeTimer::timer_thread() ()
>> #8 0x00000000007958ad in SafeTimerThread::entry() ()
>> #9 0x00007f0270d7de9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>
>> On Mar 6, 2013, at 6:18 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
>>
>>> On Mar 6, 2013, at 5:57 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
>>>
>>>> The MDS process in my cluster is running at 100% CPU. In fact I thought the cluster had come down, but it turned out an ls was just taking a minute. There aren't any clients active. I've left the process running in case there is any probing you'd like to do on it:
>>>>
>>>>    VIRT  RES   SHR S %CPU %MEM     TIME+ COMMAND
>>>>   4629m  88m  5260 S   92  1.1 113:32.79 ceph-mds
>>>>
>>>> Thanks,
>>>> Noah
>>>
>>> This is a ceph-mds child thread under strace. The only thread that appears to be doing anything.
>>>
>>> root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
>>> Process 3372 attached - interrupt to quit
>>> read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0217d80000-7f0217e80000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
>>> ...
>>>
>>> That file looks to be:
>>>
>>> ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps
>>>
>>> (3337 is the parent process.)
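
For context on where the loop sits: frames #0-#4 in the backtrace show MemoryModel::_sample() pulling /proc/self/maps through std::getline(), so the repeated read() calls in the strace are just the filebuf refilling its buffer (the 8191-byte reads). Below is a minimal sketch of a sampling loop of that shape; the struct fields, the parsing, and the [heap] check are assumptions for illustration, not the actual Ceph implementation.

    // Illustrative sketch only -- names and parsing are assumed, not Ceph's code.
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <iostream>

    struct snap_t {
      long heap_kb = 0;   // kB mapped in the [heap] segment (assumed field)
      long total_kb = 0;  // kB mapped in total (assumed field)
    };

    static void sample_maps(snap_t *s)
    {
      std::ifstream f("/proc/self/maps");
      std::string line;
      // std::getline() fails the stream at EOF, which ends the loop. If the
      // exit condition never triggers, the thread spins in read() as seen above.
      while (std::getline(f, line)) {
        unsigned long start = 0, end = 0;
        char dash = 0;
        std::istringstream is(line);
        if (!(is >> std::hex >> start >> dash >> end))
          continue;                               // skip unparsable lines
        long kb = static_cast<long>(end - start) / 1024;
        s->total_kb += kb;
        if (line.find("[heap]") != std::string::npos)
          s->heap_kb += kb;
      }
    }

    int main()
    {
      snap_t s;
      sample_maps(&s);
      std::cout << "mapped: " << s.total_kb << " kB total, "
                << s.heap_kb << " kB heap\n";
      return 0;
    }

If the real _sample() does more than a single getline pass (e.g. re-reads the file until some other condition is met, per the "until EOF or some other exit condition" note above), that would explain a spin that never reaches EOF; the sketch only shows the single-pass case.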