On Mon, Feb 4, 2013 at 10:01 AM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
> References:
> [1] http://www.spinics.net/lists/ceph-devel/msg04903.html
> [2] ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
> 1: /usr/bin/ceph-mds() [0x817e82]
> 2: (()+0xf140) [0x7f9091d30140]
> 3: (MDCache::request_drop_foreign_locks(MDRequest*)+0x21) [0x5b9dc1]
> 4: (MDCache::request_drop_locks(MDRequest*)+0x19) [0x5baae9]
> 5: (MDCache::request_cleanup(MDRequest*)+0x60) [0x5bab70]
> 6: (MDCache::request_kill(MDRequest*)+0x80) [0x5bae90]
> 7: (Server::journal_close_session(Session*, int)+0x372) [0x549aa2]
> 8: (Server::kill_session(Session*)+0x137) [0x549c67]
> 9: (Server::find_idle_sessions()+0x12a6) [0x54b0d6]
> 10: (MDS::tick()+0x338) [0x4da928]
> 11: (SafeTimer::timer_thread()+0x1af) [0x78151f]
> 12: (SafeTimerThread::entry()+0xd) [0x782bad]
> 13: (()+0x7ddf) [0x7f9091d28ddf]
> 14: (clone()+0x6d) [0x7f90909cc24d]

This in particular is quite odd. Do you have any logging from when that
happened? (Oftentimes the log can have a bunch of debugging information
from shortly before the crash.)

On Mon, Feb 11, 2013 at 10:54 AM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
> Furthermore, I observe another strange thing more or less related to the
> storms.
>
> During a rsync command to write ~20G of data on Ceph and during (and
> after) the storm, one OSD sends a lot of data to the active MDS
> (400Mbps peak each 6 seconds). After a quick check, I found that when I
> stop osd.23, osd.14 stops its peaks.

This is consistent with Sam's suggestion that MDS is thrashing its
cache, and is grabbing a directory object off of the OSDs. How large
are the directories you're using? If they're a significant fraction of
your cache size, it might be worth enabling the (sadly less stable)
directory fragmentation options, which will split them up into smaller
fragments that can be independently read and written to disk.
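
As a rough sketch of what enabling fragmentation might look like in
ceph.conf: this assumes the `mds bal frag` and `mds bal split size`
option names, which should be checked against the documentation for
your release. The split threshold below is only an illustrative value,
not a recommendation.

```
[mds]
    # Assumption: turns on directory fragmentation (less stable, as noted above)
    mds bal frag = true
    # Assumption: split a directory fragment once it holds more than this
    # many entries; illustrative value, tune to your directory sizes
    mds bal split size = 10000
```

The MDS daemons would need to be restarted to pick up the change.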
-Greg