On Mon, Feb 11, 2013 at 12:25:59PM -0800, Gregory Farnum wrote:
> On Mon, Feb 4, 2013 at 10:01 AM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
> > References:
> > [1] http://www.spinics.net/lists/ceph-devel/msg04903.html
> > [2] ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
> >  1: /usr/bin/ceph-mds() [0x817e82]
> >  2: (()+0xf140) [0x7f9091d30140]
> >  3: (MDCache::request_drop_foreign_locks(MDRequest*)+0x21) [0x5b9dc1]
> >  4: (MDCache::request_drop_locks(MDRequest*)+0x19) [0x5baae9]
> >  5: (MDCache::request_cleanup(MDRequest*)+0x60) [0x5bab70]
> >  6: (MDCache::request_kill(MDRequest*)+0x80) [0x5bae90]
> >  7: (Server::journal_close_session(Session*, int)+0x372) [0x549aa2]
> >  8: (Server::kill_session(Session*)+0x137) [0x549c67]
> >  9: (Server::find_idle_sessions()+0x12a6) [0x54b0d6]
> >  10: (MDS::tick()+0x338) [0x4da928]
> >  11: (SafeTimer::timer_thread()+0x1af) [0x78151f]
> >  12: (SafeTimerThread::entry()+0xd) [0x782bad]
> >  13: (()+0x7ddf) [0x7f9091d28ddf]
> >  14: (clone()+0x6d) [0x7f90909cc24d]
>
> This in particular is quite odd. Do you have any logging from when
> that happened? (Oftentimes the log can have a bunch of debugging
> information from shortly before the crash.)

Yes, there is a dump of 100,000 events for this backtrace in the linked
archive (I need 7 hours to upload it).

> On Mon, Feb 11, 2013 at 10:54 AM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
> > Furthermore, I observe another strange thing more or less related to
> > the storms.
> >
> > During an rsync command writing ~20G of data on Ceph, and during (and
> > after) the storm, one OSD sends a lot of data to the active MDS
> > (a 400Mbps peak every 6 seconds). After a quick check, I found that
> > when I stop osd.23, osd.14 stops its peaks.
>
> This is consistent with Sam's suggestion that the MDS is thrashing its
> cache, and is grabbing a directory object off of the OSDs. How large
> are the directories you're using? If they're a significant fraction of
> your cache size, it might be worth enabling the (sadly less stable)
> directory fragmentation options, which will split them up into smaller
> fragments that can be independently read and written to disk.

The distribution is heterogeneous: we have a folder of ~17G for 300k
objects, another of ~2G for 150k objects, and a lot of smaller
directories.

Are you talking about the "mds bal frag" and "mds bal split *" settings?
Do you have any advice about the values to use?

-- 
Kevin Decherf - @Kdecherf
GPG C610 FE73 E706 F968 612B E4B2 108A BD75 A81E 6E2F
http://kdecherf.com
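
P.S. For reference, these are the fragmentation-related options I found
while digging through the docs -- I'm not sure they are exactly the ones
you mean, and the values below are only the defaults as I understand
them, not a recommendation:

    [mds]
        # enable directory fragmentation (off by default, as I read it)
        mds bal frag = true
        # split a dirfrag once it holds more than this many entries
        mds bal split size = 10000
        # merge fragments back together below this many entries
        mds bal merge size = 50
        # how finely to split at once (into 2^n fragments)
        mds bal split bits = 3

If higher debug levels on the MDS would help more than the event dump,
I can also re-run with something like "debug mds = 20" and "debug ms = 1"
in the [mds] section and capture the logs around the next storm.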