On Wed, Feb 13, 2013 at 3:47 AM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
> On Mon, Feb 11, 2013 at 12:25:59PM -0800, Gregory Farnum wrote:
>> On Mon, Feb 11, 2013 at 10:54 AM, Kevin Decherf <kevin@xxxxxxxxxxxx> wrote:
>> > Furthermore, I observe another strange thing more or less related to the
>> > storms.
>> >
>> > During an rsync command writing ~20G of data to Ceph, and during (and
>> > after) the storm, one OSD sends a lot of data to the active MDS
>> > (400 Mbps peaks every 6 seconds). After a quick check, I found that when I
>> > stop osd.23, osd.14 stops its peaks.
>>
>> This is consistent with Sam's suggestion that the MDS is thrashing its
>> cache and is grabbing directory objects off of the OSDs. How large
>> are the directories you're using? If they're a significant fraction of
>> your cache size, it might be worth enabling the (sadly less stable)
>> directory fragmentation options, which will split them up into smaller
>> fragments that can be independently read from and written to disk.
>
> I set mds cache size to 400000, but now I observe ~900 Mbps peaks from
> osd.14 to the active MDS, osd.18, and osd.2.
>
> osd.14 shares some PGs with osd.18 and osd.2:
> http://pastebin.com/raw.php?i=uBAcTcu4

The high bandwidth from OSD to MDS really isn't a concern; that's just
the MDS asking for data and getting it back quickly! We're concerned
about client responsiveness; has that gotten better?
-Greg
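
For reference, a minimal ceph.conf sketch of the settings discussed above: the
MDS cache size Kevin raised, plus the directory fragmentation knobs Greg
suggests trying. The fragmentation option names below are assumptions based on
Ceph releases of that era and may have changed; check the documentation for
your release before using them.

    [mds]
        ; number of inodes kept in the MDS cache (value Kevin set above)
        mds cache size = 400000
        ; assumed option names for directory fragmentation (era-specific):
        ; allow large directories to be split into fragments
        mds bal frag = true
        ; split a directory fragment once it exceeds this many entries
        mds bal split size = 10000

Fragmenting large directories keeps any single directory object from dominating
the MDS cache, at the cost of using a feature the thread notes was less stable
at the time.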