Re: CephFS mount delay

Gregory Farnum <greg@xxxxxxxxxxx> · Thu, 30 Aug 2012 15:17:58 -0700

On Thu, Aug 30, 2012 at 3:10 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
> Ok, this patch sorta solves the problem. After a fresh restart of the
> daemons, the client hangs indefinitely (log URLs for this case are
> attached). If I kill the client and restart it, I get behavior similar
> to the original problem. After another client restart, everything is
> very fast. This is easily reproducible.

I'm not quite sure what scenario you're describing here — let's see if
I understand correctly:
1) Working cluster.
2) Restart server daemons.
3) Client just hangs. :(
4) Restart client and it goes slow through the mount (once?).
5) Restart client again and everything goes fast from then on.

Is that right? If so, 3 might be caused by a client bug, but 4 (and maybe
3 as well) is probably just the MDS going through reconnect and timing
out its client connection.

> https://dl.dropbox.com/u/7899675/client.log
> https://dl.dropbox.com/u/7899675/mds.a.log
> https://dl.dropbox.com/u/7899675/mds.b.log
> https://dl.dropbox.com/u/7899675/mds.c.log
>
> - Noah
>
> On Thu, Aug 30, 2012 at 1:39 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> What about this:
>>
>> diff --git a/src/client/Client.cc b/src/client/Client.cc
>> index 3333966..003e3f8 100644
>> --- a/src/client/Client.cc
>> +++ b/src/client/Client.cc
>> @@ -294,6 +294,7 @@ int Client::init()
>>    monclient->set_want_keys(CEPH_ENTITY_TYPE_MDS | CEPH_ENTITY_TYPE_OSD);
>>    monclient->sub_want("mdsmap", 0, 0);
>>    monclient->sub_want("osdmap", 0, CEPH_SUBSCRIBE_ONETIME);
>> +  monclient->renew_subs();
>>
>>    // logger
>>    PerfCountersBuilder plb(cct, "client", l_c_first, l_c_last);
>>
>>
>> If that doesn't do it, can you reproduce with 'debug client = 20' and
>> 'debug monc = 20'?
>>
>> Thanks!
>> sage
>>
>>
>>
>> On Thu, 30 Aug 2012, Noah Watkins wrote:
>>
>>> Here ya go:
>>>
>>> https://dl.dropbox.com/u/7899675/client.log
>>> https://dl.dropbox.com/u/7899675/mds.a.log
>>> https://dl.dropbox.com/u/7899675/mds.b.log
>>> https://dl.dropbox.com/u/7899675/mds.c.log
>>>
>>> - Noah
>>>
>>> On Thu, Aug 30, 2012 at 1:15 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> > I see that Server::handle_client_session is calling mdlog->flush(), so
>>> > it's a bit odd.  Can you generate a log with 'debug ms = 1' on the client
>>> > (and maybe mds) side?
>>> >
>>> > s
>>> >
>>> > On Thu, 30 Aug 2012, Noah Watkins wrote:
>>> >
>>> >> On Thu, Aug 30, 2012 at 1:06 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> >> > On Thu, Aug 30, 2012 at 12:55 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
>>> >> >> Using a tick interval of 1 drops the cost down to 3 seconds, but that is
>>> >> >> still a long time when running many unit tests that each use a fresh mount.
>>> >> >
>>> >> > Are you using ceph-fuse or the kernel client? And how many of each daemon type?
>>> >>
>>> >> I'm using the C API, and there are 3 mons, 3 MDSes, and 1 OSD.
>>> >>
>>> >> > That said, I'm seeing broadly similar numbers: with one of each
>>> >> > daemon (but otherwise the vstart defaults), "time sudo ceph-fuse mnt"
>>> >> > reports 3.1 seconds.
>>> >>
>>> >>
>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html