On Thu, Aug 30, 2012 at 3:10 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
> Ok, this patch sorta solves the problem. After a fresh restart of the
> daemons, the client hangs indefinitely (log urls attached for this
> case). If I kill the client and restart, I get a behavior similar to
> the original problem. Another client restart and everything is very
> fast. This easily reproducible.

I'm not quite sure what scenario you're describing here; let's see if
I understand correctly:
1) Working cluster.
2) Restart server daemons.
3) Client just hangs. :(
4) Restart client and it goes slow through the mount (once?).
5) Restart client again and everything goes fast from then on.

Is that right? If it is, 3 might be caused by a client bug, but then 4
(and also maybe 3) is probably just caused by the MDS server going
through reconnect and timing out its client connection.

> https://dl.dropbox.com/u/7899675/client.log
> https://dl.dropbox.com/u/7899675/mds.a.log
> https://dl.dropbox.com/u/7899675/mds.b.log
> https://dl.dropbox.com/u/7899675/mds.c.log
>
> - Noah
>
> On Thu, Aug 30, 2012 at 1:39 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> What about this:
>>
>> diff --git a/src/client/Client.cc b/src/client/Client.cc
>> index 3333966..003e3f8 100644
>> --- a/src/client/Client.cc
>> +++ b/src/client/Client.cc
>> @@ -294,6 +294,7 @@ int Client::init()
>>    monclient->set_want_keys(CEPH_ENTITY_TYPE_MDS | CEPH_ENTITY_TYPE_OSD);
>>    monclient->sub_want("mdsmap", 0, 0);
>>    monclient->sub_want("osdmap", 0, CEPH_SUBSCRIBE_ONETIME);
>> +  monclient->renew_subs();
>>
>>    // logger
>>    PerfCountersBuilder plb(cct, "client", l_c_first, l_c_last);
>>
>>
>> If that doesn't do it, can you reproduce with 'debug client = 20' and
>> 'debug monc = 20'?
>>
>> Thanks!
>> sage
>>
>>
>>
>> On Thu, 30 Aug 2012, Noah Watkins wrote:
>>
>>> Here ya go:
>>>
>>> https://dl.dropbox.com/u/7899675/client.log
>>> https://dl.dropbox.com/u/7899675/mds.a.log
>>> https://dl.dropbox.com/u/7899675/mds.b.log
>>> https://dl.dropbox.com/u/7899675/mds.c.log
>>>
>>> - Noah
>>>
>>> On Thu, Aug 30, 2012 at 1:15 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> > I see that Server::handle_client_session is calling mdlog->flush(), so
>>> > it's a bit odd. Can you generate a log with 'debug ms = 1' on the client
>>> > (and maybe mds) side?
>>> >
>>> > s
>>> >
>>> > On Thu, 30 Aug 2012, Noah Watkins wrote:
>>> >
>>> >> On Thu, Aug 30, 2012 at 1:06 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> >> > On Thu, Aug 30, 2012 at 12:55 PM, Noah Watkins <jayhawk@xxxxxxxxxxx> wrote:
>>> >> >> Using a tick interval of 1 drops the cost down to 3 seconds, but still
>>> >> >> a long time for running many unit tests that use fresh mounts.
>>> >> >
>>> >> > Are you using ceph-fuse or the kernel client? And how many of each daemon type?
>>> >>
>>> >> I'm using the C api, and there are 3 mon, 3 mds, 1 osd.
>>> >>
>>> >> > That said; I'm seeing broadly similar numbers: with one of each
>>> >> > daemon (but otherwise the vstart defaults) "time sudo ceph-fuse mnt"
>>> >> > reports 3.1 seconds.
>>> >>
>>> >>
>>>
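
For anyone reproducing this, the debug levels requested in the quoted messages could be collected in ceph.conf roughly as below. This is only a sketch: the option names and values come from the thread, but the placement under [client] and [mds] sections is an assumption, and nothing here has been checked against the cluster in question.

```ini
# Sketch only; section placement is an assumption, not taken from the thread.
[client]
    debug client = 20   ; requested alongside the renew_subs() patch
    debug monc = 20     ; monitor client subscription traffic
    debug ms = 1        ; messenger-level logging on the client side

[mds]
    debug ms = 1        ; the "and maybe mds" side of the same request
```

The daemons and the client would need to be restarted after the change so the new log levels take effect from mount time onward.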