On Sun, 26 Jan 2014, Ilya Dryomov wrote: > On Sat, Jan 25, 2014 at 1:07 AM, Sage Weil <sage@xxxxxxxxxxx> wrote: > > On Tue, 21 Jan 2014, Ilya Dryomov wrote: > >> Keepalive mechanism that we are currently using doesn't handle dead > >> (e.g. half-open in RFC 793 sense) TCP connections: a) it's based on > >> pending ceph_osd_requests which are not necessarily present, and b) > >> keepalive byte is only sent if connection is in CON_STATE_OPEN state > >> because of protocol restrictions. This does not cover connection > >> handshake and negotiation stages and can lead to kernel client hanging > >> forever. Fix it by forcibly resetting the connection if after two > >> mount_timeouts we still haven't transitioned to CON_STATE_OPEN. > >> > >> Fixes: http://tracker.ceph.com/issues/7139 > >> > >> Signed-off-by: Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx> > >> --- > >> include/linux/ceph/messenger.h | 5 +++++ > >> net/ceph/ceph_common.c | 1 + > >> net/ceph/messenger.c | 42 ++++++++++++++++++++++++++++++++++++++-- > >> 3 files changed, 46 insertions(+), 2 deletions(-) > >> > >> diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h > >> index 20ee8b63a968..25d3ec8bcdde 100644 > >> --- a/include/linux/ceph/messenger.h > >> +++ b/include/linux/ceph/messenger.h > >> @@ -62,6 +62,8 @@ struct ceph_messenger { > >> > >> u64 supported_features; > >> u64 required_features; > >> + > >> + unsigned long connect_timeout; /* jiffies */ > >> }; > >> > >> enum ceph_msg_data_type { > >> @@ -240,6 +242,8 @@ struct ceph_connection { > >> > >> struct delayed_work work; /* send|recv work */ > >> unsigned long delay; /* current delay interval */ > >> + > >> + struct delayed_work connect_timeout_work; > >> }; > >> > >> > >> @@ -257,6 +261,7 @@ extern void ceph_messenger_init(struct ceph_messenger *msgr, > >> struct ceph_entity_addr *myaddr, > >> u64 supported_features, > >> u64 required_features, > >> + unsigned long connect_timeout, > >> bool nocrc); > >> > >> extern void ceph_con_init(struct ceph_connection *con, void *private, > >> diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c > >> index 67d7721d237e..bc41e5a392a4 100644 > >> --- a/net/ceph/ceph_common.c > >> +++ b/net/ceph/ceph_common.c > >> @@ -511,6 +511,7 @@ struct ceph_client *ceph_create_client(struct ceph_options *opt, void *private, > >> ceph_messenger_init(&client->msgr, myaddr, > >> client->supported_features, > >> client->required_features, > >> + 2 * client->options->mount_timeout * HZ, > > > > Any specific reason for 2x? I would like 1/2 if I had to pull a number > > out of a hat. OTOH, mount_timeout could be 0, too, so it might make more > > sense to make this a separate mount option entirely. Either that, or just > > make this a separate hard-coded value that is not related to > > mount_timeout. > > Yes, my logic for 2x was basically to not impose anything on the rest > the system. Apart from some cephfs-specific stuff, mount_timeout is > used in __ceph_open_session, where 1/2 would theoretically interfere > with the wait-for-maps loop by virtue of shutting down the socket too > early. If this is essentially manifesting as a socket reset/disconnect, then having it at 1/2 should trigger a ceph_fault and then reconnect within the messenger layer and give us a second chance before the mount_timeout expires. At least, that's the way I'm reading the con_work: if ((fault = con_sock_closed(con))) { dout("%s: con %p SOCK_CLOSED\n", __func__, con); break; } ... if (fault) con_fault(con); I still lean toward a separately defined timeout here... > On mount_timeout=0 connect_timeout hammer isn't armed. This was useful > in testing, and it doesn't seem to change existing semantics at all. > On the contrary, as it is, mount_timeout == 0 is a possible infinite > loop anyway, so not arming connect_timeout (i.e. not defending against > another endless hang) maps onto it quite nicely ;) Sounds good. sage -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html