On Sat, Jan 25, 2014 at 1:07 AM, Sage Weil <sage@xxxxxxxxxxx> wrote: > On Tue, 21 Jan 2014, Ilya Dryomov wrote: >> Keepalive mechanism that we are currently using doesn't handle dead >> (e.g. half-open in RFC 793 sense) TCP connections: a) it's based on >> pending ceph_osd_requests which are not necessarily present, and b) >> keepalive byte is only sent if connection is in CON_STATE_OPEN state >> because of protocol restrictions. This does not cover connection >> handshake and negotiation stages and can lead to kernel client hanging >> forever. Fix it by forcibly resetting the connection if after two >> mount_timeouts we still haven't transitioned to CON_STATE_OPEN. >> >> Fixes: http://tracker.ceph.com/issues/7139 >> >> Signed-off-by: Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx> >> --- >> include/linux/ceph/messenger.h | 5 +++++ >> net/ceph/ceph_common.c | 1 + >> net/ceph/messenger.c | 42 ++++++++++++++++++++++++++++++++++++++-- >> 3 files changed, 46 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h >> index 20ee8b63a968..25d3ec8bcdde 100644 >> --- a/include/linux/ceph/messenger.h >> +++ b/include/linux/ceph/messenger.h >> @@ -62,6 +62,8 @@ struct ceph_messenger { >> >> u64 supported_features; >> u64 required_features; >> + >> + unsigned long connect_timeout; /* jiffies */ >> }; >> >> enum ceph_msg_data_type { >> @@ -240,6 +242,8 @@ struct ceph_connection { >> >> struct delayed_work work; /* send|recv work */ >> unsigned long delay; /* current delay interval */ >> + >> + struct delayed_work connect_timeout_work; >> }; >> >> >> @@ -257,6 +261,7 @@ extern void ceph_messenger_init(struct ceph_messenger *msgr, >> struct ceph_entity_addr *myaddr, >> u64 supported_features, >> u64 required_features, >> + unsigned long connect_timeout, >> bool nocrc); >> >> extern void ceph_con_init(struct ceph_connection *con, void *private, >> diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c >> index 67d7721d237e..bc41e5a392a4 100644 >> --- a/net/ceph/ceph_common.c >> +++ b/net/ceph/ceph_common.c >> @@ -511,6 +511,7 @@ struct ceph_client *ceph_create_client(struct ceph_options *opt, void *private, >> ceph_messenger_init(&client->msgr, myaddr, >> client->supported_features, >> client->required_features, >> + 2 * client->options->mount_timeout * HZ, > > Any specific reason for 2x? I would like 1/2 if I had to pull a number > out of a hat. OTOH, mount_timeout could be 0, too, so it might make more > sense to make this a separate mount option entirely. Either that, or just > make this a separate hard-coded value that is not related to > mount_timeout. Yes, my logic for 2x was basically to not impose anything on the rest the system. Apart from some cephfs-specific stuff, mount_timeout is used in __ceph_open_session, where 1/2 would theoretically interfere with the wait-for-maps loop by virtue of shutting down the socket too early. On mount_timeout=0 connect_timeout hammer isn't armed. This was useful in testing, and it doesn't seem to change existing semantics at all. On the contrary, as it is, mount_timeout == 0 is a possible infinite loop anyway, so not arming connect_timeout (i.e. not defending against another endless hang) maps onto it quite nicely ;) Thanks, Ilya -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html