Sorry. My mail client keeps using the wrong sender for this list. I've
whacked it to avoid that going forward.

On 2013-11-19T11:51:44, David Teigland <teigland@xxxxxxxxxx> wrote:

> > The goal here is that we know the other endpoint is down (we received a
> > node down event and have completed fencing at that stage). Hence,
> > SO_LINGER to speed up the shutdown of the socket seems appropriate.
>
> Should your patch do the same with tcp?
> Is this problem especially prevalent with sctp?

It seems the problem is especially prominent with SCTP, yes, probably
because of different default timeouts.

> With the patch, how much more likely would it be for data from a previous
> connection to interfere with a new connection? (I had this problem some
> years ago, and added some safeguards to deal with it, but I don't think
> they are perfect. There are cases where a very short time separates
> connections being closed and new connections being created.)

In the error case, none. That is rather the issue we're trying to avoid:
the old connections still being around interfere with reconnecting after
the node has rebooted, so this allows us a much faster cleanup. (We can't
reconnect while the {src ip, port; dst ip, port} tuple is still around.)

Though now that I re-read it, maybe SO_REUSEADDR|SO_REUSEPORT combined
could also help with this? Dong Mao? But SO_REUSEPORT is only available
for TCP/UDP on fairly recent kernels (https://lwn.net/Articles/542629/),
so I think SO_LINGER is the better bet; see the sketch at the end of
this mail.

> > (We may actually only want to set SO_LINGER for the node down event
> > case, not generally. On receiving node down, set SO_LINGER as described
> > here. Otherwise, we may hit the corner cases in the first reference; but
> > we're already exposed to that today.)
>
> I'd suggest giving this a try.

It also depends a bit on the semantics of the DLM protocol, on which you
and Dong Mao are better experts than I am. SO_LINGER could only hurt us
if there were data that we still expect the target to receive after
we've closed the socket. My limited understanding of the DLM source
suggests that this isn't the case; and my further limited understanding
of SO_LINGER suggests that, if we did rely on that in the past, we were
already asking for trouble.

> > I really would love to know how we can avoid it. We have a few customers
> > who can reproduce this.
>
> Then perhaps this happens in more realistic and unavoidable cases than the
> 'echo b > /proc/sysrq-trigger' example.

That is obviously just a "simulate a node crash" event. It seems pretty
realistic and, alas, unavoidable to me; you can hit the same by powering
off the node. Right now, the node can reboot and we still can't
reconnect unless the customer waits ~5 minutes for the stale connections
to expire. That isn't desirable.

Best,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
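
P.S.: For the record, a minimal userspace sketch of the abortive-close
behaviour I have in mind. The function name is mine, and an actual dlm
lowcomms change would use the in-kernel equivalent of setsockopt()
rather than this:

    #include <sys/socket.h>
    #include <unistd.h>

    /*
     * Close 'fd' abortively. With l_onoff=1 and l_linger=0, close()
     * sends a TCP RST (or an SCTP ABORT) instead of doing a graceful
     * shutdown, so the kernel discards the connection state at once
     * and the {src ip, port; dst ip, port} tuple is immediately free
     * for a new connection from the rebooted node.
     */
    void abortive_close(int fd)
    {
            struct linger lg = {
                    .l_onoff  = 1,  /* enable lingering behaviour */
                    .l_linger = 0,  /* 0s timeout: abort on close() */
            };

            /* Best effort; even if this fails, close() still runs. */
            setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
            close(fd);
    }

On node down, we'd call something like this on the socket to the fenced
node only; the normal shutdown path would keep the default graceful
close, which is the "only for the node down event case, not generally"
distinction above.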