Re: Add option SO_LINGER to dlm sctp socket when the other endpoint is down.

On 2013-11-20T12:34:43, David Teigland <teigland@xxxxxxxxxx> wrote:

> > (We can't reconnect while the {src ip, port;dst ip, port} is still
> > around.)
> I'm not sure, but I think I'm worried about a different problem: messages
> are sent through the old connection, a node restarts, a new connection is
> quickly created, and the *old messages* are received through the new
> connection. 

I'm a bit unclear how this could happen, or how this could be made worse
by this SO_LINGER patch.

After all, this makes connection teardown faster - that is, once we've
already called close() on the socket. That happens after we've gotten a
node down event for the peer, and fenced it.
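
For readers unfamiliar with the option, here is a minimal userspace
sketch of what SO_LINGER changes at close() time. It is not the dlm
patch itself (that lives in the kernel), and the linger values the patch
uses are not stated in this thread; the zero timeout, the helper names
set_linger/get_linger, and the use of a TCP socket (so it runs without
SCTP support) are all assumptions made for illustration.

```python
import socket
import struct

# Assumption: Linux layout of struct linger, two C ints (l_onoff, l_linger).
LINGER_FMT = "ii"

def set_linger(sock, onoff=True, seconds=0):
    """Enable or disable SO_LINGER on sock with the given timeout in seconds."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FMT, int(onoff), seconds))

def get_linger(sock):
    """Read back the current SO_LINGER setting as (enabled, seconds)."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    onoff, seconds = struct.unpack(LINGER_FMT, raw)
    return bool(onoff), seconds

# Illustrative only: a plain TCP socket stands in for the dlm SCTP socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_linger(sock, True, 0)      # linger on, zero timeout (assumed values)
linger_state = get_linger(sock)
sock.close()                   # with a zero timeout, close() does not wait
                               # for unsent data to drain
```

With linger enabled and a zero timeout, close() is expected to drop the
association right away instead of leaving it around while queued data
drains, which is the "tear down faster" effect described above.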

And: TCP, which apparently restarts faster anyway, already would have
this problem. And it doesn't seem to have it.

It basically helps if there's traffic stuck in flight while a node
crashes. I don't know why that's more likely to happen with SCTP,
perhaps because the damn thing is slower, or because it buffers more due
to its ability to resend over different channels ...

> The dlm tries to detect and discard stale/old messages, but if they
> get through, they can cause problems.  I'd like to know whether the
> LINGER change could make this more likely.  If so, then we may want
> this change to be a configuration option.

I don't really think it could.

> > pretty realistic and, alas, unavoidable to me. You can hit the same by
> > powering off the node, too.
> Right, I wanted to know it was not *only* the simulation case being
> affected.

I wish it was. It took the team quite a while to track down. It's really
an annoying bug since it can't always be reproduced.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster




