RE: Failure to reconnect after cluster failover

Tom Talpey <ttalpey@xxxxxxxxxxxxx> · Wed, 27 Feb 2019 14:16:55 +0000

> -----Original Message-----
> From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
> Sent: Monday, February 25, 2019 8:14 AM
> To: Tom Talpey <ttalpey@xxxxxxxxxxxxx>; Steve French
> <smfrench@xxxxxxxxx>
> Cc: CIFS <linux-cifs@xxxxxxxxxxxxxxx>
> Subject: Re: Failure to reconnect after cluster failover
> 
> On 2/22/19 11:25 PM, Tom Talpey wrote:
> >> -----Original Message-----
> >> From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
> >> Sent: Friday, February 22, 2019 9:17 AM
> >> To: Tom Talpey <ttalpey@xxxxxxxxxxxxx>; Steve French
> >> <smfrench@xxxxxxxxx>
> >> Cc: CIFS <linux-cifs@xxxxxxxxxxxxxxx>
> >> Subject: Re: Failure to reconnect after cluster failover
> >>
> >> On 2/21/19 5:59 PM, Tom Talpey wrote:
> >>> The reconnect is apparently using a dotted-quad as the servername, and you
> >>> can see the auth is forced to NTLM as a consequence. Is that the way you
> >>> initially mounted the share (i.e. mount 10.71.217.50:/smbshare /mnt)?
> >>>
> >>> -----Original Message-----
> >>> From: linux-cifs-owner@xxxxxxxxxxxxxxx <linux-cifs-owner@xxxxxxxxxxxxxxx>
> >>> On Behalf Of Steve French
> >>> Sent: Thursday, February 21, 2019 9:07 AM
> >>> To: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
> >>> Cc: CIFS <linux-cifs@xxxxxxxxxxxxxxx>
> >>> Subject: Re: Failure to reconnect after cluster failover
> >>>
> >>> Couple quick thoughts.
> >>>
> >>> Does this work on current kernels (5.0, for example)?
> >>>
> >>> Was thinking about patches that might affect this like:
> >>> - "cifs: connect to servername instead of IP for IPC$ share"
> >>> - "smb3: on reconnect set PreviousSessionId field"
> >>> - Paulo's patches (has cifs-utils coreq) to reconnect to new IP
> >>> address if hostname's IP address changed and his add support for
> >>> failover
> >>> - Paulo's patch to remove trailing slashes from server UNC name
> >>>
> >> I've reproduced this with 5.0-rc7 and the latest cifs-utils from git.
> >> The share was mounted as follows (yes, by IP):
> >>
> >> mount.cifs -o
> >> vers=3.0,cache=loose,actimeo=0,username=x,domain=y,password=z
> >> '//10.71.217.31/smbshare' /mnt
> >>
> >> Here is the tcpdump when it fails to reconnect properly:
> > ...
> >>
> >> The initial connection is at timestamp 0s, reconnection at 13s,
> >> STATUS_NETWORK_NAME_DELETED at 60s.
> >>
> >> For comparison, here is a tcpdump using the "fix" from my previous mail:
> > ...
> >>
> >> The initial connection is at timestamp 0s, reconnection at 34s,
> >> successful read request at 215s.
> >>
> >> Note that the tree connect for IPC$ only happens _after_ the tree
> >> connect for the share succeeds.
> >
> > Thanks for the full traces, they clarify the situation. But, I don’t see any
> > meaningful difference in the client behavior. The ordering of the two
> > treeconnects is the same between the two - initially, "IPC$" then
> > "smbshare", and on reconnect, the other way around. So, I'm unclear
> > whether your patch did anything.
> 
> There is definitely a difference. Before the patch, on reconnect the client:

I'm still not so sure the difference is relevant. The timing is a bit different, but
in itself the IPC$ treeconnect isn't actually used, and in any case it succeeds
in both scenarios. So, I'm thinking it's either the timing, or coincidence.

> * Connects to "smbshare" which fails
> * Then connects to "IPC$" which succeeds
> * Then tries again to connect to smbshare which fails repeatedly

Here's what I see:

Event                     Time (s)  Notes
Connection lost           25.97     Server sends many RSTs to client
Connection reestablished  34.17
Treeconnect to smbshare   34.17     STATUS_BAD_NETWORK_NAME (retries with same result every 2 sec)
Treeconnect to IPC$       34.18     success
Treeconnect to smbshare   60.38     STATUS_NETWORK_NAME_DELETED (etc.)

> After the patch, on reconnect the client:
> 
> * Connects to "smbshare" which fails
> * Then tries again to connect to "smbshare" which succeeds after several
> retries
> * Then tries to connect to "IPC$" which succeeds

This time:

Event                     Time (s)  Notes
Connection lost           9.81      Server sends RST
Connection reestablished  9.82      status 0xc0000466 (some weird disk hardware status)
Connection lost           13.53     Server sends RST
Connection reestablished  13.53
Treeconnect to smbshare   13.63     STATUS_BAD_NETWORK_NAME (retries with same result every 2 sec)
Treeconnect to smbshare   43.90     success (about 30 secs, 17 retries elapsed)
Treeconnect to IPC$       43.90     success

So, the main effect of your patch is that the IPC$ attempt happens a lot *later*.
It certainly didn't affect the success of the smbshare treeconnect, since it happened
only after that succeeded! And I don't see how deferring an unrelated treeconnect
would help that. I bet the result would be the same if the IPC$ treeconnect didn't
happen at all.
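
For what it's worth, the ordering argument can be checked mechanically. Below
is a minimal Python sketch, with the events transcribed by hand from the two
timelines above. The tuple layout and the helper names (first_time, summarize)
are purely illustrative; nothing here comes from the client or from the capture
tools, and only the first and last of the repeated smbshare attempts are listed.

```python
# Events transcribed by hand from the two timelines: (time_s, event, result).
# Status abbreviations are spelled out (STATUS_B_N_N, STATUS_N_N_D).
TRACE_1 = [
    (25.97, "connection lost", "RST"),
    (34.17, "connection reestablished", ""),
    (34.17, "treeconnect smbshare", "STATUS_BAD_NETWORK_NAME"),
    (34.18, "treeconnect IPC$", "success"),
    (60.38, "treeconnect smbshare", "STATUS_NETWORK_NAME_DELETED"),
]
TRACE_2 = [
    (9.81, "connection lost", "RST"),
    (9.82, "connection reestablished", "0xc0000466"),
    (13.53, "connection lost", "RST"),
    (13.53, "connection reestablished", ""),
    (13.63, "treeconnect smbshare", "STATUS_BAD_NETWORK_NAME"),
    (43.90, "treeconnect smbshare", "success"),
    (43.90, "treeconnect IPC$", "success"),
]


def first_time(trace, event, result=None):
    """Return the timestamp of the first matching event, or None."""
    for t, ev, res in trace:
        if ev == event and (result is None or res == result):
            return t
    return None


def summarize(trace):
    """Report whether smbshare recovered and where IPC$ fell relative to it."""
    share_first = first_time(trace, "treeconnect smbshare")
    share_ok = first_time(trace, "treeconnect smbshare", "success")
    ipc = first_time(trace, "treeconnect IPC$")
    return {
        "share_recovered": share_ok is not None,
        # True when IPC$ was connected before smbshare ever succeeded.
        "ipc_before_share_success": share_ok is None or ipc < share_ok,
        "seconds_until_share_success": (
            None if share_ok is None else round(share_ok - share_first, 2)
        ),
    }


if __name__ == "__main__":
    for name, trace in (("first trace", TRACE_1), ("second trace", TRACE_2)):
        print(name, summarize(trace))
```

In the second trace the IPC$ treeconnect does not precede the successful
smbshare treeconnect, which is the point: it can't have been what let the
smbshare treeconnect succeed.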

I really think there's something wrong with your server, and not because of a bug.
Unfortunately both Steve and I are at FAST'19 and Vault here in Boston, so we're
not able to get much done. I'd love to understand this better, though...

Tom.

> This subtle reordering somehow makes it work. It may indeed be a server
> bug rather than a client bug. I was hoping someone could shed some light
> on this.
> 
> >
> > The STATUS_NETWORK_NAME_DELETED is a consequence of the failed
> > re-establishment of the tree connect, and is not itself the problem. The
> > server is simply timing out the treeid, since the client did not successfully
> > reclaim it. The repeated STATUS_BAD_NETWORK_NAME is the issue.
> >
> > Are you sure the clustered server is recovering properly when you are
> > forcing the failover? For example, if it's a two-node cluster, maybe node A
> > can take over node B, but node B has issues taking over node A. Is there
> > anything relevant in the server logs?
> >
> 
> It's a two node cluster. The behaviour happens reliably when failing
> over either way. After failover, the server state is consistent. E.g.
> after a failover from node A to node B, node B shows itself as the
> primary server and the node A is marked as down. I couldn't find
> anything interesting in the server logs.
> 
> Thanks,
> --
> Ross Lagerwall