Couple quick thoughts. Does this work on current kernels (5.0 for
example)? Was thinking about patches that might affect this like:
- "cifs: connect to servername instead of IP for IPC$ share"
- "smb3: on reconnect set PreviousSessionId field"
- Paulo's patches (has cifs-utils coreq) to reconnect to new IP address
  if hostname's IP address changed and his add support for failover
- Paulo's patch to remove trailing slashes from server UNC name

On Thu, Feb 21, 2019 at 10:58 AM Ross Lagerwall
<ross.lagerwall@xxxxxxxxxx> wrote:
>
> Hi,
>
> I have an issue with SMB cluster failover. There are two Windows 2012 R2
> Datacenter servers in the cluster. If the primary server is turned off,
> then the secondary server becomes the primary. However, when this
> happens the kernel client is not able to recover the mount.
>
> Here is the reconnection network trace:
>
> Time       Source        Destination   Protocol Length Info
> 16.640530  10.71.217.53  10.71.217.50  SMB2     172    Negotiate Protocol Request
> 16.641723  10.71.217.50  10.71.217.53  SMB2     318    Negotiate Protocol Response
> 16.641799  10.71.217.53  10.71.217.50  SMB2     190    Session Setup Request, NTLMSSP_NEGOTIATE
> 16.642148  10.71.217.50  10.71.217.53  SMB2     442    Session Setup Response, Error: STATUS_MORE_PROCESSING_REQUIRED, NTLMSSP_CHALLENGE
> 16.642201  10.71.217.53  10.71.217.50  SMB2     562    Session Setup Request, NTLMSSP_AUTH, User: clusterad.local7337\Administrator
> 16.656407  10.71.217.50  10.71.217.53  SMB2     142    Session Setup Response
> 16.656492  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 16.656916  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_BAD_NETWORK_NAME
> 16.659249  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 16.659635  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_BAD_NETWORK_NAME
> 20.224591  10.71.217.53  10.71.217.50  SMB2     182    Tree Connect Request Tree: \\10.71.217.50\IPC$
> 20.225344  10.71.217.50  10.71.217.53  SMB2     150    Tree Connect Response
> 20.225449  10.71.217.53  10.71.217.50  SMB2     216    Ioctl Request FSCTL_VALIDATE_NEGOTIATE_INFO
> 20.225934  10.71.217.50  10.71.217.53  SMB2     206    Ioctl Response FSCTL_VALIDATE_NEGOTIATE_INFO
> 20.225975  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 20.226355  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_BAD_NETWORK_NAME
> 22.240595  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 22.241159  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_BAD_NETWORK_NAME
> 24.256590  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 24.257380  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_BAD_NETWORK_NAME
> ...
> 40.384609  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 40.385135  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_BAD_NETWORK_NAME
> 41.772006  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 41.772562  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_NETWORK_NAME_DELETED
> 41.772641  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> 41.773037  10.71.217.50  10.71.217.53  SMB2     143    Tree Connect Response, Error: STATUS_NETWORK_NAME_DELETED
> 42.400589  10.71.217.53  10.71.217.50  SMB2     190    Tree Connect Request Tree: \\10.71.217.50\smbshare
> ...
>
> After the secondary server takes over (presumably once it stops
> returning STATUS_BAD_NETWORK_NAME), it then returns
> STATUS_NETWORK_NAME_DELETED indefinitely.
>
> This can be fixed by delaying the tree connect to IPC$ until after the
> tree connect to the share succeeds. The server then no longer returns
> STATUS_NETWORK_NAME_DELETED and instead responds successfully.
> I'm not sure why the server behaves like this and I'm not sure if the
> client is doing something wrong. I found this out because it used to
> work on older kernels before b327a717e506 ("CIFS: make IPC a regular
> tcon").
>
> Here is the patch that makes it work:
>
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index dba986524917..1f97ed6459bf 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -2864,7 +2864,14 @@ void smb2_reconnect_server(struct work_struct *work)
>
>  	spin_unlock(&cifs_tcp_ses_lock);
>
> +	rc = 0;
>  	list_for_each_entry_safe(tcon, tcon2, &tmp_list, rlist) {
> +		if (rc) {
> +			list_del_init(&tcon->rlist);
> +			cifs_put_tcon(tcon);
> +			continue;
> +		}
> +
>  		rc = smb2_reconnect(SMB2_INTERNAL_CMD, tcon);
>  		if (!rc)
>  			cifs_reopen_persistent_handles(tcon);
>
> Can anyone give any more info on this oddity and whether this is a
> useful patch?
>
> Thanks,
> --
> Ross Lagerwall

--
Thanks,

Steve