Re: Is IP address failover supported on the CIFS client?

Steve French <smfrench@xxxxxxxxx> · Tue, 31 Jul 2018 23:15:50 -0500

This is a good scenario to talk through and discuss possible workarounds. Currently cifs.ko does get DFS referrals, but after connecting to one target successfully it doesn't fall back to the alternate when the first goes down. (Presumably it would work if we got another DFS referral back from the server pointing to the new server, but we should also try to reconnect to an alternate target of an earlier referral when the original connection goes down and the target server stays down.)
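
To make the intended fallback concrete, here is a rough user-space sketch in Python of "reconnect to an alternate target of an earlier referral". It is only a model of the idea, not cifs.ko code: the class name ReferralTargets, its methods, and the example host names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ReferralTargets:
    """Hypothetical model of the target list from one earlier DFS referral."""
    targets: List[str]
    current: int = 0
    failed: Set[str] = field(default_factory=set)

    def active(self) -> str:
        """Return the target we are currently connected to."""
        return self.targets[self.current]

    def fail_over(self) -> Optional[str]:
        """Mark the active target as failed and switch to the next
        target that has not failed yet. Return None if none is left."""
        self.failed.add(self.active())
        for offset in range(1, len(self.targets)):
            idx = (self.current + offset) % len(self.targets)
            if self.targets[idx] not in self.failed:
                self.current = idx
                return self.targets[idx]
        return None


# Example with two made-up file servers, one per site.
refs = ReferralTargets(["fs1.example.com", "fs2.example.com"])
alternate = refs.fail_over()  # fs1 went down, so try fs2
```

A real implementation would presumably also need to age out the failed set (so a recovered server can be used again) and to refresh the referral when it expires; both are left out here.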

We also did some experiments with the Witness Protocol (there is a user space test tool in the Samba tree that can be modified to ioctl down to cifs.ko and pass in Witness notifications), which would perhaps be a cleaner way to handle a share move or server move, but this would require more work to finish up.

If you are open to testing some of this and are comfortable with the instructions on how to build cifs.ko for your kernel, I may be able to get you some patches to cifs.ko to experiment with DFS failover in your scenario. It is something I think is of significant value to do.

On Tue, Jul 31, 2018 at 10:19 PM Andy Beal <andybeal623@xxxxxxxxx> wrote:
Hello,
I'm in the process of setting up multi-site SMB file shares (using Windows Server) that can also be accessed from Linux clients (using the CIFS client). I'm playing with two HA configurations (each with two file servers, one per site):

1. Using DFSR to replicate data between sites, and creating a DFSN folder target that points to both file servers. Clients mount the DFSN name (e.g. //<domain>/<namespace>/<folder>)
2. Using Windows Failover Clustering. Without going into too much detail, clients mount a DNS name (an A record that points to two IPs -- one per file server). Only one file server is responding to SMB requests at any time.

In either of these cases, Windows clients behave as I'd expect: on initial mount, if one of the file servers is unavailable, they fall back to mounting the other (in a matter of milliseconds); during normal operation, if a file server fails, they automatically re-establish an SMB session to the other file server.

However, I'm not seeing the same behavior with the CIFS client on Linux. On initial mount, in either of the two configurations above, the client appears to select a single file server IP address. If that file server is unavailable, the mount fails (rather than retrying the mount with the other file server's IP address that would've been returned as part of the DFS referral response or the DNS lookup). If the mount succeeds but the underlying file server is later killed, the client doesn't try to reconnect to the other file server; instead, it continues trying to connect to the same IP address it initially mounted.
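
As a possible user-space workaround for the initial mount, the "try the other address" step can be sketched in Python: resolve every A record for the name, probe each address on the SMB port (445), and use the first one that answers. This is only a sketch under assumptions; the function names (resolve_ipv4, tcp_probe, first_reachable) and the 2-second timeout are made up for illustration, and a successful TCP connect only shows that the port is open, not that the share is being served.

```python
import socket


def resolve_ipv4(host, port):
    """Return the distinct IPv4 addresses the resolver gives for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addrs = []
    for info in infos:
        addr = info[4][0]  # sockaddr is (address, port) for AF_INET
        if addr not in addrs:
            addrs.append(addr)
    return addrs


def tcp_probe(addr, port, timeout):
    """Return True if a TCP connection to addr:port succeeds in time."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(host, port=445, timeout=2.0,
                    resolve=resolve_ipv4, probe=tcp_probe):
    """Return the first address for host that accepts a connection on
    port, or None if none does. resolve and probe can be swapped out,
    for example to test the logic without touching the network."""
    for addr in resolve(host, port):
        if probe(addr, port, timeout):
            return addr
    return None
```

The address it returns could then, for instance, be handed to mount.cifs through its ip= option while keeping the original name in the UNC path. That only covers the initial mount, not reconnecting after a server later fails.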

Is the behavior I'm seeing expected, based on how the CIFS client is implemented? If not, do you have any suggestions for how I could troubleshoot what's going on? I've done some extensive Googling to get the initial mount working (e.g. updating /etc/request-key.conf to add rows for cifs.upcall, setting up Kerberos tickets), but am stuck at this point.

Thank you,

Andy

-- 
Thanks,

Steve