Hi linux-cifs,

This patchset adds
* a DFS cache so that DFS links can be resolved even when hosts are down
* DFS failover so that if the DFS target we are connected to is down
  cifs.ko will try to reconnect to a different target if there are any.

This is 90% Paulo's work. I gave him the task and a general roadmap to
go about it, thinking it would not be too hard, and it turned into this
massive, intricate 2k-line beast as we both slowly realized all the ins
and outs and subtleties of the problem and the resulting solutions. So
congrats to him.

* * *

* What is DFS?

DFS is basically symbolic links across servers. You can have links
that point to UNC paths of directories on other servers. The client has
to resolve the link, get the target from the result, and connect to the
target. There is no proxying involved.

A share that can take resolving requests and respond to them is called
a DFS root. A host with such a share is in a "standalone" setup.

A domain can be set up so that when accessing a share, the share
*itself* is redirected to a DFS root on a separate server. Essentially
the share is a symbolic link itself and the domain doesn't host it, even
though you typed \\domain\dfsshare to access it. This is called a
"domain-based" setup.

Each of those links can have multiple targets so that if one fails you
can try another.

So how do you know whether a file is a link or not? If you try to access
it, the server will reply with the error STATUS_PATH_NOT_COVERED. You
are then supposed to issue a DFS Referral Request on that path, and you
get the targets in the response. You are then supposed to connect to one
of the targets and repeat.

Microsoft has a dedicated document about it, [MS-DFSC], which I suggest
you take a look at.

* How does DFS work in cifs.ko?

There are 2 entry points for DFS currently:
- cifs_mount(): the "static" codepath
- dentry automount: the dynamic codepath

The static code path is when you mount a UNC path \\FOO\d1\d2\d3.
cifs_mount() will follow all the links until it reaches the final host
and mount that as a regular mount point.

The dynamic code path is when you mount a DFS root (= a share that has
links) and, *after* mounting, access one of those links. In this case
cifs.ko will set an AUTOMOUNT flag on the inode & dentry of the file,
which is a Linux VFS thing that instructs the VFS upper layer to lazily
call the d_automount() dentry operation when needed.

That operation does a VFS sub-mount on that dentry. It ends up calling
cifs_mount(), and the sub-mount gets its own superblock and everything.
The problem is that we pass the resolved path when mounting, which means
that if it fails, we cannot resolve again and use a different target.

Note that the original mount path can have multiple nested links. This
is why we need a cache to store results to do failover properly.

* What this patch adds

- a DFS cache so that DFS links can be resolved even when hosts are down
- DFS failover so that if the DFS target we are connected to is down
  cifs.ko will try to reconnect to a different target if there are any

          +- cifs: Refactor out cifs_mount()
          |  cifs: Skip any trailing backslashes from UNC
refactor  |  cifs: Fix separator when building path from dentry
&bugfix   |  cifs: Make devname param optional in cifs_compose_mount_options()
          |  cifs: Respect -EAGAIN when querying paths
          |  cifs: Save TTL value when parsing DFS referrals
          +- cifs: auto disable 'serverino' in dfs mounts
new impl, |  cifs: Add DFS cache routines  <------ main new code
replace,  |  cifs: Make use of DFS cache to get new DFS referrals
reco      |  cifs: Add support for failover in cifs_mount()
&         |  cifs: Add support for failover in cifs_reconnect()
failover  |  cifs: start DFS cache refresher in cifs_mount()
          |  cifs: Add support for failover in smb2_reconnect()
          +- cifs: Add support for failover in cifs_reconnect_tcon()

** The DFS cache

The DFS cache is a hashtable that maps UNC paths to cache entries.
A cache entry contains:
- the UNC path it is mapped on
- how much of the UNC path the entry consumes
- flags
- a Time-To-Live after which the entry expires
- a list of possible targets (linked list of UNC paths)
- a "hint target" pointing to the last known working target, or to the
  first target if none were tried. This hint lets cifs.ko remember and
  try working targets first.

* Looking for an entry in the cache is done with dfs_cache_find()
  - if no valid entries are found, a DFS query is made, stored in the
    cache and returned
  - the full target list can be copied and returned to avoid race
    conditions, and looped on with the help of the
    dfs_cache_tgt_iterator

* Updating the target hint to the next target is done with
  dfs_cache_update_tgthint()

These functions have a dfs_cache_noreq_XXX() version that doesn't fetch
referrals if no entries are found. As a result, these versions don't
require the tcp/ses/tcon/cifs_sb parameters.

** Refreshing expired cache entries

Expired entries cannot be used, and they have a pretty short TTL. For
them to stay useful for failover, the DFS cache adds a delayed work item
that runs periodically to keep them fresh.

Since we might not have available connections to issue the referral
request when refreshing, we need to store volume_info structs with
credentials and other needed info to be able to connect to the right
server.

** Mount failover

The static and dynamic codepaths were patched to use the DFS cache to
try alternative targets.

We store the initial user-provided mount path in the superblock as
origin_fullpath.

** Reconnect failover

As you know, the reconnect logic isn't the simplest to follow, and we
had to tweak some things:

Since we might try to reconnect to multiple targets, and we do this
sequentially, threads waiting for tcp reconnection should wait for
(socket timeout x number of targets).

** Server file id

When following a DFS link you connect to a different server with a
different set of file ids.
Those 2 sets of ids can overlap as a result. Similarly, if you fail over
to a different server, you will get a different set of file ids than the
ones you initially got from your original server.

We decided to disable server inode numbers in case of failover, and had
to tweak the logic of dentry revalidation in order to not return -ESTALE
on syscalls.

** Remaining issues

We sometimes hit a problem: we suspect that if the reconnect codepath is
triggered *while* mounting a DFS link, you hit a NULL-ptr deref in the
reconnection code. This may be an already existing bug.

* Testing

This was tested in various ways:
- static and dynamic path at mount time with a random target initially
  down
- dropping packets from an already mounted connection and waiting for
  IO timeout or echo-thread timeout
- soft and hard mount (hard means only fail to userspace if there is no
  other way)

We have a little testsuite that tries mounting every weird combination
of links and paths in the static or dynamic code path, and a simple
reconnect test. We used iptables on the client to drop packets to/from
the server to simulate failure.

All of our testing was on very short-lived sessions though, and we need
more testing on long-lived ones.

Our test setups were the following:

* a 3 VM Windows Server "domain-based" setup (SMB1, SMB3):

  (share link) DOM/dfstest
    -> DFSROOT1/dfstest [files and links to] -> {DFSROOT1/share1,DFSROOT2/share2}
    -> DFSROOT2/dfstest [files and links to] -> {DFSROOT1/share1,DFSROOT2/share2}

* a 3 VM "standalone" samba setup (SMB1+unix extension, SMB3):

  ROOT/dfstest [files and links to] -> {TARGET1/share1, TARGET2/share2}

* * *

Paulo Alcantara (14):
  cifs: Refactor out cifs_mount()
  cifs: Skip any trailing backslashes from UNC
  cifs: Fix separator when building path from dentry
  cifs: Make devname param optional in cifs_compose_mount_options()
  cifs: Respect -EAGAIN when querying paths
  cifs: Save TTL value when parsing DFS referrals
  cifs: auto disable 'serverino' in dfs mounts
  cifs: Add DFS cache routines
  cifs: Make use of DFS cache to get new DFS referrals
  cifs: Add support for failover in cifs_mount()
  cifs: Add support for failover in cifs_reconnect()
  cifs: start DFS cache refresher in cifs_mount()
  cifs: Add support for failover in smb2_reconnect()
  cifs: Add support for failover in cifs_reconnect_tcon()

 fs/cifs/Makefile       |    2 +-
 fs/cifs/cifs_debug.c   |   12 +
 fs/cifs/cifs_dfs_ref.c |  141 +++--
 fs/cifs/cifs_fs_sb.h   |    9 +
 fs/cifs/cifsfs.c       |   17 +-
 fs/cifs/cifsglob.h     |   14 +-
 fs/cifs/cifsproto.h    |   28 +-
 fs/cifs/cifssmb.c      |   88 ++-
 fs/cifs/connect.c      |  889 +++++++++++++++++++++++--------
 fs/cifs/dfs_cache.c    | 1379 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/dfs_cache.h    |   97 ++++
 fs/cifs/dir.c          |    2 +-
 fs/cifs/inode.c        |   49 +-
 fs/cifs/misc.c         |   34 +-
 fs/cifs/smb1ops.c      |   15 +-
 fs/cifs/smb2ops.c      |   23 +-
 fs/cifs/smb2pdu.c      |   88 ++-
 17 files changed, 2565 insertions(+), 322 deletions(-)
 create mode 100644 fs/cifs/dfs_cache.c
 create mode 100644 fs/cifs/dfs_cache.h

-- 
2.13.7