Re: [REGRESSION][BISECTED] Commit 60e3318e3e900 in stable/linux-6.1.y breaks cifs client failover to another server in DFS namespace

On 19/06/2024, Andrew Paniakin wrote:
> Commit 60e3318e3e900 ("cifs: use fs_context for automounts") was
> released in v6.1.54 and broke the failover when one of the servers
> inside DFS becomes unavailable. We reproduced the problem on the EC2
> instances of different types. Reverting the aforementioned commit on top of
> the latest stable version v6.1.94 resolves the problem.
> 
> The earliest working version is v6.2-rc1. There were two big merges of CIFS
> fixes: [1] and [2]. We would like to ask for help investigating this problem
> and determining whether some of those patches need to be backported. Also, is
> it safe to just revert the problematic commit until proper fixes/backports
> are available?
> 
> We will help with testing and confirm whether a fix works. Here are the steps
> we used to reproduce the problem, in case they help to identify it:
> 1. Create an Active Directory domain, e.g. 'corp.fsxtest.local', in AWS
> Directory Service with:
> - three AWS FSX file systems filesystem1..filesystem3
> - three Windows servers with DFS installed as per
>   https://learn.microsoft.com/en-us/windows-server/storage/dfs-namespaces/dfs-overview:
>     - dfs-srv1: EC2AMAZ-2EGTM59
>     - dfs-srv2: EC2AMAZ-1N36PRD
>     - dfs-srv3: EC2AMAZ-0PAUH2U 
> 
> 2. Create a DFS namespace, e.g. 'dfs-namespace', in Windows Server 2008 mode
> and three folder targets in it:
> - referral-a mapped to filesystem1.corp.local
> - referral-b mapped to filesystem2.corp.local
> - referral-c mapped to filesystem3.corp.local
> - local folders dfs-srv1..dfs-srv3 in C:\DFSRoots\dfs-namespace of every
>   Windows server. This helps to quickly identify the underlying server when
>   DFS is mounted.
> 
> 3. Enable cifs debug logs:
> ```
> echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
> echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
> echo 7 > /proc/fs/cifs/cifsFYI
> ```
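
For anyone repeating step 3, the three writes can be wrapped so that debugging is easy to switch off again after the test. This is only a sketch: the function name cifs_debug and the two overridable path variables are made up for illustration, and it assumes debugfs is mounted at /sys/kernel/debug and that the shell runs as root.

```shell
# Paths default to the kernel interfaces used in step 3; both variables are
# illustrative and can be pointed at scratch files for a dry run.
DYNDBG_CONTROL="${DYNDBG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}"
CIFS_FYI="${CIFS_FYI:-/proc/fs/cifs/cifsFYI}"

# cifs_debug on|off: enable or disable cifs debug messages.
cifs_debug() {
    case "$1" in
        on)  flag='+p'; level=7 ;;
        off) flag='-p'; level=0 ;;
        *)   echo "usage: cifs_debug on|off" >&2; return 1 ;;
    esac
    # Each write is one dynamic debug command, same as in step 3.
    echo "module cifs $flag" > "$DYNDBG_CONTROL"
    echo "file fs/cifs/* $flag" > "$DYNDBG_CONTROL"
    # 7 turns on the cifsFYI messages, 0 turns them off again.
    echo "$level" > "$CIFS_FYI"
}
```

Run 'cifs_debug on' before mounting in step 4 and 'cifs_debug off' once dmesg has been saved.
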
> 
> 4. Mount the DFS namespace on an Amazon Linux 2023 instance running any
> vanilla kernel v6.1.54+:
> ```
> dmesg -c &>/dev/null
> cd /mnt
> mount -t cifs -o cred=/mnt/creds,echo_interval=5 \
>     //corp.fsxtest.local/dfs-namespace \
>     ./dfs-namespace
> ```
> 
> 5. List the DFS root. Running ls through 'sh -c' is also required to avoid
> the recursive mounts that happen during a regular 'ls' run:
> ```
> sh -c 'ls dfs-namespace'
> dfs-srv2  referral-a  referral-b
> ```
> 
> The DFS server is EC2AMAZ-1N36PRD; it is also listed in the mount output:
> ```
> [root@ip-172-31-2-82 mnt]# mount | grep dfs
> //corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
> //EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
> ```
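
The long option strings make it hard to compare the mount output before and after the failover. A small filter like the following can reduce each cifs line to mount point, server and addr. It is a sketch only: cifs_targets is a made-up name, and it assumes the 'SOURCE on DIR type cifs (OPTIONS)' layout shown above with no spaces in the paths.

```shell
# cifs_targets: read `mount` output on stdin and print
# "mountpoint server addr" for every cifs mount.
cifs_targets() {
    awk '$5 == "cifs" {
        split($1, parts, "/")        # //server/share -> parts[3] is the server
        opts = $6
        gsub(/[()]/, "", opts)       # drop the parentheses around the options
        addr = "-"
        n = split(opts, o, ",")
        for (i = 1; i <= n; i++)
            if (o[i] ~ /^addr=/)
                addr = substr(o[i], 6)
        print $3, parts[3], addr
    }'
}
```

Usage: 'mount | cifs_targets', once after step 5 and once after step 7, then diff the two results.
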
> 
> List files in the first folder:
> ```
> sh -c 'ls dfs-namespace/referral-a'
> filea.txt.txt
> ```
> 
> 6. Shut down DFS server 2 (dfs-srv2).
> List the DFS root again; the server has changed from dfs-srv2 to dfs-srv1
> (EC2AMAZ-2EGTM59):
> ```
> sh -c 'ls dfs-namespace'
> dfs-srv1  referral-a  referral-b
> ```
> 
> 7. Try to list files in another folder; ls fails with an error:
> ```
> sh -c 'ls dfs-namespace/referral-b'
> ls: cannot access 'dfs-namespace/referral-b': No route to host
> ```
> 
> Sometimes the error is 'Operation now in progress' instead.
> 
> mount shows the same output:
> ```
> //corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
> //EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
> ```
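
To check whether the failure in step 7 clears on its own after a few echo intervals or persists, the listing can be retried in a loop. This is a rough sketch and not part of the original reproduction steps; probe_dir and its default of five attempts five seconds apart are illustrative choices.

```shell
# probe_dir DIR [ATTEMPTS] [DELAY]: try to list DIR up to ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 0 on the first success,
# 1 if every attempt failed. The ls error text is printed per attempt.
probe_dir() {
    dir=$1
    attempts=${2:-5}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Keep stderr (the error message), discard the listing itself.
        if err=$(ls "$dir" 2>&1 >/dev/null); then
            echo "attempt $i: ok"
            return 0
        fi
        echo "attempt $i: $err"
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Usage: 'probe_dir dfs-namespace/referral-b' right after shutting down the server in step 6.
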
> 
> I also attached kernel debug logs from this test.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=851f657a86421
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a924817d2ed9
> 
> Reported-by: Andrei Paniakin <apanyaki@xxxxxxxxxx>
> Bisected-by: Simba Bonga <simbarb@xxxxxxxxxx>
> ---
> 
> #regzbot introduced: v6.1.54..v6.2-rc1


Friendly reminder: has anyone had a chance to look into this report?



