On Thursday, 06.09.2018, at 08:36 -0500, Steve French wrote:
> To clarify a few things:
> - are you saying that you had the original older dialect (SMB2.0,
> vers=2.0) signing problem, but now that that is resolved see
> occasional hangs in listing directories

Exactly! Of course this may be a different regression, but it came with
4.18 as well...

I now use vers=3 as a mount option (the kernel fills the log with
warnings about the changed default if I leave it out...).
/proc/fs/cifs/DebugData (no Stats in there) says that everything is
Dialect 3 now (see below for an excerpt).

> - do you see any correlation between the size of the directory and
> hangs

I thought so initially, as I first listed a few subdirs without problems
and then it hung when I listed one with >16000 entries. But then it also
hung once on the first attempt when listing a smaller top-level
directory.

> - is a reconnect involved (I see mention of the krb5 upcall, which
> presumably could hang in a reconnect scenario if AD server were not
> available to refresh the ticket and it had expired)? You can see the
> number of reconnects (if any) in /proc/fs/cifs/Stats

This all happens within minutes after an AD login, so I'm quite sure that
no expiration is involved.

> - if it is a reconnect any idea if intermittent network issue or hung
> server was the reason for the reconnect?

I switch back and forth between 4.17.13 and 4.18.6, and it happens every
time I try in 4.18.6 but never in 4.17.13. There's definitely no
connectivity or service problem.

> - for the hung directory examples are you seeing them with smb3 (which
> presumably is the most common dialect being used and safest) or
> earlier dialect/

Yes, if what DebugData reports is correct...

> - what is the server type?

It's a Microsoft system (not Samba) which supports up to 3.11 as
reported by nmap. Is there a way to probe it more exactly?

Note that /proc/fs/cifs/LinuxExtensionsEnabled is 1 although I didn't
specifically request it.
From DebugData:

Features: dfs spnego xattr acl

DFS server entry: "Dialect 0x302 signed"
file server entry: "Dialect 0x300"

PathComponentMax: 255 Status: 1 type: DISK
Share Capabilities: None Aligned, Partition Aligned, TRIM support, Share Flags: 0x30
Optimal sector size: 0x1000

MIDs:
State: 2 com: 6 pid: 27772 cbdata: 00000000634d19f4 mid 6581

> On Thu, Sep 6, 2018 at 7:30 AM Dr. Bernd Feige
> <bernd.feige@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Dear Steve et al.,
> >
> > I'm running Linux 4.18.6 in a corporate environment and now have the
> > issue that listing directories lets the process hang interminably,
> > loading one CPU by 100%. This does not happen every time (i.e.
> > sometimes a directory listing completes).
> >
> > Note that this works solidly with 4.17.13.
> >
> > More verbatim:
> >
> > I had the problem the OP noted with 4.18.5 during upcall. I had
> > vers=2.1 in the mount options since the servers used to not support
> > vers=3. I didn't get a kernel oops but a hung mount process. It
> > worked with 4.17.13.
> >
> > Reading this thread, I then dropped the vers= option and found that
> > mounts worked again (still with 4.18.5) after confirming:
> >
> > nmap -Pn -p 445 --script smb-protocols ad
> >
> > PORT    STATE SERVICE
> > 445/tcp open  microsoft-ds
> >
> > Host script results:
> > | smb-protocols:
> > |   dialects:
> > |     NT LM 0.12 (SMBv1) [dangerous, but default]
> > |     2.02
> > |     2.10
> > |     3.00
> > |     3.02
> > |_    3.11
> >
> > However, it may be that the actual mount uses version 2 still:
> >
> > Sep 06 09:43:18 cifs.upcall[15995]: key description: cifs.spnego;0;0;39010000;ver=0x2;host=xxx;ip4=xxx;sec=krb5;uid=0x3e8;creduid=0x3e8;user=root;pid=0x671b
> > Sep 06 09:43:18 cifs.upcall[15995]: ver=2
> > Sep 06 09:43:18 cifs.upcall[15995]: host=xxx
> > Sep 06 09:43:18 cifs.upcall[15995]: ip=xxx
> > Sep 06 09:43:18 cifs.upcall[15995]: sec=1
> > Sep 06 09:43:18 cifs.upcall[15995]: uid=1000
> > Sep 06 09:43:18 cifs.upcall[15995]: creduid=1000
> > Sep 06 09:43:18 cifs.upcall[15995]: user=root
> > Sep 06 09:43:18 cifs.upcall[15995]: pid=26395
> > Sep 06 09:43:18 cifs.upcall[15995]: get_cachename_from_process_env: pathname=/proc/26395/environ
> > Sep 06 09:43:18 cifs.upcall[15995]: get_cachename_from_process_env: read to end of buffer (4096 bytes)
> > Sep 06 09:43:18 cifs.upcall[15995]: get_existing_cc: default ccache is FILE:/tmp/krb5cc_1000
> > Sep 06 09:43:18 cifs.upcall[15995]: handle_krb5_mech: getting service ticket for xxx
> > Sep 06 09:43:18 cifs.upcall[15995]: handle_krb5_mech: obtained service ticket
> > Sep 06 09:43:18 cifs.upcall[15995]: Exit status 0
> >
> > Thanks and best regards,
> > Bernd