Re: regression in CIFS(?) between 4.17.14 and 4.18.0

"Dr. Bernd Feige" <bernd.feige@xxxxxxxxxxxxxxxxxxxxx> · Thu, 06 Sep 2018 17:25:41 +0200

On Thursday, 06.09.2018, at 08:36 -0500, Steve French wrote:
> To clarify a few things:
> - are you saying that you had the original older dialect (SMB2.0,
> vers=2.0) signing problem, but now that that is resolved see
> occasional hangs in listing directories

Exactly! It may of course be a different regression, but it arrived
with 4.18 as well...

I now use vers=3 as mount option (the kernel fills the log with
warnings about the changed default if I leave it out...).
/proc/fs/cifs/DebugData (no Stats in there) says that everything is
Dialect 3 now (see below for an excerpt).

> - do you see any correlation between the size of the directory and
> hangs

I thought so initially, as I first listed a few subdirs without
problems and then it hung when I listed one with >16000 entries. But then
it also hung once on the first attempt when listing a smaller top-level
directory.

> - is a reconnect involved (I see mention of the krb5 upcall, which
> presumably could hang in a reconnect scenario if AD server were not
> available to refresh the ticket and it had expired)?  You can see the
> number of reconnects (if any) in /proc/fs/cifs/Stats

This all happens within minutes after an AD login, so I'm quite sure
that no expiration is involved.

> - if it is a reconnect any idea if intermittent network issue or hung
> server was the reason for the reconnect?

I switch back and forth between 4.17.13 and 4.18.6, and it happens
every time I try in 4.18.6 but never in 4.17.13. There's definitely
no connectivity or service problem.

> - for the hung directory examples are you seeing them with smb3 (which
> presumably is the most common dialect being used and safest) or
> earlier dialect?

Yes, with smb3, if what DebugData reports is correct...
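
To cross-check the cifs.upcall log from my earlier mail (quoted below), here is a minimal Python sketch that splits the key description into its fields and decodes the hex values. The helper names `parse_key_description` and `hex_field` are made up for illustration; only the key description string is taken from the log. The decoded uid and pid match the decimal lines cifs.upcall prints right after it. Whether `ver=0x2` relates to the SMB dialect at all is not established by the log; it may simply be the version of the upcall key format, so treat it with caution.

```python
def parse_key_description(desc):
    """Split a cifs.spnego key description into a dict of its key=value fields."""
    fields = {}
    for part in desc.split(";"):
        # Leading parts such as "cifs.spnego" or "0" carry no "=" and are skipped.
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value
    return fields


def hex_field(fields, name):
    """Decode a field written as 0x... into an integer."""
    return int(fields[name], 16)


# Key description copied from the quoted log, with the mail line wrap removed.
DESC = ("cifs.spnego;0;0;39010000;ver=0x2;host=xxx;ip4=xxx;sec=krb5;"
        "uid=0x3e8;creduid=0x3e8;user=root;pid=0x671b")

if __name__ == "__main__":
    fields = parse_key_description(DESC)
    for name in ("ver", "uid", "creduid", "pid"):
        print(name, hex_field(fields, name))
    print("sec", fields["sec"])
```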

> - what is the server type?

It's a Microsoft system (not samba) which supports up to 3.11 as
reported by nmap. Is there a way to probe it more exactly?

Note that /proc/fs/cifs/LinuxExtensionsEnabled is 1 although I didn't
specifically request it.

From DebugData:
Features: dfs spnego xattr acl

DFS server entry: "Dialect 0x302 signed"
file server entry: "Dialect 0x300"
PathComponentMax: 255 Status: 1 type: DISK 
	Share Capabilities: None Aligned, Partition Aligned, TRIM
support,	Share Flags: 0x30	Optimal sector size: 0x1000

MIDs:
	State: 2 com: 6 pid: 27772 cbdata: 00000000634d19f4 mid 6581
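
For reading such an excerpt mechanically, here is a small Python sketch that pulls out the Dialect values and the pending MID entries. The function name `parse_debugdata` and the regular expressions are my own and only match the lines as pasted above; the real file may be laid out differently between kernel versions. The command names are the SMB2 command codes from the protocol specification, so `com: 6` would be a CLOSE. As far as I can tell from the kernel source, `State: 2` means the request was submitted and is waiting for a response, but that reading is an assumption.

```python
import re

# Dialect revision codes, written the way nmap prints them.
DIALECTS = {0x202: "2.02", 0x210: "2.10", 0x300: "3.00", 0x302: "3.02", 0x311: "3.11"}

# SMB2 command codes, used only to label the "com:" field.
SMB2_COMMANDS = {
    0: "NEGOTIATE", 1: "SESSION_SETUP", 2: "LOGOFF", 3: "TREE_CONNECT",
    4: "TREE_DISCONNECT", 5: "CREATE", 6: "CLOSE", 7: "FLUSH", 8: "READ",
    9: "WRITE", 10: "LOCK", 11: "IOCTL", 12: "CANCEL", 13: "ECHO",
    14: "QUERY_DIRECTORY", 15: "CHANGE_NOTIFY", 16: "QUERY_INFO",
    17: "SET_INFO", 18: "OPLOCK_BREAK",
}


def parse_debugdata(text):
    """Return (list of dialect codes, list of pending MID entries) found in text."""
    dialects = [int(code, 16) for code in re.findall(r"Dialect (0x[0-9a-fA-F]+)", text)]
    mids = [
        {"state": int(state), "command": int(com), "pid": int(pid), "mid": int(mid)}
        for state, com, pid, mid in re.findall(
            r"State: (\d+) com: (\d+) pid: (\d+) cbdata: \S+ mid (\d+)", text)
    ]
    return dialects, mids


# Lines copied from the excerpt above.
EXCERPT = """DFS server entry: "Dialect 0x302 signed"
file server entry: "Dialect 0x300"
MIDs:
\tState: 2 com: 6 pid: 27772 cbdata: 00000000634d19f4 mid 6581
"""

if __name__ == "__main__":
    dialects, mids = parse_debugdata(EXCERPT)
    for code in dialects:
        print(hex(code), DIALECTS.get(code, "unknown"))
    for entry in mids:
        name = SMB2_COMMANDS.get(entry["command"], "unknown")
        print("mid", entry["mid"], name, "state", entry["state"])
```

On a live system the same function could be fed the contents of /proc/fs/cifs/DebugData while a listing is hung, to see which command is outstanding.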

> On Thu, Sep 6, 2018 at 7:30 AM Dr. Bernd Feige
> <bernd.feige@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > 
> > Dear Steve et al.,
> > 
> > I'm running Linux 4.18.6 in a corporate environment and now have the
> > issue that listing directories makes the process hang indefinitely,
> > loading one CPU at 100%. This does not happen every time (i.e.
> > sometimes a directory listing completes).
> > 
> > Note that this works solidly with 4.17.13.
> > 
> > More verbatim:
> > 
> > I had the problem the OP noted with 4.18.5 during upcall. I had
> > vers=2.1 in the mount options since the servers used to not support
> > vers=3. I didn't get a kernel oops but a hung mount process. It worked
> > with 4.17.13.
> > 
> > Reading this thread, I then dropped the vers= option and found that
> > mounts worked again (still with 4.18.5) after confirming:
> > 
> > nmap -Pn -p 445 --script smb-protocols ad
> > 
> > PORT    STATE SERVICE
> > 445/tcp open  microsoft-ds
> > 
> > Host script results:
> > | smb-protocols:
> > |   dialects:
> > |     NT LM 0.12 (SMBv1) [dangerous, but default]
> > |     2.02
> > |     2.10
> > |     3.00
> > |     3.02
> > |_    3.11
> > 
> > However, it may be that the actual mount uses version 2 still:
> > 
> > Sep 06 09:43:18  cifs.upcall[15995]: key description: cifs.spnego;0;0;39010000;ver=0x2;host=xxx;ip4=xxx;sec=krb5;uid=0x3e8;creduid=0x3e8;user=root;pid=0x671b
> > Sep 06 09:43:18  cifs.upcall[15995]: ver=2
> > Sep 06 09:43:18  cifs.upcall[15995]: host=xxx
> > Sep 06 09:43:18  cifs.upcall[15995]: ip=xxx
> > Sep 06 09:43:18  cifs.upcall[15995]: sec=1
> > Sep 06 09:43:18  cifs.upcall[15995]: uid=1000
> > Sep 06 09:43:18  cifs.upcall[15995]: creduid=1000
> > Sep 06 09:43:18  cifs.upcall[15995]: user=root
> > Sep 06 09:43:18  cifs.upcall[15995]: pid=26395
> > Sep 06 09:43:18  cifs.upcall[15995]: get_cachename_from_process_env: pathname=/proc/26395/environ
> > Sep 06 09:43:18  cifs.upcall[15995]: get_cachename_from_process_env: read to end of buffer (4096 bytes)
> > Sep 06 09:43:18  cifs.upcall[15995]: get_existing_cc: default ccache is FILE:/tmp/krb5cc_1000
> > Sep 06 09:43:18  cifs.upcall[15995]: handle_krb5_mech: getting service ticket for xxx
> > Sep 06 09:43:18  cifs.upcall[15995]: handle_krb5_mech: obtained service ticket
> > Sep 06 09:43:18  cifs.upcall[15995]: Exit status 0
> > 
> > Thanks and best regards,
> > Bernd